
What is the way to represent a unichar in Lua?

Developer https://www.devze.com 2023-04-13 01:16 Source: web
If I need the following Python value, the Unicode character with code point 0:

>>> unichr(0)
u'\x00'

How can I define it in Lua?


There isn't one.

Lua has no concept of a Unicode value. Lua has no concept of Unicode at all. All Lua strings are 8-bit sequences of "characters", and all Lua string functions will treat them as such. Lua does not treat strings as having any Unicode encoding; they're just a sequence of bytes.

You can insert an arbitrary byte value into a string with a decimal escape. For example:

"\065\066"

Is equivalent to:

"AB"

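The \ notation is followed by up to three decimal digits (or one of the escape characters), and the value must be less than or equal to 255. Lua is perfectly capable of handling strings with embedded \000 characters, so "\000" (or string.char(0)) is the closest equivalent of Python's u'\x00'.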
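A minimal sketch of the escape mechanism described above, using only the standard string library. The variable names are illustrative.

```lua
-- Decimal escapes take one to three digits and name a byte value (0..255).
local ab = "\065\066"
print(ab == "AB")             -- true

-- A string holding the single byte 0, written two equivalent ways.
local nul = "\0"
local nul2 = string.char(0)
print(#nul, nul == nul2)      -- 1	true

-- An embedded zero byte does not terminate the string.
local s = "a\0b"
print(#s, string.byte(s, 2))  -- 3	0
```

One thing to watch: a short escape such as \0 absorbs any digits that follow it, so write \000 when the next character is a digit.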

But you cannot directly insert Unicode code points into Lua strings (at least before Lua 5.3, which added the \u{XXX} escape). Instead, you can encode the code point as UTF-8 and use the above mechanism to insert the resulting bytes into a string. For example:

"x\226\131\151"

This is the x character followed by the Unicode character U+20D7, COMBINING RIGHT ARROW ABOVE.

But since none of Lua's standard string functions understand UTF-8, such a string is only useful once you pass it to some function (typically one exposed by the host application) that expects UTF-8.
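The decomposition into UTF-8 can also be done in code rather than by hand. The following is a rough sketch, not part of the original answer: utf8_encode is a hypothetical helper name, and it relies only on arithmetic and string.char, so it should not need bitwise operators or the utf8 library. It does not reject surrogate code points.

```lua
-- Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
local function utf8_encode(cp)
  assert(cp >= 0 and cp <= 0x10FFFF, "code point out of range")
  local floor, char = math.floor, string.char
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return char(0xC0 + floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end

print(utf8_encode(0) == "\0")                               -- true
print(("x" .. utf8_encode(0x20D7)) == "x\226\131\151")      -- true
```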


How about

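-- Returns a Python-style escaped representation of the code point
-- (for example "\u20d7"), not the encoded character itself.
function unichr(ord)
    if ord == nil then return nil end
    -- control characters: \xNN
    if ord < 32 then return string.format('\\x%02x', ord) end
    -- printable ASCII, up to and including '~' (126)
    if ord < 127 then return string.char(ord) end
    -- rest of the Basic Multilingual Plane: \uNNNN
    if ord < 65536 then return string.format("\\u%04x", ord) end
    -- supplementary planes, up to U+10FFFF: \UNNNNNNNN
    if ord <= 1114111 then return string.format("\\U%08x", ord) end
end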


While native Lua does not directly support or handle Unicode, its strings are really buffers of arbitrary bytes that by convention hold ASCII characters. Since strings may contain any byte values, it is relatively straightforward to build support for Unicode on top of native strings. Should byte buffers prove to be insufficiently robust for the purpose, one can also use a userdata object to hold anything, and with the addition of a suitable metatable, endow it with methods for creation, translation to a desired encoding, concatenation, iteration, and anything else that is needed.
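As a rough illustration of the metatable idea, here is a sketch that uses a plain table of code points instead of a userdata (creating a real userdata needs the C API). UString and its methods are hypothetical names, and the encoder is passed in by the caller rather than fixed.

```lua
-- Hypothetical UString type: an array of code points with a metatable.
local UString = {}
UString.__index = UString

-- Create a UString from zero or more code points.
function UString.new(...)
  return setmetatable({ ... }, UString)
end

-- Number of code points (not bytes).
function UString:len()
  return #self
end

-- The .. operator yields a new UString; both operands are assumed to be UStrings.
function UString.__concat(a, b)
  local out = UString.new()
  for _, cp in ipairs(a) do out[#out + 1] = cp end
  for _, cp in ipairs(b) do out[#out + 1] = cp end
  return out
end

-- Translate to a byte string using a caller-supplied function that maps
-- one code point to its encoded bytes.
function UString:encode(encoder)
  local parts = {}
  for i, cp in ipairs(self) do parts[i] = encoder(cp) end
  return table.concat(parts)
end

local s = UString.new(0x41, 0x00) .. UString.new(0x42)
print(s:len())                          -- 3
-- string.char serves as a one-byte-per-code-point encoder for values up to 255.
print(s:encode(string.char) == "A\0B")  -- true
```

Passing the encoder as an argument keeps the container independent of any one encoding. A UTF-8 encoder could be supplied in the same way.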

There is a page on the lua-users wiki that discusses various ways to handle Unicode in Lua programs.


For a more modern answer, Lua 5.3 added utf8.char. From the reference manual:

Receives zero or more integers, converts each one to its corresponding UTF-8 byte sequence and returns a string with the concatenation of all these sequences.
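A short usage sketch, assuming Lua 5.3 or later, where the utf8 library and the \u{XXX} escape are available. It will not run on Lua 5.1, 5.2, or LuaJIT.

```lua
-- Assumes Lua 5.3+: utf8 is part of the standard library.
-- The equivalent of Python's unichr(0): a one-byte string holding 0.
local nul = utf8.char(0)
print(#nul, nul == "\0")                -- 1	true

-- Several code points at once: "x" followed by U+20D7.
local s = utf8.char(0x78, 0x20D7)
print(s == "x\226\131\151")             -- true

-- utf8.len counts code points rather than bytes.
print(#s, utf8.len(s))                  -- 4	2

-- The \u{XXX} escape produces the same bytes in a string literal.
print("\u{20D7}" == utf8.char(0x20D7))  -- true
```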
