unicode.lua

The unicode.lua include for Automation 4 Lua contains various helper functions for working with UTF-8 encoded text.

unicode.charwidth

Synopsis: width = unicode.charwidth(instring, index=1)

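Returns the number of bytes occupied by the UTF-8 encoded character that starts at byte position index in instring. The byte at that position is assumed to be the prefix (first) byte of a character, so the result is only meaningful when index points at the start of a character.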

The index parameter is optional and defaults to 1 (one) when left out, meaning the width of the first character in instring will be returned.
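As an illustration of the behaviour described above, here is a minimal sketch in plain Lua. The local charwidth function is a hypothetical stand-in written from this description, not the module's actual source; inside an Automation script you would call unicode.charwidth from the unicode.lua include instead. The test string uses decimal byte escapes so the byte layout is explicit.

```lua
-- Hypothetical stand-in for unicode.charwidth, written from the description
-- above. It assumes the byte at position i is the prefix byte of a character.
local function charwidth(s, i)
  i = i or 1                      -- index defaults to 1
  local b = s:byte(i)
  if b < 128 then return 1        -- 0xxxxxxx: single-byte (ASCII)
  elseif b < 224 then return 2    -- 110xxxxx: two-byte character
  elseif b < 240 then return 3    -- 1110xxxx: three-byte character
  else return 4 end               -- 11110xxx: four-byte character
end

-- "a" (1 byte), "é" (bytes 195 169), "€" (bytes 226 130 172)
local s = "a\195\169\226\130\172"

print(charwidth(s))     -- 1 (index omitted, so the first character is used)
print(charwidth(s, 2))  -- 2 ("é" starts at byte 2)
print(charwidth(s, 4))  -- 3 ("€" starts at byte 4)
```

Note that index counts bytes, not characters, which is why the third call uses 4 rather than 3.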

unicode.chars

Synopsis: for char in unicode.chars(instring) do ... end

Returns an iterator function for looping over all characters in the given UTF-8 encoded string. For each iteration of the loop, char will contain a string representing the next character in the string (which may be more than one byte long).
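The loop below sketches how such an iterator can be used and what it yields. Both charwidth and chars here are hypothetical stand-ins written in plain Lua from the descriptions on this page, not the module's actual source; in an Automation script the real unicode.chars would take their place.

```lua
-- Hypothetical stand-in: byte width of the character whose prefix byte is at i.
local function charwidth(s, i)
  local b = s:byte(i or 1)
  if b < 128 then return 1
  elseif b < 224 then return 2
  elseif b < 240 then return 3
  else return 4 end
end

-- Hypothetical stand-in for unicode.chars: returns an iterator function that
-- yields one character (as a string of one or more bytes) per call.
local function chars(s)
  local i = 1
  return function()
    if i > #s then return nil end  -- end of string: stop the loop
    local j = i
    i = i + charwidth(s, i)        -- jump to the start of the next character
    return s:sub(j, i - 1)
  end
end

-- "a" (1 byte), "é" (2 bytes), "€" (3 bytes)
local widths = {}
for char in chars("a\195\169\226\130\172") do
  widths[#widths + 1] = #char      -- #char is the byte length of this character
end
print(table.concat(widths, " "))   -- 1 2 3
```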

unicode.len

Synopsis: length = unicode.len(instring)

Returns the length of the given UTF-8 encoded string, counted in Unicode characters rather than bytes.

Be aware that this function does not run in constant time: it takes linear time, O(N), in the number of Unicode characters in instring.
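The difference between byte length and character length, and the reason the count takes linear time, can be sketched as follows. The len and charwidth functions are hypothetical stand-ins in plain Lua, assuming the string is walked one character at a time; they are not the module's actual source.

```lua
-- Hypothetical stand-in: byte width of the character whose prefix byte is at i.
local function charwidth(s, i)
  local b = s:byte(i or 1)
  if b < 128 then return 1
  elseif b < 224 then return 2
  elseif b < 240 then return 3
  else return 4 end
end

-- Hypothetical stand-in for unicode.len: one loop step per character,
-- hence linear time in the number of characters.
local function len(s)
  local n, i = 0, 1
  while i <= #s do
    i = i + charwidth(s, i)  -- skip over the whole character
    n = n + 1
  end
  return n
end

-- "a" (1 byte), "é" (2 bytes), "€" (3 bytes)
local s = "a\195\169\226\130\172"
print(#s)      -- 6 (bytes)
print(len(s))  -- 3 (characters)
```

If the length is needed repeatedly, for example inside a loop, it is cheaper to compute it once and keep it in a local variable.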

unicode.codepoint

Synopsis: val = unicode.codepoint(instring)

Returns the Unicode code point, as a number, of the first character in instring.
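To show what the returned value looks like, here is a minimal sketch of standard UTF-8 decoding in plain Lua. The codepoint function is a hypothetical stand-in that assumes valid input and does no error checking; it is not the module's actual source.

```lua
-- Hypothetical stand-in for unicode.codepoint, assuming valid UTF-8 input.
local function codepoint(s)
  local b = s:byte(1)
  if b < 128 then return b end       -- ASCII: the byte is the code point
  local res, w
  if b < 224 then res, w = b - 192, 2      -- 110xxxxx: keep the low 5 bits
  elseif b < 240 then res, w = b - 224, 3  -- 1110xxxx: keep the low 4 bits
  else res, w = b - 240, 4 end             -- 11110xxx: keep the low 3 bits
  for i = 2, w do
    -- each continuation byte (10xxxxxx) contributes its low 6 bits
    res = res * 64 + (s:byte(i) - 128)
  end
  return res
end

print(codepoint("a"))             -- 97
print(codepoint("\195\169"))      -- 233  (U+00E9, "é")
print(codepoint("\226\130\172"))  -- 8364 (U+20AC, "€")

-- Only the first character is decoded; the rest of the string is ignored.
print(codepoint("\195\169xyz"))   -- 233
```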