[lua] Fall back to built-in utf8 module on Lua 5.3+ #12596
jdonaldson wants to merge 1 commit into HaxeFoundation:development
Conversation
HaxeFoundation#9412) On Lua 5.3+, add a runtime shim that pre-populates package.loaded['lua-utf8'] with a compat table built from the built-in utf8 module. This lets the Lua target work without the third-party lua-utf8 library on Lua 5.3+. Limitations without lua-utf8: upper/lower are ASCII-only; gsub/gmatch/match operate on bytes, not characters.
I think it may be worth reorganising things a bit here, rather than trying to force utf8 into the shape of lua-utf8.
I'm also worried that this introduces a bit of inconsistency. Currently, we either have lua-utf8 or we don't. If we have it, then strings are treated as utf8 and all operations work in terms of codepoints. If we don't have it, then all strings are treated as bytes. This is easy to explain in documentation. With this fallback implementation, some methods use utf8 characters and others use bytes, which can break things as a user might take the output from one method and pass it to the other. In particular, falling back to

I'm not exactly sure the best way to solve all these concerns. Maybe another option is: if we use utf8 as a replacement for lua-utf8, throw on unimplemented methods, instead of falling back to Unicode-incompatible ones. That way users will get an explicit error instead of inconsistent behaviour. What are your thoughts?
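The "throw on unimplemented methods" option could look roughly like the sketch below. It assumes Lua 5.3+ (so the global `utf8` exists); `strict_compat` and `unsupported` are hypothetical names used only for illustration, and only a few pass-through methods are shown.

```lua
-- Sketch only: a lua-utf8 stand-in built from the built-in utf8 module
-- (Lua 5.3+). Methods that cannot work per code point raise an error
-- instead of silently operating on bytes.
-- `unsupported` and `strict_compat` are hypothetical helper names.
local function unsupported(name)
  return function()
    error("utf8." .. name .. " requires the lua-utf8 library", 2)
  end
end

local function strict_compat(builtin)
  -- Methods the built-in module can provide directly.
  local compat = {
    char = builtin.char,
    codes = builtin.codes,
    len = builtin.len,
  }
  -- Methods with no code-point-aware equivalent in the built-in module.
  for _, name in ipairs({ "upper", "lower", "gsub", "gmatch", "match" }) do
    compat[name] = unsupported(name)
  end
  return compat
end

local compat = strict_compat(utf8)
print(compat.len("h\195\169"))           -- "h" followed by U+00E9 (2 bytes)
print(pcall(compat.upper, "h\195\169"))  -- fails with an explicit message
```

With this shape, a caller never receives a byte-based result from a method that is documented as character-based; it gets an error naming the missing library.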
Yeah, what I have here is a bad idea for the reasons you're outlining. I've put this repo down as a draft for now. I think it will require a much bigger effort to completely integrate 5.3 utf8, and I'm not even sure if the juice is worth the squeeze here. Any additional thoughts appreciated.
Summary
- Add a runtime shim (`_hx_utf8.lua`) that pre-populates `package.loaded['lua-utf8']` with either the real library or a compat table built from Lua 5.3+'s built-in `utf8` module
- `@:luaRequire('lua-utf8')` generates `require`, so it's transparent: no changes to `Utf8.hx` or `String.hx`
- … `utf8` module, original error preserved)

Follows the same `pcall(require, ...)` pattern as `_hx_bit.lua`.

Per tobil4sk's comment: the built-in utf8 module doesn't implement all the methods provided by lua-utf8, but the ones that are provided we can use.
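The lookup order described in the summary can be sketched as follows. This is not the actual contents of `_hx_utf8.lua`; `build_compat` is a hypothetical placeholder for the compat-table constructor and only forwards three functions.

```lua
-- Sketch only; the real _hx_utf8.lua may differ.
-- `build_compat` is a hypothetical stand-in for the compat-table builder.
local function build_compat(builtin)
  return { char = builtin.char, codes = builtin.codes, len = builtin.len }
end

-- Try the third-party library first, without letting a failure propagate.
local ok, lib = pcall(require, "lua-utf8")
if ok then
  -- Real library found; require has already cached it, this just
  -- makes the intent explicit.
  package.loaded["lua-utf8"] = lib
elseif type(utf8) == "table" then
  -- Lua 5.3+: publish the compat table under the library's module name.
  package.loaded["lua-utf8"] = build_compat(utf8)
else
  -- No built-in utf8 module (e.g. Lua 5.1/5.2): re-raise the original error.
  error(lib, 0)
end

-- Any later require call, such as the one emitted for
-- @:luaRequire('lua-utf8'), is now answered from package.loaded.
local u = require("lua-utf8")
print(u.len("abc"))
```

Because `require` consults `package.loaded` before running any searcher, the generated code does not need to know which of the two implementations it received.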
Compat table methods
| Method | Implementation |
|---|---|
| `len(s,i,j,lax)` | `utf8.len` + lax fallback to `#s` for invalid UTF-8 |
| `char(...)` | `utf8.char` directly |
| `codes(s)` | `utf8.codes` directly |
| `byte(s,i)` | `utf8.offset` → `utf8.codepoint` |
| `sub(s,i,j)` | `utf8.offset` for char→byte, then `string.sub` |
| `find(s,pat,init,plain)` | `utf8.offset` for init, `string.find`, `utf8.len` for byte→char |
| `upper`, `lower` | `string.upper`/`string.lower` (ASCII only) |
| `gsub`, `gmatch`, `match` | `string.*` byte-level fallback |

Limitations (without lua-utf8)
- `upper`/`lower` are ASCII-only
- `gsub`/`gmatch`/`match` operate on bytes, not characters

Closes #9412
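To make the `byte` and `sub` rows and the byte-level limitation concrete, here is a minimal sketch assuming Lua 5.3+ and valid UTF-8 input. The names `u_byte` and `u_sub` are hypothetical and the PR's actual implementations may handle more edge cases.

```lua
-- Sketch only; assumes Lua 5.3+ and that `s` is valid UTF-8.
-- `u_byte` and `u_sub` are hypothetical names for the compat functions.

-- Code point of the i-th character: character index -> byte offset -> code point.
local function u_byte(s, i)
  return utf8.codepoint(s, utf8.offset(s, i or 1))
end

-- Substring by character indices, translated to byte offsets for string.sub.
local function u_sub(s, i, j)
  local n = utf8.len(s)
  i, j = i or 1, j or -1
  if i < 0 then i = n + i + 1 end        -- negative indices count from the end
  if j < 0 then j = n + j + 1 end
  if i < 1 then i = 1 end
  if j > n then j = n end
  if i > j then return "" end
  local first = utf8.offset(s, i)        -- byte where character i starts
  local last = utf8.offset(s, j + 1) - 1 -- byte just before character j + 1
  return string.sub(s, first, last)
end

local s = "h\195\169llo"                 -- "h", U+00E9 (2 bytes), "llo"
print(u_byte(s, 2))                      -- 233, the code point U+00E9
print(#u_sub(s, 2, 3))                   -- 3 bytes: U+00E9 plus "l"

-- The byte-level fallback: "." matches one byte, so this cuts U+00E9 in half.
local broken = string.match(s, "^..")
print(utf8.len(broken))                  -- fails: the result is not valid UTF-8
```

The last two lines show the inconsistency raised in the review: a value produced by a byte-based method is no longer safe to pass to a character-based one.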
Test plan