Offer a "strict" mode for Unicode #1
Comments
One option that's made available to us by ES2015 is use of the string iterator to split a string on codepoints. I haven't thought this through fully, but I think it should be possible to just replace the two references to node.nodeValue.length with something like [...node.nodeValue].length (that syntax will result in an Array of strings each representing a single codepoint). My reading of https://kangax.github.io/compat-table/es6/ is that this would currently be supported natively by:
(No IE support at all for string symbol iterators...) It also looks like Babel may support mapping this to ES5, although I'm also not sure about that...
This may also imply the creation of lots of garbage for collection by the runtime in the form of these temporary arrays, but... ¯_(ツ)_/¯
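To make the difference concrete, here is a minimal sketch comparing the two ways of measuring a string. The helper names `codeUnitLength` and `codePointLength` are hypothetical and not part of the library; the loop-based variant is one possible way to avoid the temporary array that the spread syntax creates.

```javascript
// Hypothetical helpers for illustration only; not the library's API.

// What String.prototype.length reports: UTF-16 code units.
function codeUnitLength(text) {
  return text.length;
}

// Counts code points by walking the ES2015 string iterator.
// Equivalent to [...text].length, but without allocating an array.
function codePointLength(text) {
  let count = 0;
  for (const _ of text) {
    count += 1;
  }
  return count;
}

// 'a', U+1F600 (outside the BMP, stored as a surrogate pair), 'b'
const sample = 'a\u{1F600}b';

console.log(codeUnitLength(sample));  // 4
console.log(codePointLength(sample)); // 3
console.log([...sample].length);      // 3
```

Whether the saved allocation matters in practice would need measuring against real documents.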
The library currently counts characters reported by the length of the Node.textContent property, which is a JavaScript String.
We could offer a strict mode wherein the offsets are calculated by using punycode and String.prototype.normalize to count Unicode symbols.
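
As a rough illustration of what such a strict count might look like, the sketch below normalizes the text and then counts code points. It is an assumption-laden sketch, not the library's implementation: `strictLength` is a hypothetical name, and the built-in string iterator stands in for punycode's `ucs2.decode` so the example has no dependencies.

```javascript
// Hypothetical strict-mode length; not part of the library.
// Normalizes first, then counts code points rather than UTF-16 code units.
function strictLength(text, form = 'NFC') {
  let count = 0;
  // The string iterator yields one code point per step.
  for (const _ of text.normalize(form)) {
    count += 1;
  }
  return count;
}

// 'e' followed by U+0301 COMBINING ACUTE ACCENT
const decomposed = 'e\u0301';

console.log(decomposed.length);                // 2
console.log(strictLength(decomposed));         // 1 (NFC composes to U+00E9)
console.log(strictLength(decomposed, 'NFD'));  // 2
```

One caveat with this approach: offsets computed on normalized text no longer line up one-to-one with the un-normalized DOM text, so both ends of an exchange would need to apply the same normalization form.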