Offer a "strict" mode for Unicode #1

Open
tilgovi opened this issue May 9, 2015 · 2 comments

@tilgovi
Owner

tilgovi commented May 9, 2015

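The library currently computes offsets from the length of the Node.textContent property, which is a JavaScript String. String length counts UTF-16 code units, so a character outside the Basic Multilingual Plane (most emoji, for example) counts as two.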

We could offer a strict mode in which the offsets are calculated by using punycode and String.prototype.normalize to count Unicode symbols.
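
A rough sketch of what a strict count could look like, assuming "strict" means NFC normalization followed by counting code points. The name `strictLength` is hypothetical and not part of the library, and the sketch uses `Array.from` rather than punycode only so that it runs without dependencies:

```javascript
// Hypothetical helper for illustration; not an existing API of this library.
function strictLength(text) {
  // NFC composes sequences such as "e" + U+0301 into one code point
  // where a precomposed form exists.
  const normalized = text.normalize('NFC');
  // Array.from walks the string by code point, so a surrogate pair counts once.
  return Array.from(normalized).length;
}

console.log('e\u0301'.length);          // 2 (base letter + combining accent)
console.log(strictLength('e\u0301'));   // 1 (composed to U+00E9)
console.log('\u{1F600}'.length);        // 2 (one surrogate pair)
console.log(strictLength('\u{1F600}')); // 1
```

Note that normalizing changes the text being measured, so offsets computed this way would only line up with content that is normalized the same way when the offsets are resolved.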

@nickstenning

One option that ES2015 makes available is the string iterator, which splits a string into code points. I haven't thought this through fully, but I think it should be possible to just replace the two references to

node.nodeValue.length

with something like

[...node.nodeValue].length

(That syntax produces an Array of strings, each representing a single code point.)
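
To make the difference concrete, here is a small sketch. The plain object stands in for a DOM Text node so the snippet runs outside a browser, and `codePointLength` is a made-up name for illustration only:

```javascript
// Stand-in for a DOM Text node; only the nodeValue property matters here.
const node = { nodeValue: 'a\u{1F600}b' }; // "a", U+1F600 (an emoji), "b"

// Hypothetical helper name, used only for this illustration.
function codePointLength(textNode) {
  // Spreading a string uses its iterator, which yields one entry per code point.
  return [...textNode.nodeValue].length;
}

console.log(node.nodeValue.length); // 4: the emoji is two UTF-16 code units
console.log(codePointLength(node)); // 3: the emoji is one code point
```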

My reading of https://kangax.github.io/compat-table/es6/ is that this would currently be supported natively by:

  • Chrome 46+
  • Firefox 36+
  • Safari 9+
  • Edge 12+

(No IE support at all for string symbol iterators...)

It also looks like Babel may support mapping this to ES5, although I'm also not sure about that...

@nickstenning

nickstenning commented May 19, 2016

This may also create a lot of garbage for the runtime to collect, in the form of these temporary arrays, but... ¯\_(ツ)_/¯
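
If the temporary arrays turn out to matter, one possible alternative is to count code points in a loop without building an array. This is only a sketch, and `countCodePoints` is a hypothetical name. It relies on nothing newer than `charCodeAt`, so it does not depend on the string iterator being available:

```javascript
// Hypothetical helper: counts code points without allocating an array.
function countCodePoints(text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms one code point,
    // so skip the low half of the pair.
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < text.length) {
      var next = text.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++;
      }
    }
    // Unpaired surrogates count as one each, matching the string iterator.
    count++;
  }
  return count;
}

console.log(countCodePoints('a\uD83D\uDE00b')); // 3
console.log(countCodePoints('\uD83D'));         // 1 (lone high surrogate)
```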
