Offer a "strict" mode for Unicode #1

tilgovi · 2015-05-09T22:25:50Z

The library currently counts characters using the length of the Node.textContent property, which is a JavaScript String, so offsets are measured in UTF-16 code units.

We could offer a strict mode in which offsets are calculated using punycode and String.prototype.normalize to count Unicode symbols.
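Here is a minimal sketch of what a strict count could look like, assuming NFC normalization. It uses the ES2015 string iterator (via `Array.from`) in place of punycode's `ucs2.decode`, because both turn a string into code points. `strictLength` is a hypothetical name, not part of the library.

```javascript
// Hypothetical helper (not part of the library): length in Unicode code
// points after NFC normalization, instead of UTF-16 code units.
function strictLength(text) {
  // NFC composes "e" + U+0301 (combining acute accent) into U+00E9 "é".
  const normalized = text.normalize('NFC');
  // Array.from uses the string iterator, which yields one string per code point.
  return Array.from(normalized).length;
}

const decomposed = 'e\u0301'; // two code points, two code units
const emoji = '\u{1F600}';    // one code point, two code units (a surrogate pair)

console.log(decomposed.length, strictLength(decomposed)); // 2 1
console.log(emoji.length, strictLength(emoji));           // 2 1
```

Because the string is normalized before counting, offsets computed this way describe the normalized text rather than the exact characters in the DOM. Mapping them back would need extra care.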

The text was updated successfully, but these errors were encountered:

nickstenning · 2016-05-19T14:26:40Z

One option made available to us by ES2015 is the string iterator, which splits a string on code points. I haven't thought this through fully, but I think it should be possible to just replace the two references to

node.nodeValue.length

with something like

[...node.nodeValue].length

(that syntax produces an Array of strings, each representing a single code point).
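To illustrate the difference the proposed replacement would make: a character outside the Basic Multilingual Plane occupies two UTF-16 code units but is a single code point. The sketch uses a plain string standing in for `node.nodeValue`. `toCodePointOffset` is a hypothetical helper showing how an existing code-unit offset could be re-expressed in code points.

```javascript
// A plain string stands in for node.nodeValue here (no DOM needed).
const value = 'a\u{1F600}b'; // "a😀b"

// Current behaviour: UTF-16 code units.
console.log(value.length);      // 4
// Proposed: spreading the string yields one element per code point.
console.log([...value]);        // [ 'a', '😀', 'b' ]
console.log([...value].length); // 3

// Hypothetical helper: convert a code-unit offset into a code-point offset
// by counting the code points in the prefix before it.
function toCodePointOffset(text, codeUnitOffset) {
  return [...text.slice(0, codeUnitOffset)].length;
}

console.log(toCodePointOffset(value, 3)); // 2 ("b" starts at unit 3, point 2)
```

If a code-unit offset falls between the two halves of a surrogate pair, the prefix ends in a lone high surrogate, which the iterator yields as its own element.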

My reading of https://kangax.github.io/compat-table/es6/ is that this would currently be supported natively by:

Chrome 46+
Firefox 36+
Safari 9+
Edge 12+

(No IE support at all for string symbol iterators...)

It also looks like Babel may be able to compile this down to ES5, although I'm not sure about that either...

nickstenning · 2016-05-19T14:32:54Z

This may also imply the creation of lots of garbage for collection by the runtime in the form of these temporary arrays, but... ¯\_(ツ)_/¯
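If the temporary arrays turn out to matter, one alternative (a sketch, not something proposed in the thread) is to count code points directly with `charCodeAt`, treating a high surrogate followed by a low surrogate as a single code point. This allocates no arrays and also works in engines without the string iterator, such as IE. `countCodePoints` is a hypothetical name.

```javascript
// Hypothetical helper: count code points without building an array.
// A well-formed surrogate pair (high 0xD800..0xDBFF followed by low
// 0xDC00..0xDFFF) counts as one code point; any other code unit,
// including a lone surrogate, counts as one.
function countCodePoints(text) {
  var count = 0;
  for (var i = 0; i < text.length; i++) {
    var unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      var next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

console.log(countCodePoints('a\uD83D\uDE00b')); // 3
console.log(countCodePoints('\uD83D'));         // 1 (lone surrogate)
```

The loop uses `var` and ES5 syntax only, so it needs no transpilation. Lone surrogates are counted the same way the string iterator yields them, so it should agree with `[...text].length`.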

tilgovi mentioned this issue Aug 22, 2016

TextPositionSelector, thoughts about Unicode code *point* vs. UTF16 code *unit* w3c/web-annotation#350

Closed

tilgovi mentioned this issue May 10, 2017

Positions are captured / anchored in terms of code units rather than code points tilgovi/dom-anchor-text-position#6

Open

tilgovi mentioned this issue Oct 8, 2019

Handle start/end positions equal to the root's text content length tilgovi/dom-anchor-text-position#7

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Offer a "strict" mode for Unicode #1

Offer a "strict" mode for Unicode #1

tilgovi commented May 9, 2015

nickstenning commented May 19, 2016

nickstenning commented May 19, 2016 •

edited

Loading

Offer a "strict" mode for Unicode #1

Offer a "strict" mode for Unicode #1

Comments

tilgovi commented May 9, 2015

nickstenning commented May 19, 2016

nickstenning commented May 19, 2016 • edited Loading

nickstenning commented May 19, 2016 •

edited

Loading