-
-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update String#trim #316
base: master
Are you sure you want to change the base?
Update String#trim #316
Conversation
Just like on paulmillr/es6-shim#354, please add a test that would fail without this change. Can you also link to the ES5 spec and show where these characters should be included? |
I addressed the references to the spec in paulmillr/es6-shim#354 (comment), but a key point from that is that I probably got the improper-test trim wrong, as this shim definitely does: The set of non-trimmable characters includes |
Thanks - we'll still need an automated test to prevent future regressions. |
I'll get a test in later today, and then feel the frustration that comes with not being able to rebase from GitHub's Web-based interface. |
👍 |
By the way, analogously to a comment by mathiasbynens in paulmillr/es6-shim#354 (comment), should |
It's totally fine to have es5-shim use one version, and es6-shim use another. |
@lewisje Are you planning on completing this PR? |
I am, but I got distracted 😞 |
be25e20
to
240adc6
Compare
I forgot, with this re-commit, to explicitly trim I remember seeing, for example, that if you trim using a regex from the beginning, then use a loop to find the index to do I think I'll be re-committing my PRs for both this and es6-shim now. |
It's totally fine if |
8178008
to
c0faeaa
Compare
Maybe you can't patch |
c0faeaa
to
f11c7c9
Compare
You can patch it on any engine. If the spec requires it, the tests need to. |
I made that comment because, when I had |
To me, that suggests that the runtime check was broken, and the test was correct :-) |
I don't see where the runtime check was broken: 40e46d1 |
Try being more explicit rather than relying on truthiness or falsiness of an empty string - if indeed it still seems correct, then let's investigate that! |
I noticed that weird stuff happens in the Node REPL if I re-define
It looks like
My guess is that the REPL relies on With that said, I'll make the trim checks more explicit now. |
f11c7c9
to
ff50a01
Compare
Okay, I just tried patching
|
Test for incorrect trimming of NEL, trim ZWSP (which *is* trimmable in the oldest Unicode spec ES5 targets), reduce the number of string concatenations, and more quickly determine `hasTrimWhitespaceBug` to be false in the usual pre-ES5 case. I would change `if (typeof this === 'undefined' || this === null)` to `if (this == null)` but I don't know whether that's allowed by this library's style guide.
fc1b9dc
to
9c87839
Compare
it('does not trim the zero-width space', function () { | ||
expect('\u200b'.trim()).toBe('\u200b'); | ||
expect('\u200b'.trim().length).toBe(1); | ||
it('does not trim the next-line character', function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link me to which part of the spec talks about this character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This character is in the Unicode category Cc (Other, Control). See http://www.fileformat.info/info/unicode/char/0085/index.htm. The spec doesn't say anything about this category or this character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECMAScript implementations must recognize all of the white space characters defined in Unicode 3.0. Later editions of the Unicode Standard may define other white space characters. ECMAScript implementations may recognize white space characters from later editions of the Unicode Standard.
I once read through several editions of the Unicode Standard and found that U+0085
(NEL) has been in Cc ever since 1.1.5 at least, and in particular in 3.0 and all later versions, so although its purpose was a single-character replacement for \r\n
, it's non-trimmable; also, as for actively trimming U+200B
(ZWSP), that was in Zs before Unicode 4.0.1 (upon and after which it was in Cf), so ES5 must recognize it as trimmable (and ES6 must not): paulmillr/es6-shim#354 (comment)
(When I made that comment, I hadn't realized that even if an ES5 or ES6 implementation targets a later version of Unicode than the earliest permissible one, it must still recognize as whitespace all Zs characters from the earliest permissible version of Unicode, respectively 3.0 and 5.1.0.)
I copied the string from the Travis CI stack trace (latest io.js, but it was the same for a couple other versions of Node I looked at), showing the incorrectly trimmed test string, and ran it through rishida's Unicode converter:
This makes it clear that the problem was the failure to trim |
@@ -4,16 +4,18 @@ describe('String', function () { | |||
'use strict'; | |||
|
|||
describe('trim', function () { | |||
var test = '\x09\x0A\x0B\x0C\x0D\x20\xA0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028\u2029\uFEFFHello, World!\x09\x0A\x0B\x0C\x0D\x20\xA0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028\u2029\uFEFF'; | |||
var wsp = '\t\n\v\f\r \xA0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005' + | |||
'\u2006\u2007\u2008\u2009\u200A\u200B\u2028\u2029\u202F\u205F\u3000\uFEFF'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's leave this on one line - arbitrary line length limits are silly. it will show the diff better also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this sentiment; the reason I had broken it up was that I thought the project had line-length limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it used to, it doesn't anymore :-D
Please rebase this on top of master - merge commits make bisect sad - note: this means |
@lewisje would you mind checking "allow edits" on the RHS of the PR? |
Test for incorrect trimming of two whitespace characters instead of just one, reduce the number of string concatenations, and more quickly determine
hasTrimWhitespaceBug
to be false in the usual pre-ES5 case.I would change
if (typeof this === 'undefined' || this === null)
toif (this == null)
but I don't know whether that's allowed by this library's style guide.