Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querystring: fix state machine in querystring.parse #11171

Closed
wants to merge 2 commits into from

Conversation

watilde
Copy link
Member

@watilde watilde commented Feb 4, 2017

From this comment: #10454 (comment). The posIdx should save the current position instead of the last position, and I've added the test cases.

Fixed cases:

  • a&&b => { 'a': '', 'b': '' }
  • a=a&&b=b => { 'a': 'a', 'b': 'b' }

Fixes #10454

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

querystring

- posIdx should save the current position instead of the last position.

Fixed cases:
- `a&&b` => `{ 'a': '', 'b': '' }`
- `a=a&&b=b` => `{ 'a': 'a', 'b': 'b' }`

Fixes nodejs#10454
@nodejs-github-bot nodejs-github-bot added the querystring Issues and PRs related to the built-in querystring module. label Feb 4, 2017
@TimothyGu
Copy link
Member

Some more bad cases, even after the patch (sorry...):

> querystring.parse('&a')
{ '': '', a: '' }
// { a: '' }
> querystring.parse('&=')
{ '': [ '', '' ] }
// { '': '' }
> querystring.parse('a&a&')
{ a: '' }
// { a: [ '', '' ] }
> querystring.parse('a&a&a&')
{ a: '' }
// { a: [ '', '', '' ] }
> querystring.parse('a&a&a&a&')
{ a: '' }
// { a: [ '', '', '', '' ] }

@watilde
Copy link
Member Author

watilde commented Feb 4, 2017

@TimothyGu That's a great catch! Thanks<3 I've added a commit to supporting the cases.
The problem was the logic didn't add the value to obj when curValue is '' in else if (curValue). It should be else. Plus, I noticed that it became unnecessary to evaluate i directly and also delete obj[key], because of the logic updated with the first commit in this PR :)

@mscdex
Copy link
Contributor

mscdex commented Feb 5, 2017

This is why I wanted to take a look at the original PR before landing as the delete should never have been necessary in the first place.

On a more general note, I'm not particularly keen on changing querystring.parse() (and slowing it down) just to satisfy URLSearchParams. URLSearchParams should just have its own implementation if it requires different behavior.

@joyeecheung
Copy link
Member

joyeecheung commented Feb 6, 2017

I think the fastest way we can fix this in URLSearchParams without affecting querystring is to simply copy the files over, and then make changes in that copy? We can completely reimplement the querystring part later, but right now we are focusing on spec compliance so this is the fastest way to catch up on the spec(not necessarily the best way for sure).

cc @jasnell

@joyeecheung
Copy link
Member

Also if we do that, when we start to redo the parsing/serialization bit of URLSearchParams we will be refactoring a spec-complaint fork of querystring instead of reimplementing it from scratch.

@domenic
Copy link
Contributor

domenic commented Feb 6, 2017

It does seem like creating a separate spec-complaint parser for URL queries might be a good path. The algorithm doesn't seem that hard to just implement from scratch: https://url.spec.whatwg.org/#concept-urlencoded-parser

(Although, the spec's pipeline of JS string --UTF-8 encode--> bytes --percent decode--> bytes --UTF-8 decode without BOM--> strings seems a bit inefficient, and hopefully you can the same results by just staying in string-land.)

@jasnell
Copy link
Member

jasnell commented Feb 6, 2017

yeah, I'll look at creating a separate parser this next week.

@TimothyGu
Copy link
Member

@jasnell, you might find my C++ implementation to be interesting, since the spec is fairly declarative on what needs to be done. Though performance-wise I think a JS implementation might be faster.

@mscdex
Copy link
Contributor

mscdex commented Feb 6, 2017

I've benchmarked the combined changes in #10967 and this PR against master prior to either of these commits, and on at least one of the benchmark combinations (multicharsep) there is ~5.5% performance regression with a high "confidence" rating.

I am working on and benchmarking an alternative solution now which should hopefully avoid the performance regression(s).

@stevenvachon
Copy link

stevenvachon commented Feb 7, 2017

Another useful test case:

> require("querystring").parse("a=&a=value&a=")
{ a: [ '', '' ] }
// { a: [ '', 'value', '' ] }

… and already solved by this patch.

@mscdex
Copy link
Contributor

mscdex commented Feb 7, 2017

Alright, I've tested my much simpler changes which should cover the needs of the original PR and this PR, including the tests they introduce (and @stevenvachon's test), and after adjusting the benchmarks to allow the function "priming" to occur more than once to ensure any re-optimizing ("definitely") stays outside of the timed loop, there does not seem to be any high "confidence" negative regressions anymore. Also just judging by this new code change alone, I'm confident that the code will not negatively affect the other code paths.

I will submit a PR as soon as the more thorough benchmarking that I currently have running finishes and agrees with my preliminary results.

@watilde
Copy link
Member Author

watilde commented Feb 7, 2017

@mscdex That sounds great! I have a keen interest in it, let's see what will happen :)
Probably, this PR will be closed depending on your patch.

@mscdex
Copy link
Contributor

mscdex commented Feb 8, 2017

My proposed solution has now been pushed: #11234

@jasnell
Copy link
Member

jasnell commented Feb 8, 2017

@mscdex ... to be clear, #11234 is an alternative to this one, yes?

@watilde
Copy link
Member Author

watilde commented Feb 8, 2017

I think so, and that patch was great! I will close this to jump to #11234. Thanks 😃

@watilde watilde closed this Feb 8, 2017
@watilde watilde deleted the feature/query branch February 8, 2017 18:16
@mscdex
Copy link
Contributor

mscdex commented Feb 8, 2017

@jasnell Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
querystring Issues and PRs related to the built-in querystring module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants