-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve object/jstring parser #452
Conversation
I'm planning to do old-aeson/new-aeson change to |
if (unidata <= 0xDC00 || unidata >= 0xDFFF) // is not low surrogate | ||
return -1; | ||
surrogate = 0; | ||
} else if (unidata >= 0xD800 && unidata <= 0xDBFF ) { // is high surrogate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like bad indentation here. (I mean line number 133).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, i'll fix that later.
I'd like to drop |
Are you sure it will improve a lot? Even the |
Yes, we need benchmark to be sure, but i'm quite sure it will improve a lot, though attoparsec try very hard to minimize the overhead of appending buffer(which is need by backtrack even if you don't use it). |
I think the author of |
else if (decode(&state, &codepoint, *s++) != UTF8_ACCEPT) { | ||
if (state == UTF8_REJECT) { return -1; } | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably nitpicking here (looking at my own code ;)), but I think the above could accept an invalid UTF-8 input by skipping the incorrect characters and then start decoding non-sense. The (*s <= 127)
branch should either be eliminated or the second branch should live in its own while (s < srcend)
cycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. I checked the code in text as a reference, i don't know if the (*s <= 127)
branch helps performance, but it seems it's ok to drop it, i'll try to make a benchmark to decide if you want to keep this branch.
Yes, |
merge from master
I don't have enough time to finish the non backtracking experiment quickly, so i leave the code as it now. |
@winterland1989 let's continue here instead of #392. I was referring to your comment
Which benchmarks did you mean? |
I'm meant to change |
This PR improves aeson's decoding performance significantly with following change:
fromList
fromData.HashMap.Strict
becausefromList
do in-place update.json-stream
, which pass all test-suit.fromListN
fromData.Vector
, this change improve overall benchmark a little bit.before:
after