improve object/jstring parser#452
Conversation
|
I'm planning to do old-aeson/new-aeson change to |
| if (unidata <= 0xDC00 || unidata >= 0xDFFF) // is not low surrogate | ||
| return -1; | ||
| surrogate = 0; | ||
| } else if (unidata >= 0xD800 && unidata <= 0xDBFF ) { // is high surrogate |
There was a problem hiding this comment.
Seems like bad indentation here. (I mean line number 133).
There was a problem hiding this comment.
Yes, indeed, i'll fix that later.
|
I'd like to drop |
|
Are you sure it will improve a lot? Even the |
|
Yes, we need benchmark to be sure, but i'm quite sure it will improve a lot, though attoparsec try very hard to minimize the overhead of appending buffer(which is need by backtrack even if you don't use it). |
|
I think the author of |
| else if (decode(&state, &codepoint, *s++) != UTF8_ACCEPT) { | ||
| if (state == UTF8_REJECT) { return -1; } | ||
| continue; | ||
| } |
There was a problem hiding this comment.
I am probably nitpicking here (looking at my own code ;)), but I think the above could accept an invalid UTF-8 input by skipping the incorrect characters and then start decoding non-sense. The (*s <= 127) branch should either be eliminated or the second branch should live in its own while (s < srcend) cycle.
There was a problem hiding this comment.
Yes, indeed. I checked the code in text as a reference, i don't know if the (*s <= 127) branch helps performance, but it seems it's ok to drop it, i'll try to make a benchmark to decide if you want to keep this branch.
|
Yes, |
merge from master
|
I don't have enough time to finish the non backtracking experiment quickly, so i leave the code as it now. |
|
@winterland1989 let's continue here instead of #392. I was referring to your comment
Which benchmarks did you mean? |
|
I'm meant to change |
This PR improves aeson's decoding performance significantly with following change:
fromListfromData.HashMap.StrictbecausefromListdo in-place update.json-stream, which pass all test-suit.fromListNfromData.Vector, this change improve overall benchmark a little bit.before:
after