improve object/jstring parser #452

winterland1989 · 2016-07-15T11:31:12Z

This PR improves aeson's decoding performance significantly with following change:

use fromList from Data.HashMap.Strict because fromList do in-place update.
use FFI to parse and unescape JSON string, the code is borrowed from json-stream, which pass all test-suit.
use fromListN from Data.Vector, this change improve overall benchmark a little bit.

before:

    json-data/twitter1.json :: 60000 times
      1.485 seconds, 40407 parses/sec, 32.948 MB/sec
    json-data/twitter1.json :: 60000 times
      1.379 seconds, 43498 parses/sec, 35.468 MB/sec
    json-data/twitter1.json :: 60000 times
      1.402 seconds, 42807 parses/sec, 34.905 MB/sec
0.8 KB: 43498 msg\/sec (35.5 MB\/sec)
    json-data/twitter10.json :: 13000 times
      1.741 seconds, 7468 parses/sec, 46.969 MB/sec
    json-data/twitter10.json :: 13000 times
      1.916 seconds, 6785 parses/sec, 42.672 MB/sec
    json-data/twitter10.json :: 13000 times
      1.776 seconds, 7318 parses/sec, 46.025 MB/sec
6.4 KB: 7469 msg\/sec (47.0 MB\/sec)
    json-data/twitter20.json :: 7500 times
      2.441 seconds, 3072 parses/sec, 35.360 MB/sec
    json-data/twitter20.json :: 7500 times
      2.573 seconds, 2914 parses/sec, 33.544 MB/sec
    json-data/twitter20.json :: 7500 times
      2.178 seconds, 3443 parses/sec, 39.631 MB/sec
11.8 KB: 3444 msg\/sec (39.6 MB\/sec)
    json-data/twitter50.json :: 2500 times
      2.098 seconds, 1191 parses/sec, 36.311 MB/sec
    json-data/twitter50.json :: 2500 times
      2.001 seconds, 1249 parses/sec, 38.059 MB/sec
    json-data/twitter50.json :: 2500 times
      2.475 seconds, 1010 parses/sec, 30.778 MB/sec
31.2 KB: 1249 msg\/sec (38.1 MB\/sec)
    json-data/twitter100.json :: 1000 times
      1.917 seconds, 521 parses/sec, 31.341 MB/sec
    json-data/twitter100.json :: 1000 times
      1.768 seconds, 565 parses/sec, 33.979 MB/sec
    json-data/twitter100.json :: 1000 times
      1.867 seconds, 535 parses/sec, 32.182 MB/sec
61.5 KB: 566 msg\/sec (34.0 MB\/sec)
    json-data/jp10.json :: 4000 times
      1.309 seconds, 3055 parses/sec, 43.657 MB/sec
    json-data/jp10.json :: 4000 times
      1.337 seconds, 2992 parses/sec, 42.754 MB/sec
    json-data/jp10.json :: 4000 times
      1.350 seconds, 2962 parses/sec, 42.334 MB/sec
14.6 KB: 3055 msg\/sec (43.7 MB\/sec)
    json-data/jp50.json :: 1200 times
      1.330 seconds, 902 parses/sec, 38.843 MB/sec
    json-data/jp50.json :: 1200 times
      1.440 seconds, 833 parses/sec, 35.878 MB/sec
    json-data/jp50.json :: 1200 times
      1.334 seconds, 899 parses/sec, 38.722 MB/sec
44.1 KB: 902 msg\/sec (38.8 MB\/sec)
    json-data/jp100.json :: 700 times
      1.655 seconds, 423 parses/sec, 34.247 MB/sec
    json-data/jp100.json :: 700 times
      1.541 seconds, 454 parses/sec, 36.769 MB/sec
    json-data/jp100.json :: 700 times
      1.565 seconds, 447 parses/sec, 36.202 MB/sec
82.9 KB: 454 msg\/sec (36.8 MB\/sec)

after

    json-data/twitter1.json :: 60000 times
      1.114 seconds, 53859 parses/sec, 43.917 MB/sec
    json-data/twitter1.json :: 60000 times
      1.191 seconds, 50373 parses/sec, 41.074 MB/sec
    json-data/twitter1.json :: 60000 times
      1.130 seconds, 53110 parses/sec, 43.306 MB/sec
0.8 KB: 53860 msg\/sec (43.9 MB\/sec)
    json-data/twitter10.json :: 13000 times
      1.500 seconds, 8667 parses/sec, 54.506 MB/sec
    json-data/twitter10.json :: 13000 times
      1.515 seconds, 8579 parses/sec, 53.952 MB/sec
    json-data/twitter10.json :: 13000 times
      1.520 seconds, 8550 parses/sec, 53.768 MB/sec
6.4 KB: 8668 msg\/sec (54.5 MB\/sec)
    json-data/twitter20.json :: 7500 times
      1.922 seconds, 3902 parses/sec, 44.910 MB/sec
    json-data/twitter20.json :: 7500 times
      1.919 seconds, 3907 parses/sec, 44.971 MB/sec
    json-data/twitter20.json :: 7500 times
      1.845 seconds, 4063 parses/sec, 46.772 MB/sec
11.8 KB: 4064 msg\/sec (46.8 MB\/sec)
    json-data/twitter50.json :: 2500 times
      1.725 seconds, 1448 parses/sec, 44.149 MB/sec
    json-data/twitter50.json :: 2500 times
      1.728 seconds, 1447 parses/sec, 44.092 MB/sec
    json-data/twitter50.json :: 2500 times
      1.747 seconds, 1431 parses/sec, 43.610 MB/sec
31.2 KB: 1449 msg\/sec (44.1 MB\/sec)
    json-data/twitter100.json :: 1000 times
      1.466 seconds, 682 parses/sec, 40.978 MB/sec
    json-data/twitter100.json :: 1000 times
      1.534 seconds, 652 parses/sec, 39.171 MB/sec
    json-data/twitter100.json :: 1000 times
      1.443 seconds, 692 parses/sec, 41.621 MB/sec
61.5 KB: 693 msg\/sec (41.6 MB\/sec)
    json-data/jp10.json :: 4000 times
      0.866 seconds, 4619 parses/sec, 66.009 MB/sec
    json-data/jp10.json :: 4000 times
      0.856 seconds, 4672 parses/sec, 66.761 MB/sec
    json-data/jp10.json :: 4000 times
      0.867 seconds, 4615 parses/sec, 65.950 MB/sec
14.6 KB: 4672 msg\/sec (66.8 MB\/sec)
    json-data/jp50.json :: 1200 times
      0.973 seconds, 1233 parses/sec, 53.099 MB/sec
    json-data/jp50.json :: 1200 times
      0.948 seconds, 1265 parses/sec, 54.498 MB/sec
    json-data/jp50.json :: 1200 times
      0.938 seconds, 1278 parses/sec, 55.045 MB/sec
44.1 KB: 1279 msg\/sec (55.0 MB\/sec)
    json-data/jp100.json :: 700 times
      1.145 seconds, 611 parses/sec, 49.500 MB/sec
    json-data/jp100.json :: 700 times
      1.131 seconds, 618 parses/sec, 50.096 MB/sec
    json-data/jp100.json :: 700 times
      1.157 seconds, 604 parses/sec, 48.969 MB/sec
82.9 KB: 619 msg\/sec (50.1 MB\/sec)

phadej · 2016-07-15T11:35:51Z

I'm planning to do old-aeson/new-aeson change to Typed benchmark, as in #447. So we can see the difference better.

ondrap · 2016-07-15T11:49:54Z

cbits/unescape_string.c

+      if (unidata <= 0xDC00 || unidata >= 0xDFFF) // is not low surrogate
+        return -1;
+        surrogate = 0;
+    } else if (unidata >= 0xD800 && unidata <= 0xDBFF ) { // is high surrogate


Seems like bad indentation here. (I mean line number 133).

Yes, indeed, i'll fix that later.

winterland1989 · 2016-07-15T11:58:00Z

I'm planning to do old-aeson/new-aeson change to Typed benchmark, as in #447. So we can see the difference better.

@phadej Can you tell me what changes should i make?

winterland1989 · 2016-07-18T09:10:17Z

I'd like to dropattoparsec in favor of non-backtrace parsers to further optimize parsing speed, attoparse is really not needed here, but that will break existing code using Data.Aeson.Parser, we need a discuss here.

phadej · 2016-07-18T09:35:17Z

Are you sure it will improve a lot? Even the attoparsec is backtracking, we don't backtrack in aeson. Yet benchmark result would prove/disproof this

winterland1989 · 2016-07-18T10:43:15Z

Yes, we need benchmark to be sure, but i'm quite sure it will improve a lot, though attoparsec try very hard to minimize the overhead of appending buffer(which is need by backtrack even if you don't use it).
I will post the benchmark figures before further commits.

ondrap · 2016-07-18T10:50:50Z

I think the author of scanner mentioned somewhere that he has a json parser and that it is ultimately faster.

ondrap · 2016-07-18T10:53:53Z

cbits/unescape_string.c

+        else if (decode(&state, &codepoint, *s++) != UTF8_ACCEPT) {
+          if (state == UTF8_REJECT) { return -1; }
+          continue;
+        }


I am probably nitpicking here (looking at my own code ;)), but I think the above could accept an invalid UTF-8 input by skipping the incorrect characters and then start decoding non-sense. The (*s <= 127) branch should either be eliminated or the second branch should live in its own while (s < srcend) cycle.

Yes, indeed. I checked the code in text as a reference, i don't know if the (*s <= 127) branch helps performance, but it seems it's ok to drop it, i'll try to make a benchmark to decide if you want to keep this branch.

winterland1989 · 2016-07-18T10:57:04Z

Yes, scanner is exactly what i'm using, stay turned, benchmark coming!

merge from master

winterland1989 · 2016-07-20T06:58:12Z

I don't have enough time to finish the non backtracking experiment quickly, so i leave the code as it now.

bergmark · 2016-08-07T12:49:45Z

@winterland1989 let's continue here instead of #392. I was referring to your comment

I will post the benchmark figures before further commits.

Which benchmarks did you mean?

winterland1989 · 2016-08-07T13:25:04Z

I'm meant to change attoparsec to something not backtracking, by saying benchmark i mean the benchmarks in master's repo. but i don't have time right now so i leave this task to future. Anyway current pull request's benchmark result has been post already, you can pull and run yourself.

improve object/jstring parser

2f24e55

winterland1989 mentioned this pull request Jul 15, 2016

Faster parser using FFI? #450

Closed

ondrap reviewed Jul 15, 2016
View reviewed changes

add copyright; fix indentation

997c1de

ondrap reviewed Jul 18, 2016
View reviewed changes

winterland1989 added 2 commits July 20, 2016 14:00

Merge pull request #1 from bos/master

2733775

merge from master

delete invalid branch in c

6d2e0a8

bergmark mentioned this pull request Jul 27, 2016

v1.0.0.0 planning #392

Closed

18 tasks

bergmark added this to the 1.0.1.0 milestone Aug 16, 2016

bergmark merged commit 6d2e0a8 into haskell:master Sep 16, 2016

winterland1989 mentioned this pull request Sep 27, 2016

try binary-parsers #467

Open

Lysxia mentioned this pull request Mar 28, 2017

Duplicate keys #531

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

improve object/jstring parser #452

improve object/jstring parser #452

Uh oh!

winterland1989 commented Jul 15, 2016

Uh oh!

phadej commented Jul 15, 2016

Uh oh!

ondrap Jul 15, 2016 •

edited

Loading

Uh oh!

winterland1989 Jul 15, 2016

Uh oh!

winterland1989 commented Jul 15, 2016

Uh oh!

winterland1989 commented Jul 18, 2016

Uh oh!

phadej commented Jul 18, 2016

Uh oh!

winterland1989 commented Jul 18, 2016

Uh oh!

ondrap commented Jul 18, 2016

Uh oh!

ondrap Jul 18, 2016

Uh oh!

winterland1989 Jul 18, 2016

Uh oh!

winterland1989 commented Jul 18, 2016 •

edited

Loading

Uh oh!

winterland1989 commented Jul 20, 2016

Uh oh!

bergmark commented Aug 7, 2016

Uh oh!

winterland1989 commented Aug 7, 2016 •

edited

Loading

Uh oh!

Uh oh!

improve object/jstring parser #452

improve object/jstring parser #452

Uh oh!

Conversation

winterland1989 commented Jul 15, 2016

Uh oh!

phadej commented Jul 15, 2016

Uh oh!

ondrap Jul 15, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

winterland1989 Jul 15, 2016

Choose a reason for hiding this comment

Uh oh!

winterland1989 commented Jul 15, 2016

Uh oh!

winterland1989 commented Jul 18, 2016

Uh oh!

phadej commented Jul 18, 2016

Uh oh!

winterland1989 commented Jul 18, 2016

Uh oh!

ondrap commented Jul 18, 2016

Uh oh!

ondrap Jul 18, 2016

Choose a reason for hiding this comment

Uh oh!

winterland1989 Jul 18, 2016

Choose a reason for hiding this comment

Uh oh!

winterland1989 commented Jul 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

winterland1989 commented Jul 20, 2016

Uh oh!

bergmark commented Aug 7, 2016

Uh oh!

winterland1989 commented Aug 7, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ondrap Jul 15, 2016 •

edited

Loading

winterland1989 commented Jul 18, 2016 •

edited

Loading

winterland1989 commented Aug 7, 2016 •

edited

Loading