Suggestion: prove equivalence with clojure.data.json #166

Open
vemv opened this issue Aug 10, 2020 · 6 comments

Comments

@vemv

vemv commented Aug 10, 2020

Hi!

It'd be interesting to prove that serialization/deserialization is analogous to that of clojure.data.json, so that existing applications can switch implementations without fearing that values will suddenly come out in a slightly different format, etc.

jsonista does a cheshire (and cheshire-only) comparison here:

https://github.com/metosin/jsonista/blob/211306f04bb15d7232b536cf6c6d8ecfeae0512d/test/jsonista/core_test.clj#L56

It could be desirable to do something like that, but comparing data.json<->cheshire (transitively, one could then also make an educated data.json<->cheshire<->jsonista comparison).

I see that c.d.j is already exercised here: https://github.com/dakrone/cheshire/blob/4525b23da1c17decba363202402a8a195d21705f/benchmarks/cheshire/test/benchmark.clj, so it might be easy enough to piggyback on that test by adding some extra assertions.
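A minimal sketch of the kind of assertion this could add, assuming nothing about either library beyond "a function from input to output": the hypothetical helper `find-mismatches` (not part of cheshire or data.json) runs two functions over the same inputs and reports every input where the results differ under Clojure's `=`. It can compare readers (parsed values) or writers (emitted strings, which would also surface formatting differences such as escaping).

```clojure
(defn find-mismatches
  "Hypothetical helper (not part of cheshire or data.json). Applies f and g
  to every input and returns a vector of {:input x :f fx :g gx} maps for the
  inputs where the two results are not equal under Clojure's =."
  [f g inputs]
  (into []
        (keep (fn [x]
                (let [fx (f x)
                      gx (g x)]
                  ;; keep drops the nils, so only mismatches are collected
                  (when (not= fx gx)
                    {:input x :f fx :g gx}))))
        inputs))
```

With both libraries on the classpath, a reading check might pass `#(clojure.data.json/read-str % :key-fn keyword)` and `#(cheshire.core/parse-string % true)` along with a corpus of JSON strings (for example, the benchmark data linked above). An empty result only shows that the two agree on that corpus, not that they are equivalent in general.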

Thanks - V

@borkdude
Contributor

borkdude commented Mar 19, 2021

org.clojure/data.json 2.0.0 just came out with a significant speedup. This was the announcement on Clojurians Slack:

This release introduces significant speed improvements in both reading and writing json, while still being a pure clojure lib with no external dependencies.
Using the benchmark data from jsonista we see the following improvement:
Reading:
- 10b from 1.4 µs to 609 ns (cheshire 995 ns)
- 100b from 4.6 µs to 2.4 µs (cheshire 1.9 µs)
- 1k from 26.2 µs to 13.3 µs (cheshire 10.2 µs)
- 10k from 292.6 µs to 157.3 µs (cheshire 93.1 µs)
- 100k from 2.8 ms to 1.5 ms (cheshire 918.2 µs)

Writing:
- 10b from 2.3 µs to 590 ns (cheshire 1.0 µs)
- 100b from 7.3 µs to 2.7 µs (cheshire 2.5 µs)
- 1k from 41.3 µs to 14.3 µs (cheshire 9.4 µs)
- 10k from 508 µs to 161 µs (cheshire 105.3 µs)
- 100k from 4.4 ms to 1.5 ms (cheshire 1.17 ms)

Perhaps Cheshire can add more perf tweaks to always stay ahead of pure Clojure.

/cc @nilern

@nilern
Contributor

nilern commented Mar 19, 2021

Seems like there are some fixed costs that slow down small parses...

@borkdude
Contributor

From @slipset:

1. remove the dynamic vars and pass them explicitly as an options map
2. for reading, split string reading into two paths: the fast one (no escapes), where you just pass an array slice to (String.), and the slow one (with escapes, unicode and so on), which still uses a StringBuilder
3. for writing, don’t use format to construct unicode escapes
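A rough sketch of tricks 2 and 3 in plain Clojure. This is an illustration under stated assumptions, not the actual data.json code, and every name in it is hypothetical. `read-json-string` takes a char array and the index just after an opening quote. If it reaches the closing quote without seeing a backslash, it builds the result with a single `(String. chars offset count)` call, and only otherwise falls back to a StringBuilder. `append-unicode-escape` writes a `\uXXXX` escape through a hex lookup table instead of `format`. The sketch assumes well-formed input and does no bounds or error checking.

```clojure
(defn- read-escaped
  "Slow path (hypothetical): decode escapes into a StringBuilder. `start` is
  where the string body begins and `i` is the first backslash. Assumes valid JSON."
  [^chars cs start i]
  (let [sb (StringBuilder.)]
    ;; copy the escape-free prefix that the fast path already scanned
    (.append sb cs (int start) (int (- i start)))
    (loop [i i]
      (let [c (aget cs i)]
        (cond
          (= c \") [(.toString sb) (inc i)]
          (= c \\)
          (let [e (aget cs (inc i))]
            (if (= e \u)
              ;; \uXXXX: parse the four hex digits as one UTF-16 code unit
              (do (.append sb (char (Integer/parseInt (String. cs (int (+ i 2)) 4) 16)))
                  (recur (+ i 6)))
              (do (.append sb (char (case e
                                      \n \newline
                                      \t \tab
                                      \r \return
                                      \b \backspace
                                      \f \formfeed
                                      e)))          ; \" \\ and \/ stand for themselves
                  (recur (+ i 2)))))
          :else (do (.append sb c) (recur (inc i))))))))

(defn read-json-string
  "Hypothetical reader for a JSON string body. `start` is the index just after
  the opening quote. Returns [decoded-string index-after-closing-quote]."
  [^chars cs start]
  (loop [i start]
    (let [c (aget cs i)]
      (cond
        ;; fast path: no escapes seen, so one array-slice copy is enough
        (= c \") [(String. cs (int start) (int (- i start))) (inc i)]
        ;; first backslash: hand off to the StringBuilder path
        (= c \\) (read-escaped cs start i)
        :else (recur (inc i))))))

(def ^:private hex-digits (.toCharArray "0123456789abcdef"))

(defn append-unicode-escape
  "Hypothetical writer helper: appends a JSON \\uXXXX escape for the UTF-16
  code unit c to sb, using table lookups instead of format."
  [^StringBuilder sb c]
  (let [n (int c)
        hex (fn [shift] (aget ^chars hex-digits (bit-and (bit-shift-right n shift) 0xF)))]
    ;; same output as (.append sb (format "\\u%04x" n)) for 0 <= n <= 0xFFFF
    (doto sb
      (.append "\\u")
      (.append (char (hex 12)))
      (.append (char (hex 8)))
      (.append (char (hex 4)))
      (.append (char (hex 0))))))
```

Most JSON strings contain no escapes, which is presumably why the fast path pays off: it copies the characters once, without appending them one by one.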

The main trick, though, was to use the tools from http://clojure-goes-fast.com,
i.e., profile, observe the results, form a hypothesis, and create a fix.

@slipset

slipset commented Mar 19, 2021

There seems to be a startup cost in using Jackson that jsonista avoids. Maintaining some sort of Jackson context in your app and passing it to the various parse fns might speed things up quite a bit.

It seems, though, that most of the cost comes from assoc!, which is hard to avoid without creating custom data types.

@borkdude
Contributor

borkdude commented Mar 19, 2021

It is a known issue that the 3-arity versions of assoc and assoc! are faster than their varargs counterparts, so using multiple assoc! calls with one key/value pair each, instead of one assoc! call with multiple pairs, could speed things up (although I don't see the varargs form being used in cheshire).
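To illustrate the two arities mentioned here (the function names below are hypothetical): `clojure.core/assoc!` has a fixed `[coll key val]` arity and a variadic `[coll key val & kvs]` arity that loops over the rest args. A parser typically adds one key/value pair per step, roughly as `pairs->map` does, which already goes through the fixed arity.

```clojure
;; Variadic arity: (assoc! t k1 v1 k2 v2) goes through [coll key val & kvs],
;; which builds a rest-args seq and recurs over it.
(defn put-two-varargs [m k1 v1 k2 v2]
  (persistent! (assoc! (transient m) k1 v1 k2 v2)))

;; Fixed 3-arity: one assoc! call per pair, no rest-args seq.
;; Always continue with the value assoc! returns, never the original transient.
(defn put-two-fixed [m k1 v1 k2 v2]
  (-> (transient m)
      (assoc! k1 v1)
      (assoc! k2 v2)
      persistent!))

;; Roughly how a parser builds a JSON object: one pair at a time.
(defn pairs->map [pairs]
  (persistent!
   (reduce (fn [t [k v]] (assoc! t k v)) (transient {}) pairs)))
```

Both forms produce the same map; the difference is only the per-call overhead of the variadic arity.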

@nilern
Contributor

nilern commented Mar 19, 2021

I would guess the cost is in (.createParser ^JsonFactory (or factory/*json-factory* factory/json-factory) ^Reader rdr).

I don't think the varargs assoc! applies. And if you want a standard map, the assoc!ing has to be done eventually, so... go optimize transients in core?
