
Rewrite aeson with binary-parsers #466

Closed
winterland1989 opened this issue Sep 23, 2016 · 21 comments
@winterland1989
Contributor

winterland1989 commented Sep 23, 2016

EDIT: For people looking for benchmarks, I've done this experiment in #467. Please review.

Recently I released a parsing library based on binary. It's incredibly fast, and writing it made me re-read binary's source code, which led to several further optimizations later.

From the benchmark, performance is improved by 10~30% with the new parser, and I would expect more improvement on aeson's master, since binary-parsers provides a new scan combinator which I haven't used in the benchmark yet:

-- | Similar to 'scan', but working on 'ByteString' chunks. The predicate
-- consumes a 'ByteString' chunk and transforms a state argument,
-- and each transformed state is passed to successive invocations of
-- the predicate on each chunk of the input, until one chunk gets split into
-- @Right (ByteString, ByteString)@ or the input ends.
--
scanChunks :: s -> Consume s -> Get ByteString

My plan is to use this combinator with a C scan function to find a string's closing double quote.

But this whole thing will break compatibility badly, since aeson exposes attoparsec parsers through the Data.Aeson.Parser module, so I'm asking you to make a judgement. Should I do this in aeson, or should I just roll a new package? I definitely don't want to split off a package just for a new parser.
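To make the plan concrete, here is a pure-Haskell sketch of the chunkwise scan that a combinator like the one above enables (the proposal is to do this hot loop in C instead). The `Bool` state records whether the previous chunk ended on a backslash, so escape handling survives chunk boundaries. `findQuote` and its types are illustrative, not binary-parsers' actual API:

```haskell
import qualified Data.ByteString as B

-- Scan one chunk for an unescaped '"' (byte 34), carrying the
-- "last byte was a backslash" flag across chunk boundaries.
-- Left: quote not found in this chunk, here is the state to carry.
-- Right: chunk split at the unescaped quote.
findQuote :: Bool -> B.ByteString -> Either Bool (B.ByteString, B.ByteString)
findQuote escaped chunk = go escaped 0
  where
    go esc i
      | i >= B.length chunk = Left esc                  -- need another chunk
      | esc                 = go False (i + 1)          -- this byte is escaped
      | b == 92             = go True (i + 1)           -- backslash escapes next
      | b == 34             = Right (B.splitAt i chunk) -- unescaped double quote
      | otherwise           = go False (i + 1)
      where b = B.index chunk i
```

Given a chunk containing an unescaped quote, this splits at it; given a chunk without one, it returns the carried state, ready to be passed to the next invocation.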

@hvr
Member

hvr commented Sep 23, 2016

Would the new parser be as easily consumable in a streaming mode as attoparsec parsers are (cf. http://hackage.haskell.org/package/io-streams-1.3.5.0/docs/System-IO-Streams-Attoparsec.html)?

@winterland1989
Contributor Author

winterland1989 commented Sep 23, 2016

Yes, definitely. binary supports streaming perfectly with Decoder, just like attoparsec's IResult. I'm going to update my wire-streams package to support binary only now, because an obvious performance issue in binary has been addressed in binary-parsers.
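For readers unfamiliar with it, binary's incremental interface already has the consume-resume shape that attoparsec's IResult gives you; a small sketch using the actual Data.Binary.Get API (the `feedAll` driver itself is just illustrative):

```haskell
import Data.Binary.Get
import Data.Word (Word32)
import qualified Data.ByteString as B

-- Feed input chunk by chunk into binary's incremental Decoder,
-- resuming on Partial, exactly like driving attoparsec's IResult.
feedAll :: Decoder a -> [B.ByteString] -> Decoder a
feedAll d []       = pushEndOfInput d
feedAll d (c : cs) = case d of
  Partial _ -> feedAll (pushChunk d c) cs
  _         -> d

-- A big-endian Word32 delivered across two chunks.
demo :: Either String Word32
demo = case feedAll (runGetIncremental getWord32be) [B.pack [0, 0], B.pack [1, 2]] of
  Done _ _ w -> Right w
  Fail _ _ e -> Left e
  Partial _  -> Left "ran out of input"
```

Here `demo` evaluates to `Right 258` (0x00000102): the parser suspends after the first two bytes and resumes when the second chunk arrives.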

@Yuras
Contributor

Yuras commented Sep 23, 2016

@winterland1989 if attoparsec provides something like takeChunk :: Parser ByteString (which returns the unconsumed input without asking for more input), then will it be possible to build an attoparsec parser from a binary-parsers one? In that case aeson can export it for compatibility.

@winterland1989
Contributor Author

Good idea. It occurred to me that maybe takeLazyByteString can provide a way to make this isomorphism: we just ask attoparsec to return the remaining input as a lazy ByteString and feed it into a binary parser. Then all we have to do is provide a way to inject the final Decoder into a Parser. But:

  1. I'm not sure it's doable.
  2. I'm not sure if it affects the performance.
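The simplest form of the bridge is doable, under the assumption that demanding all remaining input is acceptable: attoparsec's real takeLazyByteString hands the rest of the input to a binary Get, here run with runGetOrFail. This addresses point 1 in its crudest form, but it forces the whole remaining input in, which is exactly where the performance worry in point 2 comes from:

```haskell
import qualified Data.Attoparsec.ByteString as A
import Data.Binary.Get (Get, getWord32be, runGetOrFail)
import Data.Word (Word32)

-- Run a binary Get inside an attoparsec Parser by taking the rest of
-- the input as a lazy ByteString. Note: this is NOT incremental; the
-- attoparsec parser demands all remaining input up front.
fromGet :: Get a -> A.Parser a
fromGet g = do
  rest <- A.takeLazyByteString
  case runGetOrFail g rest of
    Left (_, _, err) -> fail err
    Right (_, _, a)  -> pure a
```

For example, `A.parseOnly (fromGet getWord32be)` applied to the four bytes `0, 0, 1, 2` yields `Right 258`. A proper isomorphism would instead step the binary Decoder inside attoparsec's Partial continuations, which is the harder part.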

@mgsloan

mgsloan commented Sep 25, 2016

@winterland1989 Have you seen the store library? It is highly optimized. I would be very surprised to see any circumstances where binary-parsers is faster.

@winterland1989
Contributor Author

winterland1989 commented Sep 25, 2016

Hi @mgsloan, I checked store just after its release :) but I don't think store is up to the task here. For example, how can I provide these combinators with store?

IMHO, store is trying to make a better Storable story rather than deal with custom protocols, so maybe it's better to compare it to derive-storable or the upcoming compact regions?

@mgsloan

mgsloan commented Sep 25, 2016

You could certainly provide those combinators by constructing a monad atop the Poke and Peek monads; no need to re-invent the full stack of binary serialization. I have plans to do so, but haven't gotten around to it quite yet. There is lots more to build atop the store abstractions that is not yet built. But feel free to do your own thing, of course. I am just puzzled why you would propose that a venerable package like aeson adopt your new package.

@winterland1989
Contributor Author

winterland1989 commented Sep 25, 2016

Because I care about performance as much as you do ;)

Whether or not to adopt my package is the whole point of this discussion; if you think it's the wrong thing to do, I'd like to hear more about what makes you think so.

That being said, I don't think Peek is capable enough to provide tools to deal with parsing:

  1. Peek is built around IO, which is not suitable in a pure parsing context.
  2. Peek doesn't deal with partial input, which makes streaming parsing hard.
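A minimal sketch of the "partial input" shape point 2 refers to: a result type that can suspend and ask for the next chunk, which a one-shot buffer reader like Peek cannot express. All names here are illustrative, not any library's API:

```haskell
import qualified Data.ByteString as B

-- A parse result that can suspend across chunk boundaries.
data Result a
  = Done B.ByteString a               -- leftover input and a value
  | Fail String
  | More (B.ByteString -> Result a)   -- suspended: feed me another chunk

-- A toy "take n bytes" that survives chunk boundaries by suspending.
takeN :: Int -> B.ByteString -> Result B.ByteString
takeN n bs
  | B.length bs >= n = let (h, t) = B.splitAt n bs in Done t h
  | otherwise        = More (\more -> takeN n (bs <> more))

-- Drive a Result with a list of chunks.
feed :: Result a -> [B.ByteString] -> Maybe a
feed (Done _ a) _        = Just a
feed (Fail _)   _        = Nothing
feed (More k)   (c : cs) = feed (k c) cs
feed (More _)   []       = Nothing
```

attoparsec's IResult and binary's Decoder both have this shape; Peek, being a single pass over an already-materialized buffer, does not.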

@winterland1989
Contributor Author

For people looking for benchmarks, I've done this experiment in #467. Please review.

@tolysz
Contributor

tolysz commented Sep 27, 2016

I just hope all new dependencies will build with ghcjs.

@winterland1989
Contributor Author

@tolysz: we will pull in bytestring-lexing by pulling in binary-parsers; everything else is already there. Is bytestring-lexing buildable under ghcjs?

@tolysz
Contributor

tolysz commented Sep 27, 2016

bytestring-lexing builds.

@winterland1989
Contributor Author

Then this should build.

@mgsloan

mgsloan commented Sep 27, 2016

Cool, seems reasonable, @winterland1989! I think Peek being built around IO is fine; that's what it takes to do raw memory access. We can safely wrap it in an unsafePerformIO. It is better to have one unsafePerformIO than multiple, because IIRC there is some cost at runtime.

@winterland1989
Contributor Author

Well, I actually have some questions about the safety of the unsafePerformIO in store's encode/decode, if you don't mind: the fact that Peek is a MonadIO is not restrictive enough to prevent people from misusing it. E.g. if you build a combinator library on top of it, there should be a way to ensure the user isn't doing something too crazy, because suddenly all of print/readFile... become available.
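One standard containment idea, sketched here with plain IO standing in for store's Peek: wrap the monad in a newtype whose constructor is not exported and derive everything except MonadIO, so user code composed from the exposed primitives can't reach print/readFile via liftIO. `SafeM` and `peekExample` are hypothetical names, not store's API:

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- A restricted monad: no MonadIO instance is derived, so 'liftIO'
-- (and with it arbitrary IO like print/readFile) is unavailable to
-- users who only see SafeM and the primitives below.
newtype SafeM a = SafeM { runSafeM :: IO a }
  deriving (Functor, Applicative, Monad)

-- A primitive the library author chooses to expose (illustrative).
peekExample :: SafeM Int
peekExample = SafeM (pure 42)
```

Users can bind and sequence `SafeM` actions freely, but only through the primitives the library exports; hiding the `SafeM` constructor in the module's export list is what closes the escape hatch.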

@mgsloan

mgsloan commented Sep 28, 2016

Good point! I've opened mgsloan/store#76 about it. You can only do that if you import Data.Store.Core, which should be unusual for most users of store; you don't usually use the Peek / Poke primitives directly.

@bergmark
Collaborator

Thank you @winterland1989!

Adding new dependencies is something we have always been careful about, for several reasons such as stability of the libraries' APIs, bus factor, and the introduction of another possible point of failure... So please excuse my hesitation, as my gut reaction is "maybe later".

But I don't think I should make this decision by myself; I'd very much like to hear what others think about all of this.

Cheers

@winterland1989
Contributor Author

Yes, I fully understand this; that's why I am asking everyone here. Please take your time to evaluate this change.

@phadej
Collaborator

phadej commented Sep 29, 2016

FWIW, with Backpack we could have implementations against both attoparsec and binary-parsers. Judging from the pull request #467, the needed interface is quite small.

cc @ezyang

@ezyang

ezyang commented Sep 29, 2016

Yes, Backpack would help, just not immediately (since it's not released yet).

@phadej
Collaborator

phadej commented Jun 19, 2023

Data.Aeson.Decoding supersedes this. It's definitely over 10-30% faster than attoparsec, so I'd say the goal is achieved.

@phadej phadej closed this as completed Jun 19, 2023