Reimplement code/value deserialization based on a custom parser #5943
base: trunk
Conversation
Uses a fixed strict ByteString for the input, a PrimVar for the current index, and a primitive monad (ST or IO) to perform operations and yield results. This lets the parser simply return its answers, which seems to optimize better overall than plumbing the parse state through in a State-like manner; with the latter, getting the compiler to reliably unbox all of the returned results/state is quite fragile. The downside is that the entire byte buffer must be in memory (since it's a strict bytestring), but that probably isn't a big deal for serialized Unison values.
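A minimal sketch of that shape (illustrative names, not necessarily the PR's actual API):

    import Control.Monad.Primitive (PrimMonad, PrimState)
    import Data.ByteString (ByteString)
    import qualified Data.ByteString.Unsafe as B
    import Data.Primitive.PrimVar (PrimVar, readPrimVar, writePrimVar)
    import Data.Word (Word8)

    -- A parser is just a function over the fixed input and a mutable index,
    -- running in a primitive monad (IO or ST s), returning its result directly.
    type Ix m = PrimVar (PrimState m) Int
    type Get m a = ByteString -> Ix m -> m a

    -- Read the byte at the current index and advance past it.
    getWord8 :: PrimMonad m => Get m Word8
    getWord8 bs ix = do
      i <- readPrimVar ix
      writePrimVar ix (i + 1)
      pure (B.unsafeIndex bs i) -- a real implementation would bounds-check first

Since the parse state lives in the PrimVar, every getter returns only its answer; there is no state tuple for GHC to box and unbox.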
Includes some assorted likely optimizations.
The fixed-width word getters were reading fixed positions in the bytestring rather than positions based on the current index. Also tweaked the getVarInt implementation a bit.
Wow, nice!! I am fine with merging this once CI passes.
where
  grab as (n :: Int)
    | n <= 0 = evaluated $ reverse as
    | otherwise = ga >>= \a -> grab (a : as) (n - 1)
Do you know if it makes any difference if you just define this directly? Like, just directly create a Get m [a] via a ByteString -> Ix m -> m [a].
Also it seems like you should build up the list strictly. You don't want to build up n levels of thunks, only to have them get forced... or is strictness analysis taking care of this?
Another option is to just allocate a mutable array once you know the length, then convert to a list at the end.
The other list / map decoders could be done similarly if any of the above is a performance boost.
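For concreteness, the "define it directly" version might look like this (a sketch against the Get/Ix aliases from the earlier sketch; getListN is a hypothetical name):

    {-# LANGUAGE BangPatterns #-}

    -- A list getter defined directly as ByteString -> Ix m -> m [a],
    -- with a strict count and accumulator so no thunks pile up.
    getListN :: PrimMonad m => Get m a -> Int -> Get m [a]
    getListN ga n0 bs ix = go [] n0
      where
        go !acc !n
          | n <= 0 = pure (reverse acc)
          | otherwise = do
              !a <- ga bs ix
              go (a : acc) (n - 1)

(As the reply below notes, the extra forcing may buy nothing when the results are already evaluated; the direct definition itself is the interesting part.)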
- Yeah, having custom loops for things is a little better. I think it must not completely unwrap the Get abstraction in the recursive case, so it's sort of a manual wrapper transformation.
- Evaluating the list argument is just slower, because the argument is a constructor each time. It's already evaluated, and there's just extra code that effectively does nothing.
- Same as my other tests with arrays: which of lists and arrays is slower depends on the size, and you need thousands of elements for arrays to win significantly. One idea I had that might be worthwhile is having multiple possible representations and choosing which to use based on the size. So, if you're deserializing a list of <500 elements you use lists, and otherwise you use arrays or something. But whether that has any practical benefit depends on what sorts of things are actually commonly serialized (like, are we serializing huge sequences?). And for the lists used to represent data types, lists basically always win.
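The allocate-then-freeze variant suggested above might look roughly like this (again a sketch on the same assumed Get/Ix aliases):

    import Data.Primitive.Array (Array, newArray, writeArray, unsafeFreezeArray)

    -- Allocate a mutable array of known length, fill it left to right,
    -- then freeze it; no intermediate reversed list.
    getArrayN :: PrimMonad m => Get m a -> Int -> Get m (Array a)
    getArrayN ga n bs ix = do
      marr <- newArray n (error "getArrayN: uninitialized slot")
      let go i
            | i >= n = unsafeFreezeArray marr
            | otherwise = do
                a <- ga bs ix
                writeArray marr i a
                go (i + 1)
      go 0

A wrapper that picks between this and the list version at a size threshold (e.g. the ~500-element mark mentioned above) would then be a small addition.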
Some errors compiling the tests: […]
I usually catch these with […]
Allows some factoring out and improved definitions.
This implements a binary parser type and rebuilds the deserialization of values based on it (the changes necessary to the actual deserializers are fairly minimal).
The parser monad is based on ideas @pchiusano showed me. A parser is just a function that accepts a (strict) bytestring and a mutable 'current index' ref, and produces a result in either IO or ST. There are a few advantages of this:
- getVarInt as a primitive. We use this extensively in our formats, so it's good for it to be as fast as possible. It doesn't seem to be possible to write new low-level operations in e.g. binary, and because getVarInt is recursive, writing it based on getWord8 doesn't seem to optimize well because of the previous point. (A sketch follows this description.)
- getArray operations that create, fill and freeze a mutable array, rather than building an intermediate list.
There may be some minor pessimizations relative to binary. For example, reading multiple fixed-size things in a row with binary has some rewrite rules to coalesce the relevant bounds checks. I haven't tried to do anything like that, because it doesn't seem like we actually do much stuff where that would matter (at least not repeatedly).
On some simple test cases, the new parser is ~2x faster than the old one. I think most of that is due to the new implementation approach, although I did add some pragmas to help optimize better than before (a few INLINEs, and more INLINABLEs, the latter of which are intended to allow specialization to e.g. IO/ST rather than actual inlining). Some microbenchmarks had results better than that, but it's harder to realize in more complex situations, and I haven't spent an enormous amount of time fiddling with the particular serializers.
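As a rough illustration of the getVarInt point, here is what a LEB128-style varint read looks like when written directly against the buffer and index (a sketch only, reusing the Get/Ix aliases and imports from the first sketch above; the PR's actual encoding and bounds handling may differ):

    {-# LANGUAGE BangPatterns #-}

    import Data.Bits ((.&.), (.|.), shiftL, testBit)
    import qualified Data.ByteString.Unsafe as B
    import Data.Word (Word64)

    -- One tight loop over the raw buffer: 7 payload bits per byte, with the
    -- high bit set on every byte except the last. No getWord8 layering.
    getVarInt :: PrimMonad m => Get m Word64
    getVarInt bs ix = do
      i0 <- readPrimVar ix
      let go !i !shift !acc = do
            let b = B.unsafeIndex bs i -- a real implementation bounds-checks
                acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
            if testBit b 7
              then go (i + 1) (shift + 7) acc'
              else acc' <$ writePrimVar ix (i + 1)
      go i0 (0 :: Int) (0 :: Word64)

Because the whole loop is ordinary monadic code over a mutable index, it can compile to a straight loop, which is the point above about a getWord8-layered version not optimizing well.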