@dolio dolio commented Oct 17, 2025

This implements a binary parser type and rebuilds value deserialization on top of it (the changes needed in the actual deserializers are fairly minimal).

The parser monad is based on ideas @pchiusano showed me. A parser is just a function that takes a (strict) bytestring and a mutable 'current index' ref, and produces a result in either IO or ST. This has a few advantages.

  • The optimizer seems to have an easier time with this shape. With a normal 'threaded state' approach, GHC appears to have trouble unboxing things like the result state and value for recursive parsers, presumably because that requires unpacking multiple layers of tuples via a worker-wrapper transform, which GHC just doesn't reliably manage. With the new parser type, only the result value needs to be unpacked, and it won't be a tuple in the cases where unpacking is fruitful.
  • getVarInt can be implemented directly as a primitive. We use varints extensively in our formats, so it's good for them to be as fast as possible. It doesn't seem possible to add new low-level operations to e.g. binary, and because getVarInt is recursive, building it on getWord8 doesn't optimize well, for the reason in the previous point.
  • It's possible to provide facilities to write getArray operations that create, fill and freeze a mutable array, rather than building an intermediate list.
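The parser shape and the getVarInt loop described above can be sketched roughly as follows. All names here (Get, runGet, getWord8, getVarInt) are illustrative stand-ins rather than the PR's actual API, and the real version is polymorphic over IO/ST and uses a PrimVar for the index rather than an IORef:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Minimal sketch of the parser shape: a function over the input
-- bytestring and a mutable 'current index' reference, yielding the
-- result directly in the underlying monad (here specialized to IO).
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word64, Word8)

type Get a = B.ByteString -> IORef Int -> IO a

runGet :: Get a -> B.ByteString -> IO a
runGet g bs = newIORef 0 >>= g bs

getWord8 :: Get Word8
getWord8 bs ix = do
  i <- readIORef ix
  if i < B.length bs
    then B.index bs i <$ writeIORef ix (i + 1)
    else fail "getWord8: out of bounds"

-- LEB128-style varint decoding as one tight loop, rather than
-- recursing through a library's getWord8 combinator.
getVarInt :: Get Word64
getVarInt bs ix = go 0 0
  where
    go !acc !shift = do
      b <- getWord8 bs ix
      let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
      if testBit b 7 then go acc' (shift + 7) else pure acc'
```

For example, `runGet getVarInt (B.pack [0xac, 0x02])` decodes to 300. Because the parser just returns its answer (rather than threading a state tuple), only that answer needs to be unboxed.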

There may be some minor pessimizations relative to binary. For example, binary has rewrite rules that coalesce the bounds checks when reading several fixed-size items in a row. I haven't attempted anything similar, because we don't seem to do much reading of that sort (at least not repeatedly) where it would matter.

On some simple test cases, the new parser is ~2x faster than the old one. I think most of that is due to the new implementation approach, although I did add some pragmas to help optimize better than before (a few INLINEs, and more INLINABLEs, the latter of which are intended to allow specialization to e.g. IO/ST rather than actual inlining). Some microbenchmarks had results better than that, but it's harder to realize in more complex situations, and I haven't spent an enormous amount of time fiddling with the particular serializers.

dolio added 3 commits October 17, 2025 17:12
Uses a fixed strict ByteString for the input and a PrimVar for
the current index, and primitive monads (ST or IO) to do
operations and yield results. This allows the parser to just
return the answers, which seems to allow for better overall
optimization than plumbing the parse state through in a
State-like manner. In the latter, it is very sensitive to get the
compiler to actually unbox all the results/state returned
reliably.

The downside is that the entire byte buffer must be in memory
(since it's a strict bytestring). But that probably isn't a big
deal for serialized unison values.

Includes some assorted likely-optimizations.

The fixed word getters were reading fixed positions in the
bytestring rather than based on the index.

Also tweaked the getVarInt implementation a bit.
@dolio dolio requested review from aryairani and pchiusano October 17, 2025 22:35
@pchiusano pchiusano (Member) left a comment

Wow, nice!! I am fine with merging this once CI passes.

Comment on lines 179 to 182
where
grab as (n :: Int)
| n <= 0 = evaluated $ reverse as
| otherwise = ga >>= \a -> grab (a : as) (n - 1)
Member:

Do you know if it makes any difference if you just define this directly? Like just directly create a Get m [a] via a ByteString -> Ix m -> m [a].

Also it seems like you should build up the list strictly. You don't want to build up n levels of thunks, only to have them get forced... or is strictness analysis taking care of this?

Another option is just allocate a mutable array once you know the length, then convert to a list at the end.

The other list / map decoders could be done similarly if any of the above is a performance boost.
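A rough sketch of the first two suggestions, a list decoder written directly against the raw ByteString -> index -> m a shape with each element forced as it is read, might look like the following. The names are hypothetical (not the PR's API), and the index is an IORef for simplicity:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Illustrative sketch: decode n elements as one direct loop,
-- instead of re-entering the Get monad's bind for every element.
import qualified Data.ByteString as B
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

type Get a = B.ByteString -> IORef Int -> IO a

runGet :: Get a -> B.ByteString -> IO a
runGet g bs = newIORef 0 >>= g bs

getWord8 :: Get Word8
getWord8 bs ix = do
  i <- readIORef ix
  writeIORef ix (i + 1)
  pure (B.index bs i)

getListOf :: Get a -> Int -> Get [a]
getListOf ga n bs ix = go [] n
  where
    go as 0 = pure (reverse as)
    go as k = do
      !a <- ga bs ix -- bang pattern: force each element eagerly
      go (a : as) (k - 1)
```

The accumulator itself is already a constructor at each step, so only the elements need explicit forcing; whether that (or a mutable-array variant) pays off is the performance question raised here.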

@dolio (Contributor, Author) replied:

  1. Yeah, having custom loops for things is a little better. I think it must not completely unwrap the Get abstraction in the recursive case, so it's sort of a manual wrapper transformation.
  2. Evaluating the list argument is just slower, because the argument is a constructor each time. It's already evaluated, and there's just extra code that effectively does nothing.
  3. Same as my other tests with arrays: which of lists and arrays is faster depends on the size, and you need thousands of elements before arrays win significantly. One idea that might be worthwhile is to support multiple representations and choose based on size: use lists when deserializing, say, fewer than 500 elements, and arrays otherwise. But whether that has any practical benefit depends on what sorts of things are actually commonly serialized (like, are we serializing huge sequences?). And for the lists used to represent data types, lists basically always win.

@aryairani (Contributor) commented:

Some errors compiling the tests:

       /Users/arya/work/unison/deserial-perf/unison-runtime/tests/Unison/Test/Runtime/ANF/Serialization.hs:98:31: error: [GHC-83865]
           • Couldn't match type: (Version, Bool)
                                  -> Unison.Runtime.Serialize.Get.Get m0 (Value Reference)
                            with: transformers-0.6.1.0:Control.Monad.Trans.Reader.ReaderT
                                    (Version, Bool) Get (Value Reference)
             Expected: transformers-0.6.1.0:Control.Monad.Trans.Reader.ReaderT
                         (Version, Bool) Get (Value Reference)
               Actual: GDeserial m0 (Value Reference)
             Type synonyms expanded:
             Expected type: transformers-0.6.1.0:Control.Monad.Trans.Reader.ReaderT
                              (Version, Bool) Get (Value Reference)
               Actual type: (Version, Bool)
                            -> Unison.Runtime.Serialize.Get.Get m0 (Value Reference)
           • Probable cause: ‘getValue’ is applied to too few arguments
             In the first argument of ‘runReaderT’, namely ‘getValue’
             In the first argument of ‘(.)’, namely ‘runReaderT getValue’
             In the first argument of ‘getPutRoundtrip’, namely
               ‘(runReaderT getValue . (, False))’
          |
       98 |   getPutRoundtrip (runReaderT getValue . (,False)) putValue genValue
          |                               ^^^^^^^^
       
       /Users/arya/work/unison/deserial-perf/unison-runtime/tests/Unison/Test/Runtime/MCode/Serialization.hs:192:19: error: [GHC-83865]
           • Couldn't match expected type: Get StoredCache
                         with actual type: Unison.Runtime.Serialize.Get.Get m0 StoredCache
             NB: ‘Get’
                   is defined in ‘Data.Serialize.Get’ in package ‘cereal-0.5.8.3’
                 ‘Unison.Runtime.Serialize.Get.Get’
                   is defined in ‘Unison.Runtime.Serialize.Get’
                       in package ‘unison-runtime-0.0.0’
           • In the first argument of ‘getPutRoundtrip’, namely
               ‘getStoredCache’
             In the expression:
               getPutRoundtrip getStoredCache (putStoredCache) genStoredCache
             In an equation for ‘sCacheRoundtrip’:
                 sCacheRoundtrip
                   = getPutRoundtrip getStoredCache (putStoredCache) genStoredCache
           |
       192 |   getPutRoundtrip getStoredCache (putStoredCache) genStoredCache

@aryairani (Contributor) commented:

I usually catch these with ./scripts/check.sh, since I don't typically think to build with tests during normal development.
