@dolio dolio commented Oct 17, 2025

This implements a binary parser type and rebuilds value deserialization on top of it (the changes needed in the actual deserializers are fairly minimal).

The parser monad is based on ideas @pchiusano showed me. A parser is just a function that takes a (strict) bytestring and a mutable 'current index' ref, and produces a result in either IO or ST. This has a few advantages.

  • The optimizer seems to have an easier time with this shape. With a normal 'threaded state' approach, GHC appears to have trouble unboxing things like the result state and value for recursive parsers, presumably because that requires unpacking multiple layers of tuples via a worker-wrapper transform, which GHC just doesn't reliably manage. With the new parser type, only the result value needs to be unpacked, and it won't be a tuple in the cases where unpacking is fruitful.
  • getVarInt can be implemented directly as a primitive. We use varints extensively in our formats, so it's good for them to be as fast as possible. It doesn't seem possible to add new low-level operations to e.g. binary, and because getVarInt is recursive, building it on getWord8 doesn't optimize well, for the reason in the previous point.
  • It's possible to provide facilities to write getArray operations that create, fill and freeze a mutable array, rather than building an intermediate list.
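The parser shape and the getVarInt loop described above can be sketched roughly as follows. All names here (Get, runGet, getWord8, getVarInt) are illustrative stand-ins rather than the PR's actual API, and the real version is polymorphic over IO/ST and uses a PrimVar for the index rather than an IORef:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Minimal sketch of the parser shape: a function over the input
-- bytestring and a mutable 'current index' reference, yielding the
-- result directly in the underlying monad (here specialized to IO).
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word64, Word8)

type Get a = B.ByteString -> IORef Int -> IO a

runGet :: Get a -> B.ByteString -> IO a
runGet g bs = newIORef 0 >>= g bs

getWord8 :: Get Word8
getWord8 bs ix = do
  i <- readIORef ix
  if i < B.length bs
    then B.index bs i <$ writeIORef ix (i + 1)
    else fail "getWord8: out of bounds"

-- LEB128-style varint decoding as one tight loop, rather than
-- recursing through a library's getWord8 combinator.
getVarInt :: Get Word64
getVarInt bs ix = go 0 0
  where
    go !acc !shift = do
      b <- getWord8 bs ix
      let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
      if testBit b 7 then go acc' (shift + 7) else pure acc'
```

For example, `runGet getVarInt (B.pack [0xac, 0x02])` decodes to 300. Because the parser just returns its answer (rather than threading a state tuple), only that answer needs to be unboxed.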

There may be some minor pessimizations relative to binary. For example, binary has rewrite rules that coalesce the bounds checks when reading several fixed-size items in a row. I haven't attempted anything similar, because we don't seem to do much reading of that sort (at least not repeatedly) where it would matter.

On some simple test cases, the new parser is ~2x faster than the old one. I think most of that is due to the new implementation approach, although I did add some pragmas to help optimize better than before (a few INLINEs, and more INLINABLEs, the latter of which are intended to allow specialization to e.g. IO/ST rather than actual inlining). Some microbenchmarks had results better than that, but it's harder to realize in more complex situations, and I haven't spent an enormous amount of time fiddling with the particular serializers.

dolio added 3 commits October 17, 2025 17:12
Uses a fixed strict ByteString for the input and a PrimVar for
the current index, and primitive monads (ST or IO) to do
operations and yield results. This allows the parser to just
return the answers, which seems to allow for better overall
optimization than plumbing the parse state through in a
State-like manner. In the latter, it is very sensitive to get the
compiler to actually unbox all the results/state returned
reliably.

The downside is that the entire byte buffer must be in memory
(since it's a strict bytestring). But that probably isn't a big
deal for serialized unison values.

Includes some assorted likely-optimizations.

The fixed word getters were reading fixed positions in the
bytestring rather than based on the index.

Also tweaked the getVarInt implementation a bit.
@dolio dolio requested review from aryairani and pchiusano October 17, 2025 22:35
@pchiusano pchiusano (Member) left a comment

Wow, nice!! I am fine with merging this once CI passes.

Comment on lines 179 to 182
where
grab as (n :: Int)
| n <= 0 = evaluated $ reverse as
| otherwise = ga >>= \a -> grab (a : as) (n - 1)
Member:

Do you know if it makes any difference if you just define this directly? Like just directly create a Get m [a] via a ByteString -> Ix m -> m [a].

Also it seems like you should build up the list strictly. You don't want to build up n levels of thunks, only to have them get forced... or is strictness analysis taking care of this?

Another option is just allocate a mutable array once you know the length, then convert to a list at the end.

The other list / map decoders could be done similarly if any of the above is a performance boost.
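A rough sketch of the first two suggestions, a list decoder written directly against the raw ByteString -> index -> m a shape with each element forced as it is read, might look like the following. The names are hypothetical (not the PR's API), and the index is an IORef for simplicity:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Illustrative sketch: decode n elements as one direct loop,
-- instead of re-entering the Get monad's bind for every element.
import qualified Data.ByteString as B
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)

type Get a = B.ByteString -> IORef Int -> IO a

runGet :: Get a -> B.ByteString -> IO a
runGet g bs = newIORef 0 >>= g bs

getWord8 :: Get Word8
getWord8 bs ix = do
  i <- readIORef ix
  writeIORef ix (i + 1)
  pure (B.index bs i)

getListOf :: Get a -> Int -> Get [a]
getListOf ga n bs ix = go [] n
  where
    go as 0 = pure (reverse as)
    go as k = do
      !a <- ga bs ix -- bang pattern: force each element eagerly
      go (a : as) (k - 1)
```

The accumulator itself is already a constructor at each step, so only the elements need explicit forcing; whether that (or a mutable-array variant) pays off is the performance question raised here.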

@dolio (Contributor, Author) replied:

  1. Yeah, having custom loops for things is a little better. I think it must not completely unwrap the Get abstraction in the recursive case, so it's sort of a manual wrapper transformation.
  2. Evaluating the list argument is just slower, because the argument is a constructor each time. It's already evaluated, and there's just extra code that effectively does nothing.
  3. Same as my other tests with arrays: which of lists and arrays is faster depends on the size, and you need thousands of elements before arrays win significantly. One idea that might be worthwhile is to support multiple representations and choose based on size: use lists when deserializing, say, fewer than 500 elements, and arrays otherwise. But whether that has any practical benefit depends on what sorts of things are actually commonly serialized (like, are we serializing huge sequences?). And for the lists used to represent data types, lists basically always win.

@aryairani (Contributor) commented:

Some errors compiling the tests:

       /Users/arya/work/unison/deserial-perf/unison-runtime/tests/Unison/Test/Runtime/ANF/Serialization.hs:98:31: error: [GHC-83865]
           • Couldn't match type: (Version, Bool)
                                  -> Unison.Runtime.Serialize.Get.Get m0 (Value Reference)
                            with: transformers-0.6.1.0:Control.Monad.Trans.Reader.ReaderT
                                    (Version, Bool) Get (Value Reference)
             Expected: transformers-0.6.1.0:Control.Monad.Trans.Reader.ReaderT
                         (Version, Bool) Get (Value Reference)
               Actual: GDeserial m0 (Value Reference)
             Type synonyms expanded:
             Expected type: transformers-0.6.1.0:Control.Monad.Trans.Reader.ReaderT
                              (Version, Bool) Get (Value Reference)
               Actual type: (Version, Bool)
                            -> Unison.Runtime.Serialize.Get.Get m0 (Value Reference)
           • Probable cause: ‘getValue’ is applied to too few arguments
             In the first argument of ‘runReaderT’, namely ‘getValue’
             In the first argument of ‘(.)’, namely ‘runReaderT getValue’
             In the first argument of ‘getPutRoundtrip’, namely
               ‘(runReaderT getValue . (, False))’
          |
       98 |   getPutRoundtrip (runReaderT getValue . (,False)) putValue genValue
          |                               ^^^^^^^^
       
       /Users/arya/work/unison/deserial-perf/unison-runtime/tests/Unison/Test/Runtime/MCode/Serialization.hs:192:19: error: [GHC-83865]
           • Couldn't match expected type: Get StoredCache
                         with actual type: Unison.Runtime.Serialize.Get.Get m0 StoredCache
             NB: ‘Get’
                   is defined in ‘Data.Serialize.Get’ in package ‘cereal-0.5.8.3’
                 ‘Unison.Runtime.Serialize.Get.Get’
                   is defined in ‘Unison.Runtime.Serialize.Get’
                       in package ‘unison-runtime-0.0.0’
           • In the first argument of ‘getPutRoundtrip’, namely
               ‘getStoredCache’
             In the expression:
               getPutRoundtrip getStoredCache (putStoredCache) genStoredCache
             In an equation for ‘sCacheRoundtrip’:
                 sCacheRoundtrip
                   = getPutRoundtrip getStoredCache (putStoredCache) genStoredCache
           |
       192 |   getPutRoundtrip getStoredCache (putStoredCache) genStoredCache

@aryairani (Contributor) commented:

I usually catch these with ./scripts/check.sh, since I don't typically think to build with tests during normal development.
