-
Notifications
You must be signed in to change notification settings - Fork 571
Binary encoding for externs #3841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Probably don't want this to end up in the PR, but it makes it easier to reproduce the speedups.
I have no problem with externs being binary personally, there's no real advantage to those files being human readable that I can think of. |
Yeah, I always thought we would want to switch to binary at some point too. |
I'm not terribly concerned about the choice of library; since the externs file format is internal we can always change it later if we find a different library is better suited. |
case decodeStrict externsFile of | ||
Nothing -> | ||
case Store.decode externsFile of | ||
Left _ -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to try to come up with a way of parsing the compiler version from an externs file which is independent of the rest of the encoding, so that we can preserve this error message (externs version mismatch). Do you think we could give ExternsFile
a hand-written Store
instance which puts the compiler version at the very beginning of the file, and then write a parser for it which just extracts the version and ignores the rest of the file?
Related, I've created #3842 to track the bloat issues with externs. |
Might also want to look into |
New findings!
Thanks for that recommendation @joneshf. The The CBOR standard actually defines a bijection to JSON and https://hackage.haskell.org/package/cborg-json-0.2.2.0 implements it. So one thing we could do is continue using all our existing JSON instances (this also addresses @hdgarrood's concern about the version field) and put a little JSON <-> CBOR adapter in between reading and writing from the file system. Performance-wise this puts us somewhere in the middle between JSON and Store. It's 3 times faster than just using JSON, and two times slower than going for direct Store instances.
This approach still has all the problems outlined by @natefaubion though, we're encoding the same filepath over and over, and even worse when we're reading the file back in we get no sharing. Next I'll try to use https://hackage.haskell.org/package/serialise (typeclass based encoders/decoders on top of cborg) to get a fair comparison between |
Unfortunately the generated instances appear to be broken
Sad update :( I derived instances for |
New Update.... type-class based encoding + decoding failures as Exceptions tricked me (It tried to deserialise values of type
Since CBOR is a self-describing format I'll see if I can extract the version from it without trying to decode the whole ExternsFile. |
Allright, I think this is in a reviewable state now. In this PR I've changed the encoding for externs from JSON to CBOR. I'm using the By doing this we're getting significant speedups in IDE startup time, but also in the normal compiler pipeline. Because we're serialising to CBOR, it's possible to extract information like the version number that was stored with the externs, without having to be able to decode the whole format. Not using JSON anymore doesn't remove any features we used to have, because of this property. We will break downstream users that relied on the JSON format, but as far as I can tell we've agreed that externs are a compiler internal format for which we are not giving stability guarantees in any way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I'm very much looking forward to using this.
Thanks for the review @hdgarrood! I'll merge this and as mentioned earlier we can do more improvements in follow-up PRs. |
This commit changes the encoding for externs from JSON to CBOR. We're using the serialise library to derive instances for all the relevant data types. By doing this we're getting significant speedups in IDE startup time, but also in the normal compiler pipeline. The performance measurements in this PR measure how long it takes to load the fully built pscid project and populate the IDE caches. It's dominated by the time it takes to deserialise the externs files, and drops from 3.5s to 0.55s. Memory allocations drop from 8,460,187,856 bytes to 1,649,605,704 bytes.
This commit changes the encoding for externs from JSON to CBOR. We're using the serialise library to derive instances for all the relevant data types. By doing this we're getting significant speedups in IDE startup time, but also in the normal compiler pipeline. The performance measurements in this PR measure how long it takes to load the fully built pscid project and populate the IDE caches. It's dominated by the time it takes to deserialise the externs files, and drops from 3.5s to 0.55s. Memory allocations drop from 8,460,187,856 bytes to 1,649,605,704 bytes.
I did some profiling on the ide load times, and everything was dominated by JSON parsing. So I tried deriving
Store
(a binary serialize lib) for all the data types necessary for ExternsFile and then using the binary encoding I get significant speedups. Here are the stats forpscid
:Before:
After:
I'm just opening this as a draft for now. Do we want to make the switch to a binary encoding? Do we want to use a different library? I've found
binary
,cereal
andstore
so far and pickedstore
because it had both TH as well as Generic deriving. It's possible we could see some speedups in incremental builds as well, but any sizeable project I have generates too many warnings from the changes in master to give good timings right now.