-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making hashing of values faster #1282
Comments
Note that this many not be enough for things like digesting maps properly. Since we'd like to digest things in the key order. Ideally, a new primitive would allow us to express that. |
Currently the flow goes as follow:
The proposed function would allow to get rid of the |
I could spare a bit of time to work on this suggestion, if it helps. I am very very interested in anything that can make |
@nojb that would be great! To be clear, I don't know exactly how much we'd win, but since dune computes quite a lot of digests, I believe the win wouldn't be negligible. At least it would reduce GC pressure. |
Yes, I suspect the effect won't be dramatic, but it can't hurt. Also, it just seems a useful function to have. I will give it a try when I can and report back here. |
fwiw I landed on this issue today, while I was profiling dune builds in a largish synthetic repo (50k modules) and the 2nd most time consuming function is
If the gc calls are also related to strings being allocated and discarded quickly, then the impact of the changes that this issue propose might be quite large. For the profiles, we're using this generator https://github.com/ahrefs/dune-bench (forked from ReScript bsb bench) as part of testing Melange. We can provide more info about how to reproduce these profilings if needed. re: performance, more generically, we saw 5-6x performance boost in large codebases from the latest opam published version of Dune 2.8.5 vs latest commit e8ad49f, probably due to the recent vfork change 🎉 which is great news! Thanks a lot for your work on making dune faster 🙏 |
Indeed, definitely worth a try. Thinking again of this idea, we could even squash the pipeline even more and go directly from We could also try something else that MD5. I remember that someone mentioning that there are better alternatives (faster and more secure).
Nice, thanks for the pointer. This comes right on time as we are just starting to setup continuous benchmarks for dune :)
That's great! |
I made some search and found xxHash which is extremely fast. It's not cryptographic, but for dune use case it might be ok. There are OCaml bindings in 314eter/ocaml-xxhash but these are old, there's also xapi-project/xen-api which are more modern, and using dune already. I guess if this was a potentially good idea, then dune would vendor the c library to avoid consumers having to deal with installing it right? The C library seems very lightweight. |
Yeah, we would definitely vendor it. Thanks for the pointer, it seems worth looking into. |
Only tangentially related, but this weekend I did a little experiment by changing the implementation of
(~9% speedup for the full build). The test would have to be confirmed (there was a bit of noise), but maybe something to look into... |
Indeed. Another thing to look into: for large files, it might be worth doing the hashing in a separate thread, occupying one job. |
Update: we now have metrics for marshal+digest: #4649 (comment) |
Looking at the compiler C code, it seems that it should be easy to provide a C function that does the same as
fun x -> Digest.to_string (Marshal.to_string x [])
but with using constant memory. We should look at adding such a function in the stdlib, that should make Dune a bit faster.The text was updated successfully, but these errors were encountered: