Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making hashing of values faster #1282

Open
ghost opened this issue Sep 17, 2018 · 12 comments
Open

Making hashing of values faster #1282

ghost opened this issue Sep 17, 2018 · 12 comments

Comments

@ghost
Copy link

ghost commented Sep 17, 2018

Looking at the compiler C code, it seems that it should be easy to provide a C function that does the same as fun x -> Digest.to_string (Marshal.to_string x []) but with using constant memory. We should look at adding such a function in the stdlib, that should make Dune a bit faster.

@rgrinberg
Copy link
Member

Note that this many not be enough for things like digesting maps properly. Since we'd like to digest things in the key order. Ideally, a new primitive would allow us to express that.

@ghost
Copy link
Author

ghost commented Sep 17, 2018

Currently the flow goes as follow:

ocaml value -> ocaml value in canonical form -> string -> digest

The proposed function would allow to get rid of the string step, but not the ocaml value in canonical form step. To get rid of this step as well, we would need an API to incrementally compute a digest. This API already exists in the C code but is not exposed to the user.

@nojb
Copy link
Collaborator

nojb commented Sep 17, 2018

I could spare a bit of time to work on this suggestion, if it helps. I am very very interested in anything that can make dune faster.

@ghost
Copy link
Author

ghost commented Sep 17, 2018

@nojb that would be great! To be clear, I don't know exactly how much we'd win, but since dune computes quite a lot of digests, I believe the win wouldn't be negligible. At least it would reduce GC pressure.

@nojb
Copy link
Collaborator

nojb commented Sep 17, 2018

Yes, I suspect the effect won't be dramatic, but it can't hurt. Also, it just seems a useful function to have.

I will give it a try when I can and report back here.

@jchavarri
Copy link
Collaborator

jchavarri commented Apr 1, 2021

fwiw I landed on this issue today, while I was profiling dune builds in a largish synthetic repo (50k modules) and the 2nd most time consuming function is caml_MD5transform:

$ perf report --no-children
+   14.38%  dune          dune                [.] mark_slice
+   13.88%  dune          dune                [.] caml_MD5Transform
+    7.58%  dune          dune                [.] caml_page_table_lookup
+    5.25%  dune          dune                [.] sweep_slice
+    4.00%  ocamldep.opt  [kernel.vmlinux]    [k] filemap_map_pages
+    3.93%  ocamldep.opt  [kernel.vmlinux]    [k] clear_page_rep
+    3.76%  ocamldep.opt  [kernel.vmlinux]    [k] copy_page
+    3.42%  ocamldep.opt  [kernel.vmlinux]    [k] unmap_page_range
+    2.31%  ocamldep.opt  ocamldep.opt        [.] fill_hashtable
+    1.77%  dune          [kernel.vmlinux]    [k] copy_user_generic_string
+    1.27%  ocamldep.opt  ld-2.28.so          [.] _dl_relocate_object
+    1.22%  ocamldep.opt  ocamldep.opt        [.] init_frame_descriptors
+    1.21%  ocamldep.opt  [kernel.vmlinux]    [k] asm_exc_page_fault
+    1.03%  ocamldep.opt  [kernel.vmlinux]    [k] perf_iterate_ctx
+    0.97%  dune          [kernel.vmlinux]    [k] clear_page_rep
+    0.97%  ocamldep.opt  ocamldep.opt        [.] next_frame_descr
+    0.95%  ocamldep.opt  [kernel.vmlinux]    [k] find_get_entry
+    0.79%  ocamldep.opt  [kernel.vmlinux]    [k] get_page_from_freelist
+    0.74%  dune          [kernel.vmlinux]    [k] __memset
+    0.72%  dune          [kernel.vmlinux]    [k] find_idlest_group
+    0.69%  ocamldep.opt  [kernel.vmlinux]    [k] __list_del_entry_valid
+    0.59%  dune          dune                [.] caml_oldify_one
+    0.58%  ocamldep.opt  [kernel.vmlinux]    [k] handle_mm_fault
+    0.55%  ocamldep.opt  [kernel.vmlinux]    [k] page_remove_rmap

If the gc calls are also related to strings being allocated and discarded quickly, then the impact of the changes that this issue propose might be quite large.

For the profiles, we're using this generator https://github.com/ahrefs/dune-bench (forked from ReScript bsb bench) as part of testing Melange. We can provide more info about how to reproduce these profilings if needed.

re: performance, more generically, we saw 5-6x performance boost in large codebases from the latest opam published version of Dune 2.8.5 vs latest commit e8ad49f, probably due to the recent vfork change 🎉 which is great news!

Thanks a lot for your work on making dune faster 🙏

@ghost
Copy link
Author

ghost commented Apr 1, 2021

Indeed, definitely worth a try. Thinking again of this idea, we could even squash the pipeline even more and go directly from ocaml value to digest, by folding the state through the value. Then we could feed the strings directly to the digestion algorithm without having to copy them.

We could also try something else that MD5. I remember that someone mentioning that there are better alternatives (faster and more secure).

For the profiles, we're using this generator https://github.com/ahrefs/dune-bench (forked from ReScript bsb bench) as part of testing Melange. We can provide more info about how to reproduce these profilings if needed.

Nice, thanks for the pointer. This comes right on time as we are just starting to setup continuous benchmarks for dune :)

re: performance, more generically, we saw 5-6x performance boost in large codebases from the latest opam published version of Dune 2.8.5 vs latest commit e8ad49f, probably due to the recent vfork change tada which is great news!

That's great!

@jchavarri
Copy link
Collaborator

I remember that someone mentioning that there are better alternatives (faster and more secure).

I made some search and found xxHash which is extremely fast. It's not cryptographic, but for dune use case it might be ok.

There are OCaml bindings in 314eter/ocaml-xxhash but these are old, there's also xapi-project/xen-api which are more modern, and using dune already. I guess if this was a potentially good idea, then dune would vendor the c library to avoid consumers having to deal with installing it right? The C library seems very lightweight.

@ghost
Copy link
Author

ghost commented Apr 6, 2021

Yeah, we would definitely vendor it. Thanks for the pointer, it seems worth looking into.

@nojb
Copy link
Collaborator

nojb commented Apr 6, 2021

Only tangentially related, but this weekend I did a little experiment by changing the implementation of Digest.file to use libc FILE* instead of OCaml's in_channel. I got:

             Current branch Main branch
             ============== ===========
           | real 6.145s    real 6.792s
Full build | user 41.905s   user 43.349s
           | sys  7.738s    sys  7.847s

           | real 0.483s    real 0.488s
Zero build | user 0.443s    user 0.447s
           | sys  0.040s    sys  0.039s

(~9% speedup for the full build). The test would have to be confirmed (there was a bit of noise), but maybe something to look into...

@ghost
Copy link
Author

ghost commented Apr 6, 2021

Indeed. Another thing to look into: for large files, it might be worth doing the hashing in a separate thread, occupying one job.

@nojb nojb closed this as completed Apr 6, 2021
@nojb nojb reopened this Apr 6, 2021
@ghost
Copy link
Author

ghost commented May 25, 2021

Update: we now have metrics for marshal+digest: #4649 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants