Description
It is a fairly well-known and common problem that the code generated for Rust debug builds is very inefficient.
We've run into this multiple times: for certain use cases and hot code paths, acceptable runtime performance becomes impossible to achieve, with debug builds reaching as little as 1/200th of the performance of an optimized release build.
A typical example is cryptographic hashing. Using the sha1 crate on a 10 MB file runs at 1 MB/s in debug compared to 200 MB/s in release, making debug builds effectively unusable for that use case.
A common workaround is to build the dev/debug config with O1 instead of O0. In this specific case that improves performance to around 1/4th of release performance, but at the cost of significantly longer compile and iteration times, which we don't want for our large project and repository at this time.
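For reference, the O1 workaround described above can be sketched as a change to the dev profile in the workspace's root Cargo.toml. This is a minimal sketch, not a recommendation: it assumes no other profile settings are customized, and it applies to every crate in the build, which is why it slows down compilation across the board.

```toml
# Root Cargo.toml: raise the optimization level for all dev/debug builds.
# The default for the dev profile is opt-level = 0.
[profile.dev]
opt-level = 1
```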
There seem to be two potential solutions here, one short term and one longer term.
Short term:
- Stabilize and use the new Profile Overrides cargo feature. It is available on nightly, and I've tested that it works well for our worst use case: one can enable O3 on just, say, the sha crate.
[profile.dev.overrides.sha]
opt-level = 3
Longer term:
- Fast code iteration and decent debug performance using a Cranelift debug backend for Rust. This is a specific goal for Cranelift, but likely a lot of work.