-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement sha256 #187
Implement sha256 #187
Conversation
Thanks! The benchmarks are in https://github.com/briansmith/crypto-bench. I suggest comparing your implementation to, in particular, the one in rust-crypto. IMO, if unless/until we have a clear plan for getting faster-than-rust-crypto (which is pretty fast) implementations, we should consider just using rust-crypto for the rare cases where we can't use the asm code. Let me know the benchmark result. I'll find some time to look at the code soon. |
It looks like rust-crypto is about 2x faster than my implementation. So are you thinking to just declare a dependency on that library (and I guess remove the existing sha1 implementation)? Or just leave the asm implementation (which, if I understand, is generated by the perl scripts from crypto/sha/asm) ? |
Our goal is to always use the best implementation for each target. There are four ASM implementations of each digest function: x86, x86-64, 32-bit ARM, and AAarch64. Actually, there are more, because within each implementation there are build-time (e.g. x86 has a SSE2 built-time switch) and/or runtime switches (based on CPU feature detection). It might be the case that in some cases, for some target, the rust-crypto implementation is faster than the ASM implementation for that target. In that case, we should just delete the ASM code for that target and use rust-crypto instead. That has to be determined on a target-by-target basis. There are some targets where OpenSSL has asm implementations but we don't have the asm implementation in ring. The primary case is MIPS. If the OpenSSL MIPS asm code is faster than rust-crypto (on MIPS), then we should import in the MIPS asm code from OpenSSL. There are some targets for which we don't have asm code at all, e.g. WebAssembly. For these, ring currently just doesn't built. These targets are the ones where having a fast Rust implementation is important (OTOH, how important are these targets, really?). As far as how to integrate a fast Rust implementation goes, for the cases where we don't already have an ASM implementation faster than what we can do in Rust, we have various options:
The difficulty with option 3 is that usually crypto libraries don't expose the What do you think? |
BTW, if you're interested in making your SHA-256 implementation 2x as fast, I'd be down to helping you with that too. Just know it will probably take some serious effort. |
Option 2 does sound the most pragmatic. I should mention, my own goals are to learn Rust, gain experience with native-code toolchains, and get more hands-on experience with the ssl protocol. So, I'm not super-confidant that I'll be able to get a 2x speedup out of my implementation, but I'll probably take a stab at doing at least some optimization, at least for the experience if nothing else. If there are other more straightforward issues you think would be good to tackle, I'd be happy to spend time on those as well, especially if the solution here is to use something that exists already. Otherwise, as I said, I'll probably at least take a first pass at trying to optimize my implementation a little. |
Quick update: I tried some limited manual unrolling in |
Unrolled all 64 rounds, in chunks of 4 via macro. A little less readable now, but significantly faster. Still not quite as fast as rust-crypto. I'll probably look at unrolling the |
Looks like the build issues are a problem w/ nightly? stable is passing... |
Awesome! If you have time, could you please split your change into two:
I'd like to keep each set of optimizations in a separate commit so we can measure the impact each optimization has. That will make it easier to guide future work (on other digest algorithms). Now that your state I see that you defined your macro inside the function, but that isn't how macros work in Rust, at least as far as the namespacing goes. I recommend that you put the macro at the top level of the file. |
Reworked the commits a little, and unrolled |
Awesome. Being within 10-15% faster is quite reasonable. I'm pretty burned out today after working on a bunch of other stuff, but I'll look at this as soon as I get a chance. |
I've got another optimisation which puts us within about 3% of the rust-crypto (on my machine at least). A simple trick to optimise the Sigma functions. I'll PR to this branch. I get 213 MB/s -> 229 MB/s versus 235 MB/s for rust-crypto (on sha256::_1000). Incidentally, if rust-crypto made the same change they would see a 20MB/s speedup... |
Optimised Sigma functions.
Ok, I've merged @samscott89's PR, which does make a difference. I looked at moving the macros out to the top level, but it appears that since they use local variables in the function, I need to modify them to take the identifiers for those variables. I'll add a commit to show you what I mean. If you have a preference on style, we can take that commit or not. |
Changes Unknown when pulling 3d9fba2 on gsoltis:sha256 into * on briansmith:master*. |
re: last commit, note having to pass in |
Changes Unknown when pulling e3ec283 on gsoltis:sha256 into * on briansmith:master*. |
block_data_order algorithm ported from https://en.wikipedia.org/wiki/SHA-2
Let me know what you think! Aiming at #62
Is there a benchmark test already? Or does one need to be written?