
Implement sha256 #187

Closed

wants to merge 7 commits into briansmith:master from gsoltis:sha256

Conversation

@gsoltis commented May 25, 2016

block_data_order algorithm ported from https://en.wikipedia.org/wiki/SHA-2

Let me know what you think! Aiming at #62

Is there a benchmark test already? Or does one need to be written?
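For readers following along, the Wikipedia pseudocode referenced above boils down to roughly the following. This is a minimal single-block sketch for illustration (compress and sha256_short are made-up names; this is not the PR's block_data_order):

```rust
// Minimal single-shot SHA-256 for messages under 56 bytes, ported from the
// Wikipedia pseudocode. Illustrative only; not the PR's actual code.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
const H0: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

// Process one 64-byte block (the role block_data_order plays in ring).
fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    // Message schedule: 16 words from the block, expanded to 64.
    let mut w = [0u32; 64];
    for i in 0..16 {
        w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
    }
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    // 64 compression rounds.
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for i in 0..64 {
        let t1 = h
            .wrapping_add(e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25)) // Sigma1(e)
            .wrapping_add((e & f) ^ (!e & g)) // Ch(e, f, g)
            .wrapping_add(K[i])
            .wrapping_add(w[i]);
        let t2 = (a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22)) // Sigma0(a)
            .wrapping_add((a & b) ^ (a & c) ^ (b & c)); // Maj(a, b, c)
        h = g; g = f; f = e; e = d.wrapping_add(t1);
        d = c; c = b; b = a; a = t1.wrapping_add(t2);
    }
    for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h]) {
        *s = s.wrapping_add(v);
    }
}

// Pad and hash a short message (fits in one block after padding).
fn sha256_short(msg: &[u8]) -> String {
    assert!(msg.len() < 56);
    let mut block = [0u8; 64];
    block[..msg.len()].copy_from_slice(msg);
    block[msg.len()] = 0x80; // append the '1' bit
    block[56..].copy_from_slice(&((msg.len() as u64) * 8).to_be_bytes()); // bit length
    let mut state = H0;
    compress(&mut state, &block);
    state.iter().map(|v| format!("{v:08x}")).collect()
}
```

sha256_short handles only messages that fit in one padded block; a full implementation iterates compress over every 64-byte chunk.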

@briansmith (Owner)

Thanks!

The benchmarks are in https://github.com/briansmith/crypto-bench. I suggest comparing your implementation to, in particular, the one in rust-crypto. IMO, unless/until we have a clear plan for getting implementations that are faster than rust-crypto (which is pretty fast), we should consider just using rust-crypto for the rare cases where we can't use the asm code.

Let me know the benchmark result. I'll find some time to look at the code soon.

@coveralls commented May 25, 2016

Coverage increased (+0.1%) to 81.007% when pulling 6f64368 on gsoltis:sha256 into 51e6a18 on briansmith:master.

@gsoltis (Author) commented May 25, 2016

It looks like rust-crypto is about 2x faster than my implementation. So are you thinking to just declare a dependency on that library (and I guess remove the existing sha1 implementation)? Or just leave the asm implementation (which, if I understand, is generated by the perl scripts from crypto/sha/asm) ?

@briansmith (Owner) commented May 25, 2016

> It looks like rust-crypto is about 2x faster than my implementation. So are you thinking to just declare a dependency on that library (and I guess remove the existing sha1 implementation)? Or just leave the asm implementation (which, if I understand, is generated by the perl scripts from crypto/sha/asm)?

Our goal is to always use the best implementation for each target.

There are four ASM implementations of each digest function: x86, x86-64, 32-bit ARM, and AArch64. Actually, there are more, because within each implementation there are build-time switches (e.g. x86 has an SSE2 build-time switch) and/or runtime switches (based on CPU feature detection).

It might be the case that, for some target, the rust-crypto implementation is faster than the ASM implementation for that target. In that case, we should just delete the ASM code for that target and use rust-crypto instead. That has to be determined on a target-by-target basis.

There are some targets where OpenSSL has asm implementations but we don't have the asm implementation in ring. The primary case is MIPS. If the OpenSSL MIPS asm code is faster than rust-crypto (on MIPS), then we should import the MIPS asm code from OpenSSL.

There are some targets for which we don't have asm code at all, e.g. WebAssembly. For these, ring currently just doesn't build. These targets are the ones where having a fast Rust implementation is important (OTOH, how important are these targets, really?).

As far as how to integrate a fast Rust implementation goes, for the cases where we don't already have an ASM implementation faster than what we can do in Rust, we have various options:

  1. We can add a new implementation in ring, like you proposed here.
  2. We can copy/paste the block_data_order function (which probably has a different name) from another Rust implementation, respecting its license. (In practice, this would mean choosing an ISC- or MIT- or BSD- licensed library to share the code with.)
  3. We can add a dependency on another crypto library (only for targets where we don't implement everything internally.)

The difficulty with option 3 is that usually crypto libraries don't expose the block_data_order type function that would be ideal for ring. Thus, option 1 seems like the best option to me. However, option 1 is a lot of work, because it is pretty hard to make a new implementation faster than every other Rust implementation. Thus, option 2 might be the most pragmatic choice.

What do you think?

@briansmith (Owner)

BTW, if you're interested in making your SHA-256 implementation 2x as fast, I'd be down to help you with that too. Just know it will probably take some serious effort.

@gsoltis (Author) commented May 26, 2016

Option 2 does sound the most pragmatic.

I should mention, my own goals are to learn Rust, gain experience with native-code toolchains, and get more hands-on experience with the SSL protocol. So, I'm not super-confident that I'll be able to get a 2x speedup out of my implementation, but I'll probably take a stab at at least some optimization, for the experience if nothing else.

If there are other more straightforward issues you think would be good to tackle, I'd be happy to spend time on those as well, especially if the solution here is to use something that exists already. Otherwise, as I said, I'll probably at least take a first pass at trying to optimize my implementation a little.

@gsoltis (Author) commented May 31, 2016

Quick update: I tried some limited manual unrolling in block_data_order_safe, from 64 iterations to 32, and that seemed to make it roughly 10% faster. So, this seems like a worthwhile avenue to pursue, and as you pointed out, seems to be what other implementations do.
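The technique being described, processing two (or more) iterations per loop pass, can be sketched on a toy round function. This is purely illustrative: mix_naive and mix_unrolled are hypothetical names, not code from this PR.

```rust
// Hypothetical toy round function, shown two ways. Not the PR's SHA-256 code.
fn mix_naive(data: &[u32; 64]) -> u32 {
    let mut acc = 0u32;
    for i in 0..64u32 {
        acc = acc.rotate_left(5).wrapping_add(data[i as usize] ^ i);
    }
    acc
}

// Same arithmetic, two rounds per loop pass: half the loop-control overhead,
// and a bigger straight-line body for the compiler to schedule.
fn mix_unrolled(data: &[u32; 64]) -> u32 {
    let mut acc = 0u32;
    let mut i = 0u32;
    for pair in data.chunks_exact(2) {
        acc = acc.rotate_left(5).wrapping_add(pair[0] ^ i);
        acc = acc.rotate_left(5).wrapping_add(pair[1] ^ (i + 1));
        i += 2;
    }
    acc
}
```

Both functions compute the same value; the unrolled one just halves the loop iterations, which is the kind of ~10% win reported above.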

@gsoltis (Author) commented Jun 2, 2016

Unrolled all 64 rounds, in chunks of 4 via macro. A little less readable now, but significantly faster. Still not quite as fast as rust-crypto. I'll probably look at unrolling the w calculation next.

@coveralls commented Jun 2, 2016

Changes unknown when pulling 99ab5d3fc0e6c56397ff028fc3b8deb6c709760e on gsoltis:sha256 into an unknown commit on briansmith:master.

@gsoltis (Author) commented Jun 3, 2016

Looks like the build issues are a problem with nightly? Stable is passing...

@briansmith (Owner) commented Jun 3, 2016

Awesome!

If you have time, could you please split your change into two:

  1. Replacing Wrapping with direct use of the wrapping_xxx() functions.
  2. Your unrolling.

I'd like to keep each set of optimizations in a separate commit so we can measure the impact each optimization has. That will make it easier to guide future work (on other digest algorithms).
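For clarity, the two styles being separated into commits look like this (hypothetical helper names; both compute the same modulo-2^32 sum):

```rust
// Style 1: the std::num::Wrapping newtype, where + wraps by definition.
use std::num::Wrapping;

fn sum_wrapping_type(a: u32, b: u32) -> u32 {
    (Wrapping(a) + Wrapping(b)).0
}

// Style 2: direct use of the wrapping_xxx() methods on the primitive type.
fn sum_wrapping_fn(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

The generated code should be identical; separating the commits is about measuring whether the source-level change alone moves the benchmark.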

Now, your w state shouldn't need 64 entries. Almost all high-performance implementations use a 16-element w. This smaller state makes register allocation, and sometimes auto-vectorization, easier.
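A 16-element w is typically kept as a circular buffer indexed with t % 16, since round t only ever reads w[t-2], w[t-7], w[t-15], and w[t-16]. A sketch (illustrative names, not the PR's code), checking the circular form against the straightforward 64-entry schedule:

```rust
// SHA-256 message-schedule sigma functions (FIPS 180-4).
fn small_sigma0(x: u32) -> u32 { x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3) }
fn small_sigma1(x: u32) -> u32 { x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10) }

// Straightforward version: materialize all 64 expanded words.
fn schedule_full(m: &[u32; 16]) -> [u32; 64] {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(m);
    for t in 16..64 {
        w[t] = w[t - 16]
            .wrapping_add(small_sigma0(w[t - 15]))
            .wrapping_add(w[t - 7])
            .wrapping_add(small_sigma1(w[t - 2]));
    }
    w
}

// Circular version: 16 live slots; slot t % 16 holds w[t-16] and is
// overwritten with w[t] exactly when w[t-16] is no longer needed.
fn schedule_circular(m: &[u32; 16]) -> [u32; 64] {
    let mut w = *m;
    let mut out = [0u32; 64]; // recorded only so the two versions can be compared
    out[..16].copy_from_slice(m);
    for t in 16..64 {
        let wt = w[t & 15]                                 // w[t-16]
            .wrapping_add(small_sigma0(w[(t + 1) & 15]))   // w[t-15]
            .wrapping_add(w[(t + 9) & 15])                 // w[t-7]
            .wrapping_add(small_sigma1(w[(t + 14) & 15])); // w[t-2]
        w[t & 15] = wt;
        out[t] = wt;
    }
    out
}
```

In a real implementation the out array disappears: each round consumes w[t & 15] immediately, so the whole schedule fits in 16 registers.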

I see that you defined your macro inside the function, but that isn't how macros work in Rust, at least as far as the namespacing goes. I recommend that you put the macro at the top level of the file.

@gsoltis (Author) commented Jun 5, 2016

Reworked the commits a little, and unrolled w, using 16 slots, as suggested. I haven't cleaned up the macro placement yet. But the benchmark is a lot closer now! The exact numbers vary a bit, but on my ~6-year-old MacBook Pro, rust-crypto is somewhere around 10-15% faster. Much better than 2x faster.

@coveralls

Coverage increased (+0.2%) to 81.358% when pulling 2fcbba7 on gsoltis:sha256 into a25f86f on briansmith:master.

@briansmith (Owner)

Awesome. Being within 10-15% is quite reasonable. I'm pretty burned out today after working on a bunch of other stuff, but I'll look at this as soon as I get a chance.

@samscott89 (Contributor)

I've got another optimisation which puts us within about 3% of rust-crypto (on my machine, at least): a simple trick to optimise the Sigma functions. I'll PR it to this branch.

I get 213 MB/s -> 229 MB/s versus 235 MB/s for rust-crypto (on sha256::_1000).

Incidentally, if rust-crypto made the same change they would see a 20MB/s speedup...
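For context, here are the FIPS 180-4 big-Sigma functions, plus one well-known rewrite that chains the rotations. This is only a guess at the kind of trick meant here, since the actual diff isn't shown; the two forms are algebraically identical.

```rust
// Direct FIPS 180-4 definitions: three independent rotations XOR'd together.
fn big_sigma0(x: u32) -> u32 { x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22) }
fn big_sigma1(x: u32) -> u32 { x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25) }

// Nested-rotation forms. Expanding the rotations shows they are identical:
//   rotr2(x ^ rotr11(x ^ rotr9(x)))  == rotr2(x) ^ rotr13(x) ^ rotr22(x)
//   rotr6(x ^ rotr5(x ^ rotr14(x)))  == rotr6(x) ^ rotr11(x) ^ rotr25(x)
// (e.g. rotr11 applied to x ^ rotr9(x) yields rotr11(x) ^ rotr20(x), and
// the outer rotr2 then shifts all three terms by 2 more.)
fn big_sigma0_nested(x: u32) -> u32 {
    (x ^ (x ^ x.rotate_right(9)).rotate_right(11)).rotate_right(2)
}
fn big_sigma1_nested(x: u32) -> u32 {
    (x ^ (x ^ x.rotate_right(14)).rotate_right(5)).rotate_right(6)
}
```

Which form is faster depends on the target: the nested form serializes the rotations into a dependency chain, while the direct form has more instruction-level parallelism but more live values competing for registers.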

Optimised Sigma functions.
@gsoltis (Author) commented Jun 13, 2016

Ok, I've merged @samscott89's PR, which does make a difference.

I looked at moving the macros out to the top level, but it appears that since they use local variables in the function, I need to modify them to take the identifiers for those variables. I'll add a commit to show you what I mean. If you have a preference on style, we can take that commit or not.

@coveralls commented Jun 13, 2016

Changes unknown when pulling 3d9fba2 on gsoltis:sha256 into an unknown commit on briansmith:master.

@gsoltis (Author) commented Jun 13, 2016

Re: the last commit, note that $block now has to be passed in, whereas before it wasn't required. I don't have a strong opinion on the style either way, but if we go with moving the macros out, I'll have to rework the one remaining one.
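The issue with hoisting the macros can be illustrated like this: once a macro_rules! macro lives at the top level, any local variable it reads or writes must be passed in explicitly as a parameter (round and demo are hypothetical names, not the PR's macros):

```rust
// A top-level macro can't silently capture the caller's locals the way a
// textual macro would; `acc` and `block` must be handed in at each call site.
macro_rules! round {
    ($acc:ident, $block:expr, $i:expr) => {
        $acc = $acc.rotate_left(1).wrapping_add($block[$i]);
    };
}

fn demo(block: &[u32; 4]) -> u32 {
    let mut acc = 0u32;
    round!(acc, block, 0);
    round!(acc, block, 1);
    round!(acc, block, 2);
    round!(acc, block, 3);
    acc
}
```

The $acc:ident parameter is what makes the mutation visible to the caller; an expr fragment would not allow assignment to a caller-owned binding in the same way.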

@coveralls commented Jun 13, 2016

Changes unknown when pulling e3ec283 on gsoltis:sha256 into an unknown commit on briansmith:master.

@briansmith (Owner)

Please see my latest comments in #199. Should I be reviewing the commits in this PR, or the ones in #199?

@briansmith (Owner)

I am actually kind of unsure about what happened with this PR in relation to #199 but it seems like we all decided to work in #199. Sorry about any confusion! I am as confused as anybody, I think.

Regardless, thank you very much for the PR!

@briansmith briansmith closed this Nov 17, 2016
4 participants