-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BLAKE3 should have a committing deterministic authenticated cipher #138
Comments
The input encoding is suboptimal for short messages -- for instance, a single-byte header and single-byte payload requires four BLAKE3 compression calls (one to process header, one to process payload, one to process lengths, and one to generate key stream). Could do something like I didn't put a lot of thought into these examples for illustration. Some criteria for the input encoding (in the language of RFC 2119 and RFC 6919):
|
This is something I've been thinking a lot about recently too. Here are a couple of issues that I've been bouncing around: 1) Streaming decryption. Related to "ease of use". The design above requires the sender and the receiver to buffer the whole plaintext, before any of it can be encrypted (by the sender) or authenticated (by the receiver). Whether this is an issue depends on the use case, of course. But "encrypting a file" is a pretty common use case, and most APIs that deal with files expect to be able to deal with them incrementally. The typical way to solve this problem is to chunk up the input, encrypt and authenticate each chunk separately, and then address all the new attacks that that opens up (chunk reordering, truncation, etc). This is all very doable when your security is allowed to depend on the unique nonce, though I still think there's room for a general-purpose standard here. Tink's Streaming AEAD and libsodium's crypto_secretstream are a couple of existing designs. However, chunked encryption causes problems when you're aiming for SIV properties. As you note in the "Can Daence do streaming or random access?" section of your paper, you need to chain the chunks together somehow if you want to prevent an attacker from mixing-and-matching chunks between messages, and that means you can't allow seeking. (Personal opinion: A couple of the most obvious use cases for incremental decryption are streaming audio and video, and that suggests pretty strongly that seeking is important for a general-purpose design.) The "chosen prefix, secret suffix" attack described in the CHAIN/STREAM paper is also a lot worse than the typical "attackers can see when two messages are identical" property that one expects from a SIV design without a unique nonce, and there's no way to prevent that attack while preserving incrementality. I think there's a third option here, though it might be excessively complicated. The sender could include the entire BLAKE3 tree inline with the ciphertext (encrypted let's say). The sender would still need a couple of passes to encrypt, so encryption wouldn't be streaming. But the recipient could then traverse the tree to authenticate each chunk independently. That would enable streaming and seeking on the recipient side, and it would defeat the CPSS attack and recover the expected SIV property. In common use cases, streaming decryption on the recipient side is much more important than streaming encryption on the sender side, so this tradeoff might be acceptable. (Use cases that do require streaming on the sender side -- live video chat, for example -- don't really want to represent the video stream as a single ciphertext anyway. They want to treat each packet as a ciphertext and allow for packet loss.) And one counterpoint to the "excessively complicated" issue is that the code for this already exists: this is exactly what 2) Performance. BLAKE3 can perform well for very long and very short messages, in the same ballpark as hardware-accelerated AES-GCM (if you make AES pay the key setup cost for each short message, which might or might not be fair.) But performance for medium-length messages between 1-2 KiB, where chunk parallelism hasn't yet kicked in, is more in line with other hash functions, which is a few times slower than harware-accelerated AES. Unfortunately that's kind of an important regime. The max TLS record size is 16 KiB, but I think in practice a lot of TLS packets end up in that 1-2 KiB range? Now maybe one could argue that AES is so fast that a few times slower in a limited regime isn't a big deal. After all, most applications should probably worry more about nonce reuse than they should about bumping their encryption throughput up from say 1 GB/s to 6 GB/s. But let me frame the question differently. Consider an alternative design, let's call it "XChaCha20-Poly1305 plus the nonce is random and generated for you by the encryption function so that there's no way you can possibly screw it up besides maybe |
I also recently became interested in key-committing AEADs (after reading about partitioning attack oracles) and SIV constructions. But after finding about this issue, I got motivated again, wrote that I am only a very newbie amateur in cryptography but I thought that maybe those numbers might interest you. So here is my code: https://github.com/PaulGrandperrin/XChaCha8Blake3Siv/blob/master/src/lib.rs impl<C: NewStreamCipher + StreamCipher, M: NewMac + Mac> AeadInPlace for AeadSiv<C, M> {
...
} you can plug any combination of I also implemented the I then wrote some benchmark code to test a few combinations at different plaintext size: 1, 32, 128, 1K, 8K, 64KB, 1MB: the different AEAD tested: XChaCha20Poly1305
ChaChaPoly1305<c2_chacha::Ietf>
AeadSiv<c2_chacha::Ietf, blake3::Hasher>
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>
AeadSiv<Blake3StreamCipher, blake3::Hasher> and here are the numbers (encryption+decryption, with no additional data):
The benchmark were made with RUSTFLAGS="-Ctarget-cpu=native" cargo bench --bench bench -- --sample-size 1000 --measurement-time 5 --warm-up-time 1 more details about the results can be found here: https://paulg.fr/criterion/report/ My code has a few differences compared to what @riastradh describes:
Here is the main part of the code: fn encrypt_in_place_detached(
&self,
nonce: &GenericArray<u8, <C as NewStreamCipher>::NonceSize>,
associated_data: &[u8],
buffer: &mut [u8],
) -> Result<GenericArray<u8, Self::TagSize>, Error> {
let mut hasher = <M as NewMac>::new(self.key.as_slice().try_into().unwrap());
hasher.update(nonce);
hasher.update(&(associated_data.len() as u64).to_le_bytes()); // little-endian
hasher.update(associated_data);
hasher.update(buffer);
let tag: GenericArray<_,_> = hasher.finalize().into_bytes(); // consumes the Hash to avoid copying
let siv= tag[0..Self::NonceSize::USIZE].into(); // constructs a reference to avoid copying
<C as NewStreamCipher>::new(&self.key,siv).encrypt(buffer);
Ok(tag)
} I would be extremely happy if this work can be of any use and also getting some feedback from you :) And thanks for this awesome BLAKE3 piece of software! |
I don't think that's a particular problem. HMAC does pad things up to a full block, and I assume that was a conservative security decision to stop different inputs from mixing in a single block, but I don't know for sure. One practical benefit is that because you can run the compression function at each block boundary, padding guarantees that you can compress as soon as the first input is complete, without waiting for any of the second input to be avialable. That might mean you don't have to keep buffering sensitive input bytes, which could matter in some niche attack scenarios, like where people talk about "ratchets".
I think you're right here, and I'm not sure why most designs prefer to encode both lengths. |
The BLAKE3 stream cipher should be able to run much faster than ChaCha, because it is essentially the same computation with many fewer rounds (7 vs 20). I assume this must be naive Rust code without SIMD vectorization?
No security benefit.
I haven't measured one way or another.
No particular reason; I just copied the way that the ChaCha/Poly1305 input was encoded so that it wouldn't raise any security questions that might obscure the broader point.
That's reasonable. There may also be a reasonable way to do domain separation without adding a suffix -- I didn't think very hard about it, or about tweaking the padding or length encoding to squeeze out the last few cycles per message, because I wanted to focus on the crux of the proposal (a committing authenticated cipher built entirely out of the BLAKE3 primitive as an all-in-one) rather than minor details. |
A BLAKE3 round is equivalent to a ChaCha "doubleround", so in that sense BLAKE3 is closest to a hypothetical ChaCha14. |
I'm glad to see this being discussed here! I have been thinking about this for a while, too. What do you think of the construction at https://github.com/k0001/baile? I wrote the high level documentation in the Other than the lack of streaming support common to most SIV-like constructions, I think it addresses many of the issues brought up here. Notably, for messages of up to 64 bytes, it performs only two BLAKE3 compressions in total. This is useful, seeing how 64 bytes is such a frequent amount of small data to encrypt, for example, doing key exchange ceremonies. Domain separation happens by deterministically tweaking the key a bit, as explained in the |
Oops, so it is. Never can keep 'doublerounds' and 'rounds' straight even when I've implemented both of them myself!
I expect you'll get into trouble with, e.g., a 2-byte AD |
I don't think this particular scenario is a problem, because the Ad and Text lengths are XORed into separate parts of the key. Maybe I didn't write this very clearly in the pseudocode, but this comment in the implementation should clarify how this issue is avoided: https://github.com/k0001/baile/blob/0efb5bb8c2e535775ee93e8b3d70000d832f1a25/baile/baile.c#L123-L135 |
Got it, that sounds more reasonable. That said, this requires resistance to related-key attacks, and I don't know whether anyone conjectures that of BLAKE3 -- there's nothing in the BLAKE3 paper advertising it. |
Great. Thanks for checking it out.
How can this be verified? To clarify why I think reusing the key itself for domain separation would be useful, if it turns out that it's as safe as doing so in the payload message (I'm not saying it is): It makes it very cheap to encrypt or decrypt payloads of up to 64 bytes. |
A chosen-plaintext related-key distinguisher would be a valid attack against the compression function. It would certainly be of interest, although it's a more attacker-friendly model than the regular hash (cf. Table 5 of the spec to see how the various internal components can be distinguished) |
Hey errybody in this thread! Please review Bessie: https://github.com/oconnor663/bessie |
The popular authenticated ciphers AES-GCM and ChaCha/Poly1305 are fast and work well in TLS, but they've seen a variety of issues in incautiously designed systems: forgery by nonce-disrespecting adversaries against AES-GCM in TLS prior to 1.3, evasion of message franking at Facebook, and password recovery in password-authenticated key exchanges like early versions of OPAQUE. Public nonces can also be privacy leaks. Another nonce-based authenticated cipher, AES-CTR with HMAC-SHA256 in encrypt-then-MAC, saw an incident of nonce reuse in the Tarsnap backup service, and engineers are often confused by the mandatory initialization vectors and nonces in the API of symmetric ciphers as evidenced by a cursory search of the Cryptography Stack Exchange.
Can we avoid these problems -- make an authenticated cipher that's easier to use, safer, and still fast?
Of course, if you have a unique message number, you can incorporate it in the message header, as a public nonce, or in the message payload, as a secret nonce, to conceal when messages are repeated and quickly reject old forgery attempts -- and you're not restricted to a 64-bit or 96-bit or 192-bit nonce, so for, e.g., streaming a large file by authenticating and encrypting each chunk separately like STREAM, you can just incorporate the whole file name or hash of the file in the header or payload of the chunk.
I propose BLAKE3 should provide a committing deterministic authenticated cipher.
Here's a straw proposal built out of SIV -- it may not make optimal use of the input space but it illustrates the point. To encrypt message m with header a under key k,
t := BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0)
.c := m + BLAKE3_k(t || 1)
, where the BLAKE3 output is taken to be as long as the message, as a stream cipher.Here
pad(...)
means padding with zeros up to a 64-byte block boundary,0
and1
represent individual 8-bit bytes of the respective numeric values for domain separation, andlen64(...)
means the little-endian 64-bit encoding of the number of 8-bit bytes in a string.The tag t serves as a commitment to a triple (k, a, m) of key, associated data, and message, under the commitment verification equation
t ?= BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0)
. Authenticated decryption is defined the obvious way, by undoing the stream cipher and verifying the commitment.Informal security claims and proof sketches:
DAE security.
The input encoding in step (1) is injective, and never coincides with the input in step (2) because of the domain separation, so it is safe to reuse the key for both purposes. Step (1) is a variable-length-input-to-fixed-output PRF, and step (2) is an IND-CPA randomized cipher just like XSalsa20, so the hypotheses of the SIV theorem are essentially satisfied. (The keys of the PRF and the cipher are not independent here, so the theorem doesn't strictly apply -- but that doesn't matter because the input spaces in steps (1) and (2) are disjoint, and as I showed in the Daence theorem, it's enough that the probability of a collision between the two spaces be small.)
Commitment security.
INT-RUP: Integrity under release of unverified plaintext. The inputs to BLAKE3 are always distinct, so although releasing unverified plaintext reveals the key stream, it is independent of the tag on any message, so the adversary learns nothing that helps with forgery except by breaking BLAKE3 itself.
Thoughts?
The text was updated successfully, but these errors were encountered: