BLAKE3 should have a committing deterministic authenticated cipher #138

riastradh opened this issue Dec 4, 2020 · 13 comments

riastradh commented Dec 4, 2020

The popular authenticated ciphers AES-GCM and ChaCha/Poly1305 are fast and work well in TLS, but they've seen a variety of issues in incautiously designed systems: forgery by nonce-disrespecting adversaries against AES-GCM in TLS prior to 1.3, evasion of message franking at Facebook, and password recovery in password-authenticated key exchanges like early versions of OPAQUE. Public nonces can also be privacy leaks. Another nonce-based authenticated cipher, AES-CTR with HMAC-SHA256 in encrypt-then-MAC, saw an incident of nonce reuse in the Tarsnap backup service, and engineers are often confused by the mandatory initialization vectors and nonces in the APIs of symmetric ciphers, as evidenced by a cursory search of the Cryptography Stack Exchange.

Can we avoid these problems -- make an authenticated cipher that's easier to use, safer, and still fast?

  • A deterministic authenticated cipher like AES-SIV simultaneously simplifies the interface of an AEAD by requiring only a header and a payload -- no nonces needed. (I suggested Daence as a cheap way to get a DAE out of ChaCha or Salsa20 and Poly1305 as widely deployed black boxes, but it's not committing.)
    Of course, if you have a unique message number, you can incorporate it in the message header, as a public nonce, or in the message payload, as a secret nonce, to conceal when messages are repeated and to quickly reject old forgery attempts. And you're not restricted to a 64-bit or 96-bit or 192-bit nonce, so for, e.g., streaming a large file by authenticating and encrypting each chunk separately like STREAM, you can just incorporate the whole file name or a hash of the file in the header or payload of each chunk.
  • In a committing authenticated cipher, the authenticated ciphertext serves as a commitment to the key and/or message -- a property often assumed by engineers, such as the designers of Facebook's original message franking system, but counterintuitively not provided by AES-GCM, ChaCha/Poly1305, or other common authenticated ciphers.

I propose BLAKE3 should provide a committing deterministic authenticated cipher.

  • BLAKE3 naturally functions as a MAC, since it can pseudorandomly compress a long message into a short output, with parallelism from tree hashing like KangarooTwelve -- and it functions as a PRF adequate for an SIV-type construction.
  • BLAKE3 naturally functions as a stream cipher, since it can pseudorandomly generate a key stream for each message number, with parallelism from independence of blocks like the ChaCha stream cipher.
  • BLAKE3 naturally functions as a commitment scheme, since it's collision-resistant.
  • Unlike many of the schemes in Shay Gueron's proposal, BLAKE3 can safely use the authentication tag as the commitment for a total of 32 bytes of ciphertext expansion, and doesn't require an initial KDF step to expand a short key into longer keys.
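
For concreteness, here is roughly how the MAC/PRF and stream-cipher roles above look with the public blake3 Rust crate. This is purely illustrative; the key and inputs are placeholders:

    fn roles_demo() {
        // MAC / PRF role: keyed hashing compresses an arbitrary-length message
        // into a 32-byte pseudorandom output.
        let key = [0x42u8; 32];
        let tag = blake3::keyed_hash(&key, b"some message");

        // Stream-cipher role: the XOF extends a keyed hash of some per-message
        // input (here, the tag itself) into an arbitrarily long key stream.
        let mut keystream = [0u8; 1024];
        blake3::Hasher::new_keyed(&key)
            .update(tag.as_bytes())
            .finalize_xof()
            .fill(&mut keystream);
    }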

Here's a straw proposal built out of SIV -- it may not make optimal use of the input space but it illustrates the point. To encrypt message m with header a under key k,

  1. Compute the 32-byte authentication tag t := BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0).
  2. Compute the ciphertext c := m + BLAKE3_k(t || 1), where the BLAKE3 output is taken to be as long as the message, as a stream cipher.

Here pad(...) means padding with zeros up to a 64-byte block boundary, 0 and 1 represent individual 8-bit bytes of the respective numeric values for domain separation, and len64(...) means the little-endian 64-bit encoding of the number of 8-bit bytes in a string.

The tag t serves as a commitment to a triple (k, a, m) of key, associated data, and message, under the commitment verification equation t ?= BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0). Authenticated decryption is defined the obvious way, by undoing the stream cipher and verifying the commitment.
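
As a concrete illustration, here is a minimal sketch of this straw proposal in Rust against the public blake3 crate. The helper names (pad64, dae_encrypt, dae_decrypt) are made up for this sketch, and a real implementation would want to wipe the buffer on a failed verification rather than hand back unverified plaintext:

    fn pad64(h: &mut blake3::Hasher, data: &[u8]) {
        // pad(...): the data followed by zeros up to a 64-byte block boundary.
        let zeros = [0u8; 64];
        h.update(data);
        let rem = data.len() % 64;
        if rem != 0 {
            h.update(&zeros[..64 - rem]);
        }
    }

    fn tag(key: &[u8; 32], a: &[u8], m: &[u8]) -> [u8; 32] {
        // Step 1: t := BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0).
        let mut h = blake3::Hasher::new_keyed(key);
        pad64(&mut h, a);
        pad64(&mut h, m);
        h.update(&(a.len() as u64).to_le_bytes());
        h.update(&(m.len() as u64).to_le_bytes());
        h.update(&[0u8]); // domain-separation byte for the PRF/commitment
        *h.finalize().as_bytes()
    }

    fn keystream_xor(key: &[u8; 32], t: &[u8; 32], buf: &mut [u8]) {
        // Step 2: XOR with BLAKE3_k(t || 1), using the XOF as a stream cipher.
        let mut h = blake3::Hasher::new_keyed(key);
        h.update(t);
        h.update(&[1u8]); // domain-separation byte for the stream cipher
        let mut ks = vec![0u8; buf.len()];
        h.finalize_xof().fill(&mut ks);
        for (b, k) in buf.iter_mut().zip(ks.iter()) {
            *b ^= *k;
        }
    }

    fn dae_encrypt(key: &[u8; 32], a: &[u8], m: &mut [u8]) -> [u8; 32] {
        let t = tag(key, a, m);
        keystream_xor(key, &t, m);
        t
    }

    fn dae_decrypt(key: &[u8; 32], a: &[u8], c: &mut [u8], t: &[u8; 32]) -> bool {
        // Undo the stream cipher, then check the commitment; blake3::Hash's
        // equality test against a byte array is constant-time.
        keystream_xor(key, t, c);
        blake3::Hash::from(tag(key, a, c)) == *t
    }

The 32-byte tag doubles as the synthetic IV and the commitment, so the total ciphertext expansion is the 32 bytes noted above.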

Informal security claims and proof sketches:

  • DAE security.
    The input encoding in step (1) is injective, and never coincides with the input in step (2) because of the domain separation, so it is safe to reuse the key for both purposes. Step (1) is a variable-length-input-to-fixed-output PRF, and step (2) is an IND-CPA randomized cipher just like XSalsa20, so the hypotheses of the SIV theorem are essentially satisfied. (The keys of the PRF and the cipher are not independent here, so the theorem doesn't strictly apply -- but that doesn't matter because the input spaces in steps (1) and (2) are disjoint, and as I showed in the Daence theorem, it's enough that the probability of a collision between the two spaces be small.)

  • Commitment security.

    • Sender-binding security: The adversary obviously cannot find a key k, associated data a, ciphertext c, and tag t that pass authenticated decryption but fail commitment verification, because the encryption/decryption in step (2) is a bijection between plaintext and ciphertext, and the decryption verification and commitment verification equations are identical.
    • Receiver-binding security: If the adversary can find a tag t and tuples (k, a, m) and (k', a', m') of key/associated data/message that pass verification such that (a, m) =/= (a', m'), then the equation
      BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0)
      = BLAKE3_k'(pad(a') || pad(m') || len64(a') || len64(m') || 0)
      
      must hold, implying a collision in BLAKE3 since the input encoding is injective. As long as BLAKE3 is collision-resistant, this can't happen.
    • Random-key robustness: If the adversary can, given distinct keys k and k', find an associated data a, tag t, and ciphertext c that pass authenticated decryption under both keys, then the equation
      BLAKE3_k(pad(a) || pad(m) || len64(a) || len64(m) || 0)
      = BLAKE3_k'(pad(a) || pad(m') || len64(a) || len64(m') || 0)
      
      must hold, again implying a collision in BLAKE3; here m and m' are the respective plaintexts under k and k' for c. (This is essentially a slightly weaker version of receiver-binding security: the adversary has no control over k and k'. The OPAQUE paper does not anticipate associated data; we could allow the adversary to pick a and a' and it would still boil down to finding a BLAKE3 collision.)
  • INT-RUP: Integrity under release of unverified plaintext. The inputs to BLAKE3 are always distinct, so although releasing unverified plaintext reveals the key stream, it is independent of the tag on any message, so the adversary learns nothing that helps with forgery except by breaking BLAKE3 itself.

Thoughts?

riastradh (Author) commented Dec 4, 2020

The input encoding is suboptimal for short messages -- for instance, a single-byte header and single-byte payload requires four BLAKE3 compression calls (one to process the header, one to process the payload, one to process the lengths, and one to generate the key stream). Could do something like BLAKE3_k(pad(a) || len64(a) || pad(m) || len64(m)) instead, where pad(...) pads to 8 bytes prior to the end of a block; this saves one compression call in the case of a single-byte header and a single-byte payload, but would cost more for, say, 57-byte headers and payloads.
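
A sketch of that alternative encoding, reusing the keyed-hasher pattern from the earlier sketch (the helper names are again hypothetical, and the trailing domain-separation byte is left out here, as in the encoding just described, so something else would still have to keep the step (1) and step (2) inputs distinct):

    fn pad_with_len(h: &mut blake3::Hasher, data: &[u8]) {
        // Pad with zeros to 8 bytes short of a block boundary, so that the
        // 64-bit little-endian length fills out the 64-byte block.
        let zeros = [0u8; 64];
        h.update(data);
        let rem = (data.len() + 8) % 64;
        if rem != 0 {
            h.update(&zeros[..64 - rem]);
        }
        h.update(&(data.len() as u64).to_le_bytes());
    }

    fn tag_alt(key: &[u8; 32], a: &[u8], m: &[u8]) -> [u8; 32] {
        // BLAKE3_k(pad(a) || len64(a) || pad(m) || len64(m)): a 1-byte header
        // and a 1-byte payload now occupy two 64-byte blocks instead of three.
        let mut h = blake3::Hasher::new_keyed(key);
        pad_with_len(&mut h, a);
        pad_with_len(&mut h, m);
        *h.finalize().as_bytes()
    }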

These examples are just for illustration; I didn't put a lot of thought into them. Some criteria for the input encoding (in the language of RFC 2119 and RFC 6919):

  • MUST be injective
  • MUST have no collisions between any input to BLAKE3 in step (1) and any input to BLAKE3 in step (2)
  • SHOULD CONSIDER having header and payload block-aligned, for bulk throughput
  • MAY WISH TO use existing BLAKE3 operations as black boxes, without adding new internal flags, so existing BLAKE3 implementations can be reused without modification (but perhaps BLAKE3 is new enough that this doesn't matter and it would be better to use a new internal flag and avoid all padding than to worry about difficulty implementing it in terms of existing libraries)

oconnor663 (Member) commented:

This is something I've been thinking a lot about recently too. Here are a couple of issues that I've been bouncing around:

1) Streaming decryption. Related to "ease of use". The design above requires the sender and the receiver to buffer the whole plaintext, before any of it can be encrypted (by the sender) or authenticated (by the receiver). Whether this is an issue depends on the use case, of course. But "encrypting a file" is a pretty common use case, and most APIs that deal with files expect to be able to deal with them incrementally. The typical way to solve this problem is to chunk up the input, encrypt and authenticate each chunk separately, and then address all the new attacks that that opens up (chunk reordering, truncation, etc). This is all very doable when your security is allowed to depend on the unique nonce, though I still think there's room for a general-purpose standard here. Tink's Streaming AEAD and libsodium's crypto_secretstream are a couple of existing designs.

However, chunked encryption causes problems when you're aiming for SIV properties. As you note in the "Can Daence do streaming or random access?" section of your paper, you need to chain the chunks together somehow if you want to prevent an attacker from mixing-and-matching chunks between messages, and that means you can't allow seeking. (Personal opinion: A couple of the most obvious use cases for incremental decryption are streaming audio and video, and that suggests pretty strongly that seeking is important for a general-purpose design.) The "chosen prefix, secret suffix" attack described in the CHAIN/STREAM paper is also a lot worse than the typical "attackers can see when two messages are identical" property that one expects from a SIV design without a unique nonce, and there's no way to prevent that attack while preserving incrementality.

I think there's a third option here, though it might be excessively complicated. The sender could include the entire BLAKE3 tree inline with the ciphertext (encrypted let's say). The sender would still need a couple of passes to encrypt, so encryption wouldn't be streaming. But the recipient could then traverse the tree to authenticate each chunk independently. That would enable streaming and seeking on the recipient side, and it would defeat the CPSS attack and recover the expected SIV property. In common use cases, streaming decryption on the recipient side is much more important than streaming encryption on the sender side, so this tradeoff might be acceptable. (Use cases that do require streaming on the sender side -- live video chat, for example -- don't really want to represent the video stream as a single ciphertext anyway. They want to treat each packet as a ciphertext and allow for packet loss.) And one counterpoint to the "excessively complicated" issue is that the code for this already exists: this is exactly what bao decode does, except without the stream cipher and using the unkeyed mode of BLAKE3.

2) Performance. BLAKE3 can perform well for very long and very short messages, in the same ballpark as hardware-accelerated AES-GCM (if you make AES pay the key setup cost for each short message, which might or might not be fair). But performance for medium-length messages in the 1-2 KiB range, where chunk parallelism hasn't yet kicked in, is more in line with other hash functions, which is a few times slower than hardware-accelerated AES. Unfortunately that's kind of an important regime. The max TLS record size is 16 KiB, but I think in practice a lot of TLS packets end up in that 1-2 KiB range?

Now maybe one could argue that AES is so fast that a few times slower in a limited regime isn't a big deal. After all, most applications should probably worry more about nonce reuse than they should about bumping their encryption throughput up from say 1 GB/s to 6 GB/s. But let me frame the question differently. Consider an alternative design, let's call it "XChaCha20-Poly1305 plus the nonce is random and generated for you by the encryption function so that there's no way you can possibly screw it up besides maybe fork()." Now that design is competitive with AES across the board, and it also solves most of the biggest nonce reuse issues. It's not committing of course, so you can still make the franking mistake, and it's also not going to protect you if /dev/urandom is broken somehow. But I think we could argue that that design ("if implemented correctly", famous last words) captures most of the value of a SIV mode for most applications, without actually being SIV. So I think the real question is, if that's the competition, would there still be much of a market for a design that might be 6x slower in some important cases?
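
For reference, here is a rough sketch of that competing design, written against the RustCrypto chacha20poly1305 and rand crates (assuming the 0.9-era NewAead API; seal is a made-up name, and this is just the "random nonce generated inside the call" idea described above, nothing BLAKE3-specific):

    use chacha20poly1305::aead::{Aead, NewAead, Payload};
    use chacha20poly1305::{Key, XChaCha20Poly1305, XNonce};
    use rand::RngCore;

    // Draws the 24-byte XChaCha nonce from the OS RNG inside the call and
    // prepends it to the ciphertext, so callers never handle a nonce at all.
    fn seal(key: &[u8; 32], aad: &[u8], plaintext: &[u8]) -> Vec<u8> {
        let cipher = XChaCha20Poly1305::new(Key::from_slice(key));
        let mut nonce_bytes = [0u8; 24];
        rand::rngs::OsRng.fill_bytes(&mut nonce_bytes);
        let ciphertext = cipher
            .encrypt(XNonce::from_slice(&nonce_bytes), Payload { msg: plaintext, aad })
            .expect("encryption failure");
        let mut out = nonce_bytes.to_vec();
        out.extend_from_slice(&ciphertext);
        out
    }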

PaulGrandperrin (Contributor) commented Jan 9, 2021

I also recently became interested in key-committing AEADs (after reading about partitioning oracle attacks) and SIV constructions.
I wrote a small toy project implementing some kind of "XChacha8Blake3Siv", very similar to what @riastradh is describing. I was also interested in testing out replacing XChaCha with BLAKE3, but gave up because of the lack of an OutputReader::xor() method which would allow XORing the keystream easily and efficiently.

But after finding out about this issue, I got motivated again, wrote that xor method, and made my code more generic so I could run some benchmarks and put some numbers in perspective.
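
(As an aside, a stand-in for such an xor method can be built on the public OutputReader::fill() API; the hypothetical helper below goes through a 64-byte temporary buffer, which is exactly the copy a native OutputReader::xor() could avoid.)

    // XORs BLAKE3 XOF output into `buf` in place, 64 bytes at a time.
    fn xor_keystream_into(reader: &mut blake3::OutputReader, buf: &mut [u8]) {
        let mut ks = [0u8; 64];
        for chunk in buf.chunks_mut(64) {
            reader.fill(&mut ks[..chunk.len()]);
            for (b, k) in chunk.iter_mut().zip(ks.iter()) {
                *b ^= *k;
            }
        }
    }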

I am only a newbie amateur in cryptography, but I thought those numbers might interest you.

So here is my code: https://github.com/PaulGrandperrin/XChaCha8Blake3Siv/blob/master/src/lib.rs

impl<C: NewStreamCipher + StreamCipher, M: NewMac + Mac> AeadInPlace for AeadSiv<C, M> {
...
}

You can plug in any combination of StreamCipher and Mac and get an AeadInPlace which should be key-committing and nonce-reuse misuse-resistant.

I also implemented the Mac, NewMac, StreamCipher and NewStreamCipher traits for BLAKE3.

I then wrote some benchmark code to test a few combinations at different plaintext sizes (1, 32, 128, 1 KiB, 8 KiB, 64 KiB, and 1 MiB):
https://github.com/PaulGrandperrin/XChaCha8Blake3Siv/blob/master/benches/bench.rs

The different AEADs tested:

XChaCha20Poly1305
ChaChaPoly1305<c2_chacha::Ietf>
AeadSiv<c2_chacha::Ietf, blake3::Hasher>
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>
AeadSiv<Blake3StreamCipher, blake3::Hasher>

and here are the numbers (encryption+decryption, with no additional data):

AeadSiv<Blake3StreamCipher, blake3::Hasher>/       1     1237.7±12.42ns   789.0 KB/sec
AeadSiv<Blake3StreamCipher, blake3::Hasher>/      32     1426.7±17.66ns    21.4 MB/sec
AeadSiv<Blake3StreamCipher, blake3::Hasher>/     128     1836.6±124.68ns   66.5 MB/sec
AeadSiv<Blake3StreamCipher, blake3::Hasher>/    1024         7.4±0.12µs   132.5 MB/sec
AeadSiv<Blake3StreamCipher, blake3::Hasher>/    8192        40.7±0.38µs   192.1 MB/sec
AeadSiv<Blake3StreamCipher, blake3::Hasher>/   65536       265.2±4.75µs   235.7 MB/sec
AeadSiv<Blake3StreamCipher, blake3::Hasher>/ 1048576         4.1±0.08ms   243.4 MB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/       1    1071.1±21.59ns   911.8 KB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/      32    1083.5±32.29ns    28.2 MB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/     128    1581.1±21.63ns    77.2 MB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/    1024        4.6±0.04µs   213.2 MB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/    8192       19.8±0.47µs   394.9 MB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/   65536       99.0±1.63µs   631.2 MB/sec
AeadSiv<c2_chacha::ChaCha12, blake3::Hasher>/ 1048576    1447.3±22.64µs   691.0 MB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/       1    1162.0±86.98ns   840.4 KB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/      32    1168.1±25.51ns    26.1 MB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/     128    1753.3±20.37ns    69.6 MB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/    1024        5.0±0.05µs   197.0 MB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/    8192       22.8±0.16µs   342.2 MB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/   65536      123.2±1.03µs   507.3 MB/sec
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/ 1048576    1834.1±27.71µs   545.2 MB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/       1     1033.4±165.41ns  945.0 KB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/      32     1034.7±13.67ns    29.5 MB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/     128     1489.5±36.43ns    82.0 MB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/    1024         4.4±0.34µs   221.6 MB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/    8192        18.3±0.18µs   427.6 MB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/   65536        86.4±0.77µs   723.6 MB/sec
AeadSiv<c2_chacha::ChaCha8, blake3::Hasher>/ 1048576     1256.1±28.38µs   796.1 MB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/       1        1157.0±49.56ns   844.1 KB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/      32        1168.5±21.18ns    26.1 MB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/     128        1755.9±77.94ns    69.5 MB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/    1024            5.0±0.07µs   195.7 MB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/    8192           22.8±0.52µs   342.7 MB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/   65536          123.1±2.40µs   507.9 MB/sec
AeadSiv<c2_chacha::Ietf, blake3::Hasher>/ 1048576        1831.2±29.49µs   546.1 MB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/       1   1198.6±16.73ns   814.8 KB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/      32   1215.9±15.48ns    25.1 MB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/     128   1714.0±30.73ns    71.2 MB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/    1024       4.7±0.09µs   207.3 MB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/    8192      19.8±0.20µs   393.7 MB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/   65536      99.0±0.82µs   631.2 MB/sec
AeadSiv<c2_chacha::XChaCha12, blake3::Hasher>/ 1048576   1441.3±14.90µs   693.8 MB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/       1   1379.5±18.82ns   707.9 KB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/      32   1390.6±15.71ns    21.9 MB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/     128   1973.5±31.40ns    61.9 MB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/    1024       5.2±0.07µs   187.6 MB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/    8192      23.0±0.13µs   339.1 MB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/   65536     123.4±1.11µs   506.6 MB/sec
AeadSiv<c2_chacha::XChaCha20, blake3::Hasher>/ 1048576   1835.0±17.03µs   545.0 MB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/       1    1113.9±15.01ns   876.7 KB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/      32    1125.3±16.47ns    27.1 MB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/     128    1582.9±23.93ns    77.1 MB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/    1024        4.5±0.04µs   218.2 MB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/    8192       18.3±0.26µs   427.1 MB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/   65536       86.6±0.84µs   722.0 MB/sec
AeadSiv<c2_chacha::XChaCha8, blake3::Hasher>/ 1048576    1247.9±18.37µs   801.3 MB/sec
ChaChaPoly1305<c2_chacha::Ietf>/       1                 1218.6±17.99ns   801.4 KB/sec
ChaChaPoly1305<c2_chacha::Ietf>/      32                 1278.0±17.53ns    23.9 MB/sec
ChaChaPoly1305<c2_chacha::Ietf>/     128                 1787.6±30.11ns    68.3 MB/sec
ChaChaPoly1305<c2_chacha::Ietf>/    1024                     4.0±0.06µs   242.4 MB/sec
ChaChaPoly1305<c2_chacha::Ietf>/    8192                    24.7±0.26µs   316.0 MB/sec
ChaChaPoly1305<c2_chacha::Ietf>/   65536                   191.1±1.14µs   327.1 MB/sec
ChaChaPoly1305<c2_chacha::Ietf>/ 1048576                     3.0±0.03ms   328.2 MB/sec
XChaCha20Poly1305/       1                               1556.2±28.19ns   627.5 KB/sec
XChaCha20Poly1305/      32                               1619.3±36.83ns    18.8 MB/sec
XChaCha20Poly1305/     128                               1891.3±433.18ns   64.5 MB/sec
XChaCha20Poly1305/    1024                                   5.0±0.05µs   193.8 MB/sec
XChaCha20Poly1305/    8192                                  30.5±0.72µs   255.8 MB/sec
XChaCha20Poly1305/   65536                                 234.7±2.64µs   266.3 MB/sec
XChaCha20Poly1305/ 1048576                                   3.7±0.03ms   268.0 MB/sec

The benchmarks were made with criterion and run on an N2D (AMD EPYC Rome) class GCP VM:

RUSTFLAGS="-Ctarget-cpu=native" cargo bench --bench bench -- --sample-size 1000 --measurement-time 5 --warm-up-time 1

More details about the results can be found here: https://paulg.fr/criterion/report/

My code has a few differences compared to what @riastradh describes:

  • I am unsure what the goal of the padding is (performance, security, both?), so I didn't implement it. (I also wasn't sure how to use the Hasher::update() method efficiently without doing useless copies.)
  • I only encoded the size of the AD in the source of the PRF, but not the size of the plaintext. I can't figure out why it is needed when BLAKE3 is resistant to length extension attacks and the encoding is already unambiguous with just one of the two lengths.
  • I didn't do key domain separation because I figured it wasn't needed when ChaCha is used with BLAKE3; for these benchmarks I don't think it matters performance-wise, and as @oconnor663 said, it could be added internally to BLAKE3.

Here is the main part of the code:

    fn encrypt_in_place_detached(
        &self,
        nonce: &GenericArray<u8, <C as NewStreamCipher>::NonceSize>,
        associated_data: &[u8],
        buffer: &mut [u8],
    ) -> Result<GenericArray<u8, Self::TagSize>, Error> {
        // MAC the nonce, the AD length, the AD, and the plaintext to get the tag.
        let mut hasher = <M as NewMac>::new(self.key.as_slice().try_into().unwrap());
        hasher.update(nonce);
        hasher.update(&(associated_data.len() as u64).to_le_bytes()); // little-endian
        hasher.update(associated_data);
        hasher.update(buffer);
        let tag: GenericArray<_, _> = hasher.finalize().into_bytes(); // consumes the Hash to avoid copying
        // Reuse a prefix of the tag as the synthetic IV for the stream cipher.
        let siv = tag[0..Self::NonceSize::USIZE].into(); // constructs a reference to avoid copying
        <C as NewStreamCipher>::new(&self.key, siv).encrypt(buffer);
        Ok(tag)
    }

I would be extremely happy if this work can be of any use, and I would also love to get some feedback from you :)

And thanks for BLAKE3, an awesome piece of software!

oconnor663 (Member) commented:

I am unsure what the goal of the padding is (performance, security, both?), so I didn't implement it.

I don't think that's a particular problem. HMAC does pad things up to a full block, and I assume that was a conservative security decision to stop different inputs from mixing in a single block, but I don't know for sure. One practical benefit is that, because you can run the compression function at each block boundary, padding guarantees that you can compress as soon as the first input is complete, without waiting for any of the second input to be available. That might mean you don't have to keep buffering sensitive input bytes, which could matter in some niche attack scenarios, like where people talk about "ratchets".

I only encoded the size of the AD in the source of the PRF, but not the size of the plaintext. I can't figure out why it is needed when BLAKE3 is resistant to length extension attacks and the encoding is already unambiguous with just one of the two lengths.

I think you're right here, and I'm not sure why most designs prefer to encode both lengths.

riastradh (Author) commented Jan 11, 2021

and here are the numbers (encryption+decryption, with no additional data):

AeadSiv<Blake3StreamCipher, blake3::Hasher>/ 1048576         4.1±0.08ms   243.4 MB/sec
...
AeadSiv<c2_chacha::ChaCha20, blake3::Hasher>/ 1048576    1834.1±27.71µs   545.2 MB/sec

The BLAKE3 stream cipher should be able to run much faster than ChaCha, because it is essentially the same computation with many fewer rounds (7 vs 20). I assume this must be naive Rust code without SIMD vectorization?

My code has a few differences compared to what @riastradh describes:

  • I am unsure what the goal of the padding is (performance, security, both?), so I didn't implement it. (I also wasn't sure how to use the Hasher::update() method efficiently without doing useless copies.)

No security benefit.

  • There's a minor potential performance benefit: keeps the inputs block-aligned.
  • There's a minor potential performance cost: may require hashing more input for some message lengths.

I haven't measured one way or another.

  • I only encoded the size of the AD in the source of the PRF, but not the size of the plaintext. I can't figure out why it is needed when BLAKE3 is resistant to length extension attacks and the encoding is already unambiguous with just one of the two lengths.

No particular reason; I just copied the way that the ChaCha/Poly1305 input was encoded so that it wouldn't raise any security questions that might obscure the broader point.

  • I didn't do key domain separation because I figured it wasn't needed when ChaCha is used with BLAKE3; for these benchmarks I don't think it matters performance-wise, and as @oconnor663 said, it could be added internally to BLAKE3.

That's reasonable. There may also be a reasonable way to do domain separation without adding a suffix -- I didn't think very hard about it, or about tweaking the padding or length encoding to squeeze out the last few cycles per message, because I wanted to focus on the crux of the proposal (a committing authenticated cipher built entirely out of the BLAKE3 primitive as an all-in-one) rather than minor details.

oconnor663 (Member) commented:

The BLAKE3 stream cipher should be able to run much faster than ChaCha, because it is essentially the same computation with many fewer rounds (7 vs 20).

A BLAKE3 round is equivalent to a ChaCha "doubleround", so in that sense BLAKE3 is closest to a hypothetical ChaCha14.

k0001 (Contributor) commented Jan 15, 2021

I'm glad to see this being discussed here! I have been thinking about this for a while, too.

What do you think of the construction at https://github.com/k0001/baile? I wrote the high level documentation in the README. I'm using the official BLAKE3 upstream distribution. I made no changes to the BLAKE3 sources.

Other than the lack of streaming support common to most SIV-like constructions, I think it addresses many of the issues brought up here. Notably, for messages of up to 64 bytes, it performs only two BLAKE3 compressions in total. This is useful, since 64 bytes is such a common size for small pieces of data to encrypt, for example in key exchange ceremonies.

Domain separation happens by deterministically tweaking the key a bit, as explained in the README and in baile.c. This allows keeping the message payload size and massaging at a minimum, and performance at a maximum. I think this is safe, but it's also totally possible that I'm missing something and this key tweaking is terribly wrong. Is it terribly wrong?

riastradh (Author) commented:

A BLAKE3 round is equivalent to a ChaCha "doubleround", so in that sense BLAKE3 is closest to a hypothetical ChaCha14.

Oops, so it is. Never can keep 'doublerounds' and 'rounds' straight even when I've implemented both of them myself!

Domain separation happens by deterministically tweaking the key a bit, as explained in the README and in baile.c. This allows keeping the message payload size and massaging at a minimum, and performance at a maximum. I think this is safe, but it's also totally possible that I'm missing something and this key tweaking is terribly wrong. Is it terribly wrong?

I expect you'll get into trouble with, e.g., a 2-byte AD fo and 4-byte text obar, and a 4-byte AD foob and 2-byte text ar, under the same key, because the xor of lengths will coincide (2 ^ 4 = 6) and the 'message' fed into BLAKE3 will coincide (foobar). That's why I started with the simplest, dumbest padding scheme, following existing designs, with what is guaranteed to be enough domain separation.

k0001 (Contributor) commented Jan 16, 2021

I expect you'll get into trouble with, e.g., a 2-byte AD fo and 4-byte text obar, and a 4-byte AD foob and 2-byte text ar, under the same key, because the xor of lengths will coincide (2 ^ 4 = 6) and the 'message' fed into BLAKE3 will coincide (foobar).

I don't think this particular scenario is a problem, because the Ad and Text lengths are XORed into separate parts of the key. Maybe I didn't write this very clearly in the pseudocode, but this comment in the implementation should clarify how this issue is avoided: https://github.com/k0001/baile/blob/0efb5bb8c2e535775ee93e8b3d70000d832f1a25/baile/baile.c#L123-L135

riastradh (Author) commented:

I don't think this particular scenario is a problem, because the Ad and Text lengths are XORed into separate parts of the key. Maybe I didn't write this very clearly in the pseudocode, but this comment in the implementation should clarify how this issue is avoided: https://github.com/k0001/baile/blob/0efb5bb8c2e535775ee93e8b3d70000d832f1a25/baile/baile.c#L123-L135

Got it, that sounds more reasonable. That said, this requires resistance to related-key attacks, and I don't know whether anyone conjectures that of BLAKE3 -- there's nothing in the BLAKE3 paper advertising it.

k0001 (Contributor) commented Jan 21, 2021

Got it, that sounds more reasonable.

Great. Thanks for checking it out.

That said, this requires resistance to related-key attacks, and I don't know whether anyone conjectures that of BLAKE3 -- there's nothing in the BLAKE3 paper advertising it.

How can this be verified?

To clarify why I think reusing the key itself for domain separation would be useful, if it turns out that it's as safe as doing so in the payload message (I'm not saying it is): It makes it very cheap to encrypt or decrypt payloads of up to 64 bytes.

sneves (Collaborator) commented Feb 5, 2021

A chosen-plaintext related-key distinguisher would be a valid attack against the compression function. It would certainly be of interest, although it's a more attacker-friendly model than the regular hash setting (cf. Table 5 of the spec to see how the various internal components can be distinguished).

zookozcash (Collaborator) commented:

Hey errybody in this thread! Please review Bessie: https://github.com/oconnor663/bessie
