aes: zeroize not fully removing key schedule from memory? #385
The reproducer: aes-secret-leak.zip |
Stack bleaching is outside the area of responsibility for algorithm implementation crates. The current version of Rust does not provide any good tools for calculating the maximum stack usage of a given function, so we cannot reliably do it at our level. |
You can simply ask: println!("Size of Aes256: {}", core::mem::size_of::<aes::Aes256>()); prints "Size of Aes256: 960" on an x86_64 target. Note that especially on x86, we optimize for performance over memory footprint. Each AES instance needs to store an entire key schedule, with a 128-bit round key for each round, expanded separately for both encryption and decryption. So yes, it's large, but that's how AES is commonly implemented on those targets in order to take advantage of hardware acceleration.
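As a runnable form of that size check (the one-liner above wrapped in a `main`; the printed value is target- and crate-version-dependent):

```rust
fn main() {
    // Prints "Size of Aes256: 960" on x86_64 per the comment above;
    // other targets and versions may differ.
    println!("Size of Aes256: {}", core::mem::size_of::<aes::Aes256>());
}
```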
If you enable the `zeroize` feature, the cipher's key schedule is zeroed when it is dropped. |
Have you even looked at the reproducer? I have zeroize enabled:
And in Sequoia, we heap-allocate the cipher context, and I'll update the reproducer to do the same. What we're seeing here is not covered by zeroize. Updated reproducer: aes-secret-leak.zip

We test Nettle, OpenSSL, and Botan. None of those leave traces of the key lying around on either the heap or the stack. |
I’ll take a look when I have some time. Just to note, the reproducer is far from minimal and includes a lot of superfluous, unrelated code, which does not make analysis easy. |
Also, can you give complete information about the target you were testing on, e.g. the target triple? The |
I have cut down the reproducer to only include the misbehaving code:

```rust
fn main() {
    use aes::cipher::KeyInit;
    let cipher = Box::new(aes::Aes256::new_from_slice(&NEEDLE).unwrap());
    drop(cipher);
}

const NEEDLE: &[u8] = b"@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@";
```

My triple is |
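For context, the heart of such a reproducer is a scan for leftover copies of the key pattern. A minimal sketch of just the matching step (a hypothetical helper, not the actual reproducer code, which walks the process's stack and heap):

```rust
/// Count occurrences of `needle` (e.g. the 32-byte `@` pattern above)
/// in a captured memory region.
fn count_needles(region: &[u8], needle: &[u8]) -> usize {
    region.windows(needle.len()).filter(|w| *w == needle).count()
}
```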
@teythoon |
I'll also note we have fairly comprehensive tests of the zeroize functionality: https://github.com/RustCrypto/block-ciphers/blob/8d03900/aes/src/lib.rs#L161-L233 |
I tried running the example on a Debian 12 Docker container on an ARM64 Mac:
|
Maybe there is a way to work around this limitation in the API by providing a way to (re)key after the object has been moved to the heap. |
It would probably make more sense to have an API to create a boxed instance of a cipher, which allocates it on the heap and then performs the key schedule setup once the value is already living on the heap; see the sketch below. |
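A minimal sketch of that idea, assuming a hypothetical `new_boxed_aes256` helper (nothing like it exists in the `cipher` traits today):

```rust
use core::mem::MaybeUninit;
use aes::cipher::KeyInit;

/// Hypothetical: allocate heap storage first, then write the cipher into
/// it, so the Box never receives a stack-built value via memcpy.
/// (`Box::new_uninit` is stable since Rust 1.82.)
fn new_boxed_aes256(key: &[u8; 32]) -> Box<aes::Aes256> {
    let mut slot: Box<MaybeUninit<aes::Aes256>> = Box::new_uninit();
    // Caveat: with today's API, `new` still builds the key schedule in a
    // stack temporary before this write; a real fix needs an in-place
    // constructor in the `cipher` traits themselves.
    slot.write(aes::Aes256::new(key.into()));
    // SAFETY: the slot was fully initialized by the write above.
    unsafe { slot.assume_init() }
}
```

As the thread finds later, the main leaks come from inside key expansion itself rather than from the move into the Box, so this alone would not be sufficient.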
It's quite hard to guarantee that the Rust compiler will not place any sensitive data on the stack, and solutions will probably be quite fragile. I believe you should properly implement and use stack bleaching. It will also help with potential sensitive temporaries leaked to the stack by the compiler during encryption/decryption, especially when you use higher-level algorithms like AEADs. |
I reproduced the issue on x86. There do seem to be several copies of the key left on the stack. The issue persists even if the code looks like this, so it's unrelated to boxing:

```rust
fn stuff() {
    use aes::cipher::KeyInit;
    aes::Aes256::new_from_slice(NEEDLE).unwrap();
}

fn main() -> Result<()> {
    stuff();
    scan("aes encrypt").unwrap();
    Ok(())
}
```

If I run with...
...the problem does NOT occur. So it seems possible it's something with the AES-NI backend, or failing that, something with autodetection. (I now realize I didn't realize the

All that said, I agree with @newpavlov that clearing all sensitive data from the stack is out of scope for this crate. It's something best implemented, or the strategy best decided, by the top-level binary, not a library crate. Unless you see a bug in the

My disposition is to close this issue again, although if there's a trivial fix someone can spot, we can potentially implement it. |
We are also a library crate, and we are not going to shift the responsibility to our downstream users. I do some stack clearing where my measurements have indicated leaks, but that is not trivial either: first, you need to figure out where clearing is necessary; then, it is not obvious how much you need to clear. I think the clearing should happen close to where the leaks happen, hence I turned to you. Do you know of crates that clear parts of their stack, to look at for inspiration? |
Proper stack bleaching requires the use of an

If your library is high-level enough, then it should be feasible for you to add stack bleaching to your functions. I would apply it to every function which works with sensitive data.
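A minimal sketch of such per-function stack bleaching, assuming a naive fixed-size approach (the 8 KiB figure is what a later comment in this thread found sufficient; there is no reliable way to compute the true maximum):

```rust
use zeroize::Zeroize;

/// Naive stack bleaching: called right after a sensitive function
/// returns, this claims a scratch array in a fresh frame at the same
/// stack depth and zeroizes it, overwriting whatever the callee's dead
/// frames left behind. `#[inline(never)]` keeps it in its own frame.
#[inline(never)]
fn bleach_stack<const N: usize>() {
    let mut buf = [0u8; N];
    // Discourage the compiler from optimizing the buffer away.
    core::hint::black_box(&mut buf);
    buf.zeroize();
}

fn example(key: &[u8; 32]) {
    sensitive_operation(key); // may leave key material in dead frames
    bleach_stack::<8192>();   // best-effort overwrite of those frames
}

fn sensitive_operation(_key: &[u8; 32]) { /* ... */ }
```

This remains fragile: nothing guarantees how deep the callee's frames went, which is exactly the objection raised above.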
I work with such code but, unfortunately, cannot share it, and it's not a library but an application. I think @tarcieri mentioned a while ago some Rust libraries which use stack bleaching, but I don't remember the names. @tarcieri |
The general strategy I'd recommend is to use something like libfringe to allocate a new stack to run sensitive operations on, then upon completion zeroize the space you allocated for that stack (though I haven't tried to implement this, and there may be newer alternatives to libfringe). The only out-of-the-box solution I'm aware of is in the
That'd be cool |
I'm pretty sure that is not what's happening. Starting with -O1, rustc will optimize that copy away, constructing the object directly on the heap:

FWIW, I managed to get my stack-bleaching workaround to work by bluntly zeroing more memory. Previously I stopped my experiments at 4k, which wasn't sufficient and made me think there was a problem with my code or approach. Now I zero 8k, which gets the job done. So I guess the secret is leaked in a deep call stack. I feel like this is one more reason to want the clearing to be done closer to where the leak happens, or preferably the buggy code identified and the leak plugged. |
I confirmed it was unrelated to

The issue does NOT occur when

On x86, it occurs with RUSTFLAGS of

On ARMv8, it occurs with

Using those flags should bypass CPU feature autodetection. So it would seem to be in the hardware backends. It's possible there's something wrong with the way SIMD registers are being zeroized, or there's just unexpected stack spilling in both backends. Regardless, "fixing" this problem seems like it needs to happen on a backend-by-backend basis. |
Given that the

But I can give it a try. |
Just tried it with

For posterity, here's what I'm getting:

x86_64 + AES-NI:
|
The ARMv8 backend appears to leave 14 copies on the stack, which would correspond to the number of rounds in AES-256. |
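For reference, the standard AES-256 schedule arithmetic behind that observation:

```rust
const NR: usize = 14;                          // rounds in AES-256
const ROUND_KEYS: usize = NR + 1;              // 15: initial AddRoundKey + one per round
const SCHEDULE_BYTES: usize = ROUND_KEYS * 16; // 240 bytes per direction
```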
Shouldn't it be |
@newpavlov as far as I can tell, the two syntaxes are identical. Regardless, there is no effect on the results whether I use |
I tried a bit of an experiment on this branch, trying to improve the ARMv8 backend: https://github.com/RustCrypto/block-ciphers/tree/aes/armv8-keep-key-schedule-off-stack

Commit 99de856 made things slightly better (13 copies rather than 14, and one is only partial):

Commit b5b6339 tried to sprinkle some

That's 20 copies rather than the original 14, and up from the 13 after the previous commit, just from adding some |
No, they are not. As we can see here without |
Well, again, regardless, there is no effect on the output, so it's a red herring.

x86_64 + AES-NI:
|
I think I've found why the issue happens on backends dependent on target features. I started with the following simple code:

```rust
#[no_mangle]
pub unsafe fn aes_new(key: &[u8; 16]) -> aes::Aes128Enc {
    use aes::cipher::KeyInit;
    aes::Aes128Enc::new(key.into())
}
```

I compiled it as a shared library (with

```
0000000000001100 <aes_new>:
1100: 53 push rbx
1101: 48 89 fb mov rbx,rdi
1104: ff 15 ee 2e 00 00 call QWORD PTR [rip+0x2eee] # 3ff8 <_GLOBAL_OFFSET_TABLE_+0x38>
110a: 48 89 d8 mov rax,rbx
110d: 5b pop rbx
110e: c3 ret
110f: 90 nop
0000000000001110 <_ZN61_$LT$aes..ni..Aes128Enc$u20$as$u20$crypto_common..KeyInit$GT$3new17h9aa25a0d33d4862bE>:
1110: 48 89 f8 mov rax,rdi
1113: f3 0f 6f 06 movdqu xmm0,XMMWORD PTR [rsi]
1117: 66 0f 3a df c8 01 aeskeygenassist xmm1,xmm0,0x1
// key expansion impl
```

We can see that the key expansion function is not inlined into `aes_new` (note the `call` through the GOT). It does not cause stack spillage in this particular case, but replacing `Aes128Enc` with `Aes256` we get:

```
0000000000001100 <aes_new>:
1100: 41 56 push r14
1102: 53 push rbx
1103: 48 81 ec e8 01 00 00 sub rsp,0x1e8
110a: 48 89 fb mov rbx,rdi
110d: 4c 8d b4 24 f0 00 00 lea r14,[rsp+0xf0]
1114: 00
1115: 4c 89 f7 mov rdi,r14
1118: ff 15 d2 2e 00 00 call QWORD PTR [rip+0x2ed2] # 3ff0 <_GLOBAL_OFFSET_TABLE_+0x38>
111e: 66 0f 38 db 84 24 00 aesimc xmm0,XMMWORD PTR [rsp+0x100]
1125: 01 00 00
1128: 66 0f 7f 84 24 e0 00 movdqa XMMWORD PTR [rsp+0xe0],xmm0
112f: 00 00
1131: 66 0f 38 db 84 24 10 aesimc xmm0,XMMWORD PTR [rsp+0x110]
1138: 01 00 00
113b: 66 0f 7f 84 24 d0 00 movdqa XMMWORD PTR [rsp+0xd0],xmm0
1142: 00 00
1144: 66 0f 38 db 84 24 20 aesimc xmm0,XMMWORD PTR [rsp+0x120]
114b: 01 00 00
114e: 66 0f 7f 84 24 c0 00 movdqa XMMWORD PTR [rsp+0xc0],xmm0
1155: 00 00
1157: 66 0f 38 db 84 24 30 aesimc xmm0,XMMWORD PTR [rsp+0x130]
115e: 01 00 00
1161: 66 0f 7f 84 24 b0 00 movdqa XMMWORD PTR [rsp+0xb0],xmm0
1168: 00 00
116a: 66 0f 38 db 84 24 40 aesimc xmm0,XMMWORD PTR [rsp+0x140]
1171: 01 00 00
1174: 66 0f 7f 84 24 a0 00 movdqa XMMWORD PTR [rsp+0xa0],xmm0
117b: 00 00
117d: 66 0f 38 db 84 24 50 aesimc xmm0,XMMWORD PTR [rsp+0x150]
1184: 01 00 00
1187: 66 0f 7f 84 24 90 00 movdqa XMMWORD PTR [rsp+0x90],xmm0
118e: 00 00
1190: 66 0f 38 db 84 24 60 aesimc xmm0,XMMWORD PTR [rsp+0x160]
1197: 01 00 00
119a: 66 0f 7f 84 24 80 00 movdqa XMMWORD PTR [rsp+0x80],xmm0
11a1: 00 00
11a3: 66 0f 38 db 84 24 70 aesimc xmm0,XMMWORD PTR [rsp+0x170]
11aa: 01 00 00
11ad: 66 0f 7f 44 24 70 movdqa XMMWORD PTR [rsp+0x70],xmm0
11b3: 66 0f 38 db 84 24 80 aesimc xmm0,XMMWORD PTR [rsp+0x180]
11ba: 01 00 00
11bd: 66 0f 7f 44 24 60 movdqa XMMWORD PTR [rsp+0x60],xmm0
11c3: 66 0f 38 db 84 24 90 aesimc xmm0,XMMWORD PTR [rsp+0x190]
11ca: 01 00 00
11cd: 66 0f 7f 44 24 50 movdqa XMMWORD PTR [rsp+0x50],xmm0
11d3: 66 0f 38 db 84 24 a0 aesimc xmm0,XMMWORD PTR [rsp+0x1a0]
11da: 01 00 00
11dd: 66 0f 7f 44 24 40 movdqa XMMWORD PTR [rsp+0x40],xmm0
11e3: 66 0f 38 db 84 24 b0 aesimc xmm0,XMMWORD PTR [rsp+0x1b0]
11ea: 01 00 00
11ed: 66 0f 7f 44 24 30 movdqa XMMWORD PTR [rsp+0x30],xmm0
11f3: 66 0f 38 db 84 24 c0 aesimc xmm0,XMMWORD PTR [rsp+0x1c0]
11fa: 01 00 00
11fd: 66 0f 7f 44 24 20 movdqa XMMWORD PTR [rsp+0x20],xmm0
1203: 0f 28 84 24 f0 00 00 movaps xmm0,XMMWORD PTR [rsp+0xf0]
120a: 00
120b: 0f 29 04 24 movaps XMMWORD PTR [rsp],xmm0
120f: 0f 28 84 24 d0 01 00 movaps xmm0,XMMWORD PTR [rsp+0x1d0]
1216: 00
1217: 0f 29 44 24 10 movaps XMMWORD PTR [rsp+0x10],xmm0
121c: ba f0 00 00 00 mov edx,0xf0
1221: 48 89 df mov rdi,rbx
1224: 4c 89 f6 mov rsi,r14
1227: ff 15 b3 2d 00 00 call QWORD PTR [rip+0x2db3] # 3fe0 <memcpy@GLIBC_2.14>
122d: 0f 28 04 24 movaps xmm0,XMMWORD PTR [rsp]
1231: 0f 29 83 f0 00 00 00 movaps XMMWORD PTR [rbx+0xf0],xmm0
1238: 0f 28 84 24 e0 00 00 movaps xmm0,XMMWORD PTR [rsp+0xe0]
123f: 00
1240: 0f 29 83 00 01 00 00 movaps XMMWORD PTR [rbx+0x100],xmm0
1247: 0f 28 84 24 d0 00 00 movaps xmm0,XMMWORD PTR [rsp+0xd0]
124e: 00
124f: 0f 29 83 10 01 00 00 movaps XMMWORD PTR [rbx+0x110],xmm0
1256: 0f 28 84 24 c0 00 00 movaps xmm0,XMMWORD PTR [rsp+0xc0]
125d: 00
125e: 0f 29 83 20 01 00 00 movaps XMMWORD PTR [rbx+0x120],xmm0
1265: 0f 28 84 24 b0 00 00 movaps xmm0,XMMWORD PTR [rsp+0xb0]
126c: 00
126d: 0f 29 83 30 01 00 00 movaps XMMWORD PTR [rbx+0x130],xmm0
1274: 0f 28 84 24 a0 00 00 movaps xmm0,XMMWORD PTR [rsp+0xa0]
127b: 00
127c: 0f 29 83 40 01 00 00 movaps XMMWORD PTR [rbx+0x140],xmm0
1283: 0f 28 84 24 90 00 00 movaps xmm0,XMMWORD PTR [rsp+0x90]
128a: 00
128b: 0f 29 83 50 01 00 00 movaps XMMWORD PTR [rbx+0x150],xmm0
1292: 0f 28 84 24 80 00 00 movaps xmm0,XMMWORD PTR [rsp+0x80]
1299: 00
129a: 0f 29 83 60 01 00 00 movaps XMMWORD PTR [rbx+0x160],xmm0
12a1: 0f 28 44 24 70 movaps xmm0,XMMWORD PTR [rsp+0x70]
12a6: 0f 29 83 70 01 00 00 movaps XMMWORD PTR [rbx+0x170],xmm0
12ad: 0f 28 44 24 60 movaps xmm0,XMMWORD PTR [rsp+0x60]
12b2: 0f 29 83 80 01 00 00 movaps XMMWORD PTR [rbx+0x180],xmm0
12b9: 0f 28 44 24 50 movaps xmm0,XMMWORD PTR [rsp+0x50]
12be: 0f 29 83 90 01 00 00 movaps XMMWORD PTR [rbx+0x190],xmm0
12c5: 0f 28 44 24 40 movaps xmm0,XMMWORD PTR [rsp+0x40]
12ca: 0f 29 83 a0 01 00 00 movaps XMMWORD PTR [rbx+0x1a0],xmm0
12d1: 0f 28 44 24 30 movaps xmm0,XMMWORD PTR [rsp+0x30]
12d6: 0f 29 83 b0 01 00 00 movaps XMMWORD PTR [rbx+0x1b0],xmm0
12dd: 0f 28 44 24 20 movaps xmm0,XMMWORD PTR [rsp+0x20]
12e2: 0f 29 83 c0 01 00 00 movaps XMMWORD PTR [rbx+0x1c0],xmm0
12e9: 0f 28 44 24 10 movaps xmm0,XMMWORD PTR [rsp+0x10]
12ee: 0f 29 83 d0 01 00 00 movaps XMMWORD PTR [rbx+0x1d0],xmm0
12f5: 48 89 d8 mov rax,rbx
12f8: 48 81 c4 e8 01 00 00 add rsp,0x1e8
12ff: 5b pop rbx
1300: 41 5e pop r14
1302: c3 ret
1303: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
130a: 00 00 00
130d: 0f 1f 00 nop DWORD PTR [rax]
0000000000001310 <_ZN61_$LT$aes..ni..Aes256Enc$u20$as$u20$crypto_common..KeyInit$GT$3new17h5a792345766b66d5E>:
1310: 48 89 f8 mov rax,rdi
1313: f3 0f 6f 06 movdqu xmm0,XMMWORD PTR [rsi]
1317: f3 0f 6f 4e 10 movdqu xmm1,XMMWORD PTR [rsi+0x10]
131c: 66 0f 3a df d1 01 aeskeygenassist xmm2,xmm1,0x1
// key expansion impl
```

It's interesting that the key inversion ( |
I noted in #386 it does not fix it, though I'd agree it seems to be related to inlining |
Yeah, encryption of blocks still unnecessarily copies keys to the stack. We can see it with this function:

```rust
#[no_mangle]
pub unsafe fn aes_enc(cipher: &aes::Aes256, blocks: &mut [aes::Block]) {
    use aes::cipher::BlockEncrypt;
    cipher.encrypt_blocks(blocks);
}
```

It generates the following assembly:

```
aes_enc:
sub rsp, 120
mov eax, edx
and eax, 7
cmp rdx, 8
jb .LBB0_3
mov rcx, rdx
shr rcx, 3
movaps xmm0, xmmword ptr [rdi]
movaps xmmword ptr [rsp + 96], xmm0
movaps xmm0, xmmword ptr [rdi + 16]
movaps xmmword ptr [rsp + 80], xmm0
movaps xmm0, xmmword ptr [rdi + 32]
movaps xmmword ptr [rsp + 64], xmm0
movaps xmm0, xmmword ptr [rdi + 48]
movaps xmmword ptr [rsp], xmm0
movaps xmm0, xmmword ptr [rdi + 64]
movaps xmmword ptr [rsp - 16], xmm0
movaps xmm0, xmmword ptr [rdi + 80]
movaps xmmword ptr [rsp - 64], xmm0
movaps xmm0, xmmword ptr [rdi + 96]
movaps xmmword ptr [rsp - 32], xmm0
movaps xmm0, xmmword ptr [rdi + 112]
movaps xmmword ptr [rsp + 48], xmm0
movaps xmm0, xmmword ptr [rdi + 128]
movaps xmmword ptr [rsp - 80], xmm0
movaps xmm0, xmmword ptr [rdi + 144]
movaps xmmword ptr [rsp - 48], xmm0
movaps xmm0, xmmword ptr [rdi + 160]
movaps xmmword ptr [rsp - 96], xmm0
movaps xmm0, xmmword ptr [rdi + 176]
movaps xmmword ptr [rsp - 112], xmm0
movaps xmm0, xmmword ptr [rdi + 192]
movaps xmmword ptr [rsp - 128], xmm0
movaps xmm0, xmmword ptr [rdi + 208]
movaps xmmword ptr [rsp + 32], xmm0
movdqa xmm0, xmmword ptr [rdi + 224]
movdqa xmmword ptr [rsp + 16], xmm0
lea r8, [rsi + 112]
movdqa xmm12, xmmword ptr [rsp + 32]
movdqa xmm7, xmmword ptr [rsp + 16]
.p2align 4, 0x90
.LBB0_2:
movdqu xmm5, xmmword ptr [r8 - 112]
movdqu xmm6, xmmword ptr [r8 - 96]
movdqu xmm3, xmmword ptr [r8 - 80]
movdqu xmm4, xmmword ptr [r8 - 64]
movdqu xmm1, xmmword ptr [r8 - 48]
movdqu xmm2, xmmword ptr [r8 - 32]
movdqu xmm15, xmmword ptr [r8 - 16]
movdqu xmm8, xmmword ptr [r8]
movdqa xmm9, xmmword ptr [rsp + 96]
pxor xmm5, xmm9
pxor xmm6, xmm9
movdqa xmm10, xmmword ptr [rsp + 80]
aesenc xmm5, xmm10
aesenc xmm6, xmm10
movdqa xmm0, xmmword ptr [rsp + 64]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm11, xmm0
movdqa xmm0, xmmword ptr [rsp]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 16]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 64]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 32]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm13, xmm0
movdqa xmm14, xmmword ptr [rsp + 48]
aesenc xmm5, xmm14
aesenc xmm6, xmm14
movdqa xmm0, xmmword ptr [rsp - 80]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 48]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 96]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 112]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
movdqa xmm0, xmmword ptr [rsp - 128]
aesenc xmm5, xmm0
aesenc xmm6, xmm0
aesenc xmm5, xmm12
aesenc xmm6, xmm12
aesenclast xmm5, xmm7
aesenclast xmm6, xmm7
movdqu xmmword ptr [r8 - 112], xmm5
movdqu xmmword ptr [r8 - 96], xmm6
movdqa xmm5, xmm9
pxor xmm3, xmm9
pxor xmm4, xmm9
movdqa xmm6, xmm10
aesenc xmm3, xmm10
aesenc xmm4, xmm10
movdqa xmm9, xmm11
aesenc xmm3, xmm11
aesenc xmm4, xmm11
movdqa xmm10, xmmword ptr [rsp]
aesenc xmm3, xmm10
aesenc xmm4, xmm10
movdqa xmm11, xmmword ptr [rsp - 16]
aesenc xmm3, xmm11
aesenc xmm4, xmm11
movdqa xmm0, xmmword ptr [rsp - 64]
aesenc xmm3, xmm0
aesenc xmm4, xmm0
aesenc xmm3, xmm13
aesenc xmm4, xmm13
movdqa xmm13, xmm14
aesenc xmm3, xmm14
aesenc xmm4, xmm14
movdqa xmm0, xmmword ptr [rsp - 80]
aesenc xmm3, xmm0
aesenc xmm4, xmm0
movdqa xmm14, xmmword ptr [rsp - 48]
aesenc xmm3, xmm14
aesenc xmm4, xmm14
movdqa xmm0, xmmword ptr [rsp - 96]
aesenc xmm3, xmm0
aesenc xmm4, xmm0
movdqa xmm0, xmmword ptr [rsp - 112]
aesenc xmm3, xmm0
aesenc xmm4, xmm0
movdqa xmm0, xmmword ptr [rsp - 128]
aesenc xmm3, xmm0
aesenc xmm4, xmm0
aesenc xmm3, xmm12
aesenc xmm4, xmm12
aesenclast xmm3, xmm7
aesenclast xmm4, xmm7
movdqu xmmword ptr [r8 - 80], xmm3
movdqu xmmword ptr [r8 - 64], xmm4
pxor xmm1, xmm5
pxor xmm2, xmm5
aesenc xmm1, xmm6
aesenc xmm2, xmm6
aesenc xmm1, xmm9
aesenc xmm2, xmm9
aesenc xmm1, xmm10
aesenc xmm2, xmm10
aesenc xmm1, xmm11
aesenc xmm2, xmm11
movdqa xmm3, xmmword ptr [rsp - 64]
aesenc xmm1, xmm3
aesenc xmm2, xmm3
movdqa xmm4, xmmword ptr [rsp - 32]
aesenc xmm1, xmm4
aesenc xmm2, xmm4
aesenc xmm1, xmm13
aesenc xmm2, xmm13
movdqa xmm0, xmmword ptr [rsp - 80]
aesenc xmm1, xmm0
aesenc xmm2, xmm0
aesenc xmm1, xmm14
aesenc xmm2, xmm14
movdqa xmm0, xmmword ptr [rsp - 96]
aesenc xmm1, xmm0
aesenc xmm2, xmm0
movdqa xmm0, xmmword ptr [rsp - 112]
aesenc xmm1, xmm0
aesenc xmm2, xmm0
movdqa xmm0, xmmword ptr [rsp - 128]
aesenc xmm1, xmm0
aesenc xmm2, xmm0
aesenc xmm1, xmm12
aesenc xmm2, xmm12
aesenclast xmm1, xmm7
aesenclast xmm2, xmm7
movdqu xmmword ptr [r8 - 48], xmm1
movdqu xmmword ptr [r8 - 32], xmm2
pxor xmm15, xmm5
pxor xmm8, xmm5
aesenc xmm15, xmm6
aesenc xmm8, xmm6
aesenc xmm15, xmm9
aesenc xmm8, xmm9
aesenc xmm15, xmm10
aesenc xmm8, xmm10
aesenc xmm15, xmm11
aesenc xmm8, xmm11
aesenc xmm15, xmm3
aesenc xmm8, xmm3
movdqa xmm1, xmm4
aesenc xmm15, xmm4
aesenc xmm8, xmm4
aesenc xmm15, xmm13
aesenc xmm8, xmm13
movdqa xmm0, xmmword ptr [rsp - 80]
aesenc xmm15, xmm0
aesenc xmm8, xmm0
aesenc xmm15, xmm14
aesenc xmm8, xmm14
movdqa xmm0, xmmword ptr [rsp - 96]
aesenc xmm15, xmm0
aesenc xmm8, xmm0
movdqa xmm0, xmmword ptr [rsp - 112]
aesenc xmm15, xmm0
aesenc xmm8, xmm0
movdqa xmm0, xmmword ptr [rsp - 128]
aesenc xmm15, xmm0
aesenc xmm8, xmm0
aesenc xmm15, xmm12
aesenc xmm8, xmm12
aesenclast xmm15, xmm7
aesenclast xmm8, xmm7
movdqu xmmword ptr [r8 - 16], xmm15
movdqu xmmword ptr [r8], xmm8
sub r8, -128
dec rcx
jne .LBB0_2
.LBB0_3:
test rax, rax
je .LBB0_11
movabs rcx, 1152921504606846968
and rdx, rcx
shl rdx, 4
add rsi, rdx
movdqa xmm14, xmmword ptr [rdi]
movdqa xmm13, xmmword ptr [rdi + 16]
movdqa xmm12, xmmword ptr [rdi + 32]
movdqa xmm11, xmmword ptr [rdi + 48]
movdqa xmm10, xmmword ptr [rdi + 64]
movdqa xmm9, xmmword ptr [rdi + 80]
movdqa xmm8, xmmword ptr [rdi + 96]
movdqa xmm7, xmmword ptr [rdi + 112]
movdqa xmm6, xmmword ptr [rdi + 128]
movdqa xmm5, xmmword ptr [rdi + 144]
movdqa xmm4, xmmword ptr [rdi + 160]
movdqa xmm3, xmmword ptr [rdi + 176]
movdqa xmm2, xmmword ptr [rdi + 192]
movdqa xmm1, xmmword ptr [rdi + 208]
movdqa xmm0, xmmword ptr [rdi + 224]
movdqu xmm15, xmmword ptr [rsi]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi], xmm15
cmp eax, 1
je .LBB0_11
movdqu xmm15, xmmword ptr [rsi + 16]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi + 16], xmm15
cmp eax, 2
je .LBB0_11
movdqu xmm15, xmmword ptr [rsi + 32]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi + 32], xmm15
cmp eax, 3
je .LBB0_11
movdqu xmm15, xmmword ptr [rsi + 48]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi + 48], xmm15
cmp eax, 4
je .LBB0_11
movdqu xmm15, xmmword ptr [rsi + 64]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi + 64], xmm15
cmp eax, 5
je .LBB0_11
movdqu xmm15, xmmword ptr [rsi + 80]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi + 80], xmm15
cmp eax, 6
je .LBB0_11
movdqu xmm15, xmmword ptr [rsi + 96]
pxor xmm15, xmm14
aesenc xmm15, xmm13
aesenc xmm15, xmm12
aesenc xmm15, xmm11
aesenc xmm15, xmm10
aesenc xmm15, xmm9
aesenc xmm15, xmm8
aesenc xmm15, xmm7
aesenc xmm15, xmm6
aesenc xmm15, xmm5
aesenc xmm15, xmm4
aesenc xmm15, xmm3
aesenc xmm15, xmm2
aesenc xmm15, xmm1
aesenclast xmm15, xmm0
movdqu xmmword ptr [rsi + 96], xmm15
.LBB0_11:
add rsp, 120
ret
```

The function is fully inlined, so the issue may be related to rust-lang/rust#88930. Since it looks like a compiler quirk, and stack bleaching is out of scope for cipher implementation crates (though we should minimize stack leakage where possible), I think we can close this issue? |
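A minimal illustration of the inlining barrier that issue describes (a sketch assuming an x86_64 target; `expand_key` is a stand-in, not the crate's actual code):

```rust
// A #[target_feature] function can only be inlined into callers compiled
// with the same features enabled, so a caller built without +aes must
// make a real call, and large by-value returns get spilled to its stack.
#[target_feature(enable = "aes")]
unsafe fn expand_key(key: &[u8; 32]) -> [u8; 240] {
    // ... key schedule via AES-NI intrinsics would live here ...
    let _ = key;
    [0u8; 240]
}

fn caller(key: &[u8; 32]) -> [u8; 240] {
    // Without `-C target-feature=+aes` for the whole crate, this call
    // cannot be inlined; the 240-byte return travels through the stack.
    unsafe { expand_key(key) }
}
```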
@newpavlov might be worth updating that issue with this case? Otherwise yeah, I tried to get rid of the only obvious actual stack usage in https://github.com/RustCrypto/block-ciphers/tree/aes/armv8-keep-key-schedule-off-stack (which I might still keep working on) and that didn't seem to help, so I don't think there's anything else we can do short of an upstream fix from rustc. |
Done.

It could be worth constructing disassembled examples similar to the above, to see whether these changes have an effect or not. |
When instantiating (and using) the AES cipher (it is very likely that others are affected as well, but I haven't checked), the key is written to the stack across a large region of memory (3568 bytes in my test). The fact that it is written over such a large region makes it likely that it will not be overwritten soon, if at all.
This is the result of a minimal test case that I have broken out of our test suite. I will attach the program to this bug report. The "screenshot" demonstrates the problem, and how to find the culprit of the leaks using rr, Mozilla's record-and-replay debugger: