Description
I'm not sure if this is the right place for this proposal, but I would like to suggest some very useful instructions for speeding up cryptography, especially cryptographic and non-cryptographic hashing.
What are the instructions being proposed?
AES-NI (Advanced Encryption Standard New Instructions) is an extended instruction set that accelerates AES encryption / decryption.
v128.aes.enc(a, b)
Perform one round of an AES encryption flow
general:
fn aesenc(v128 a, v128 b) {
    return MixColumns(ShiftRows(SubBytes(a))) ^ b
}
x86: aesenc
ARM: AESMC + AESE + EOR
PPC: vcipher
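For illustration, here is a minimal sketch (not part of the proposal) of how an engine might lower this instruction with existing compiler intrinsics; the function name v128_aes_enc and the feature-macro guards are assumptions:
#if defined(__AES__)                 /* x86 with AES-NI enabled (-maes) */
#include <wmmintrin.h>
static inline __m128i v128_aes_enc(__m128i a, __m128i b) {
    /* aesenc: MixColumns(ShiftRows(SubBytes(a))) ^ b */
    return _mm_aesenc_si128(a, b);
}
#elif defined(__ARM_FEATURE_CRYPTO)  /* ARMv8-A Crypto Extension */
#include <arm_neon.h>
static inline uint8x16_t v128_aes_enc(uint8x16_t a, uint8x16_t b) {
    /* AESE with a zero key applies SubBytes and ShiftRows only;
       AESMC applies MixColumns; EOR applies the round key b. */
    return veorq_u8(vaesmcq_u8(vaeseq_u8(a, vdupq_n_u8(0))), b);
}
#endif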
v128.aes.enc_last(a, b)
Perform the last round of an AES encryption flow
general:
fn aesenc_last(v128 a, v128 b) {
    return ShiftRows(SubBytes(a)) ^ b
}
x86: aesenclast
ARM: AESE + EOR
PPC: vcipherlast
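To show how enc and enc_last compose, here is a hedged sketch of a full AES-128 block encryption written with the equivalent x86 intrinsics; the helper name and the pre-expanded round-key array rk are illustrative assumptions:
#include <wmmintrin.h>

static inline __m128i aes128_encrypt_block(__m128i block, const __m128i rk[11]) {
    block = _mm_xor_si128(block, rk[0]);          /* initial AddRoundKey */
    for (int i = 1; i < 10; i++)
        block = _mm_aesenc_si128(block, rk[i]);   /* rounds 1..9: v128.aes.enc */
    return _mm_aesenclast_si128(block, rk[10]);   /* round 10: v128.aes.enc_last */
}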
v128.aes.dec(a, b)
Perform one round of an AES decryption flow
general:
fn aesdec(v128 a, v128 b) {
    return MixColumnsInv(ShiftRowsInv(SubBytesInv(a))) ^ b
}
x86: aesdec
ARM: AESIMC + AESD + EOR
PPC: vncipher
v128.aes.dec_last(a, b)
Perform the last round of an AES decryption flow
general:
fn aesdec_last(v128 a, v128 b) {
    return ShiftRowsInv(SubBytesInv(a)) ^ b
}
x86: aesdeclast
ARM: AESD + EOR
PPC: vncipherlast
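Decryption composes the same way. Note that aesdec on x86 implements the Equivalent Inverse Cipher, so the middle round keys must first be passed through InvMixColumns (aesimc). A hedged sketch with x86 intrinsics, where the array drk is assumed to already hold the transformed decryption keys:
#include <wmmintrin.h>

static inline __m128i aes128_decrypt_block(__m128i block, const __m128i drk[11]) {
    /* drk[0] = rk[0], drk[10] = rk[10], drk[i] = aesimc(rk[i]) for i = 1..9 */
    block = _mm_xor_si128(block, drk[10]);        /* undo the final AddRoundKey */
    for (int i = 9; i > 0; i--)
        block = _mm_aesdec_si128(block, drk[i]);  /* v128.aes.dec */
    return _mm_aesdeclast_si128(block, drk[0]);   /* v128.aes.dec_last */
}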
v128.aes.keygen(a, imm8)
Assist in generating the round keys used for encryption
general:
fn aeskeygen(v128 a, u8 imm8) {
    X3[31:0] = a[127:96]
    X2[31:0] = a[95:64]
    X1[31:0] = a[63:32]
    X0[31:0] = a[31:0]
    RCON[31:0] = ZeroExtend(imm8)
    vDst[31:0] = SubWord(X1)
    vDst[63:32] = RotWord(SubWord(X1)) ^ RCON
    vDst[95:64] = SubWord(X3)
    vDst[127:96] = RotWord(SubWord(X3)) ^ RCON
    return vDst
}
x86: aeskeygenassist
ARM: efficient emulation is possible (see emulating-x86-aes-intrinsics-on-armv8-a); the code below assumes __m128i is defined as a 128-bit vector type, as in that article:
__m128i _mm_aeskeygenassist_si128_arm(__m128i a, const int imm8) {
    a = vaeseq_u8(a, (__m128i){}); // perform ShiftRows and SubBytes on "a"
    uint32_t rcon = (uint32_t)(uint8_t)imm8;
    __m128i dest = {
        // Undo ShiftRows step from AESE and extract X1 and X3
        a[0x4], a[0x1], a[0xE], a[0xB], // SubBytes(X1)
        a[0x1], a[0xE], a[0xB], a[0x4], // ROT(SubBytes(X1))
        a[0xC], a[0x9], a[0x6], a[0x3], // SubBytes(X3)
        a[0x9], a[0x6], a[0x3], a[0xC], // ROT(SubBytes(X3))
    };
    return dest ^ (__m128i)((uint32x4_t){0, rcon, 0, rcon});
}
PPC:
__m128i _mm_aeskeygenassist_si128_ppc(__m128i a, const int imm8) {
    a = __builtin_crypto_vcipherlast(a, (__m128i){}); // perform ShiftRows and SubBytes on "a"
    uint32_t rcon = (uint32_t)(uint8_t)imm8;
    __m128i dest = {
        // Undo ShiftRows step from vcipherlast and extract X1 and X3
        a[0x4], a[0x1], a[0xE], a[0xB], // SubBytes(X1)
        a[0x1], a[0xE], a[0xB], a[0x4], // ROT(SubBytes(X1))
        a[0xC], a[0x9], a[0x6], a[0x3], // SubBytes(X3)
        a[0x9], a[0x6], a[0x3], a[0xC], // ROT(SubBytes(X3))
    };
    return dest ^ (__m128i)((uint32x4_t){0, rcon, 0, rcon});
}
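For context, this is how the keygen primitive is typically combined with shifts and XORs to expand a full AES-128 key schedule; the sketch below uses x86 intrinsics and illustrative helper names:
#include <wmmintrin.h>

static inline __m128i expand_step(__m128i key, __m128i assist) {
    /* broadcast the word holding RotWord(SubWord(X3)) ^ RCON */
    assist = _mm_shuffle_epi32(assist, _MM_SHUFFLE(3, 3, 3, 3));
    /* xor-fold the previous round key into itself, one word at a time */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

static inline void aes128_key_expansion(__m128i rk[11], __m128i key) {
    rk[0] = key;
    #define ROUND(i, rcon) \
        rk[i] = expand_step(rk[i - 1], _mm_aeskeygenassist_si128(rk[i - 1], rcon))
    ROUND(1, 0x01); ROUND(2, 0x02); ROUND(3, 0x04); ROUND(4, 0x08); ROUND(5, 0x10);
    ROUND(6, 0x20); ROUND(7, 0x40); ROUND(8, 0x80); ROUND(9, 0x1B); ROUND(10, 0x36);
    #undef ROUND
}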
For details about operations such as MixColumns, ShiftRowsInv, etc., see Intel's AES-NI white paper.
How does behavior differ across processors? What new fingerprinting surfaces will be exposed?
- x86: supported by Intel (Westmere, Sandy Bridge/Ivy Bridge, Haswell, Skylake, etc.) and AMD (Jaguar, Puma, Zen 1 and newer).
- ARM: optionally supported on ARMv8-A (ARM Cortex-A30/50/70 cores), Qualcomm 805, Exynos 3 series.
- RISC-V doesn't have such dedicated instructions, but a number of RISC-V chips include integrated AES co-processors, and an extension may be standardized in the future.
- POWER8/9/10 also support this (thanks to @nemequ for pointing that out).
What use cases are there?
- speed up AES encryption / decryption
- fast cryptographic and non-cryptographic hashing. Check these benchmark results: https://github.com/rurban/smhasher/blob/master/README.md; the fastest hash algorithms there all use AES-NI. A toy sketch of the technique follows below.
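As a toy illustration of the hashing use case (not any of the algorithms from the benchmark, and not collision-resistant), one aesenc per 16-byte chunk already gives strong mixing at roughly one instruction per block; all names below are made up for the example:
#include <stdint.h>
#include <string.h>
#include <wmmintrin.h>

static uint64_t aes_mix_hash(const uint8_t *data, size_t len, uint64_t seed) {
    __m128i state = _mm_set1_epi64x((long long)seed);
    __m128i chunk;
    size_t n = len;
    while (n >= 16) {
        memcpy(&chunk, data, 16);                 /* unaligned-safe load */
        state = _mm_aesenc_si128(state, chunk);   /* one AES round as the mixer */
        data += 16; n -= 16;
    }
    uint8_t tail[16] = {0};                       /* zero-padded tail */
    memcpy(tail, data, n);
    memcpy(&chunk, tail, 16);
    state = _mm_aesenc_si128(state, chunk);
    state = _mm_aesenc_si128(state, _mm_set1_epi64x((long long)len));
    state = _mm_aesenclast_si128(state, _mm_setzero_si128());
    return (uint64_t)_mm_cvtsi128_si64(state);    /* take the low 64 bits */
}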