@nathanchance

By default, QEMU's TCG uses the architected QARMA algorithm for pointer
authentication, which is cryptographically stronger but extremely slow
to emulate. As of QEMU 6.0.0, there is an "Implementation Defined"
algorithm available, which is not cryptographically secure but is
significantly faster to run.
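
As a rough illustration of the version gate described above, the sketch below builds a `-cpu` argument that opts into the faster algorithm only when QEMU is new enough. The `pauth-impdef` property name comes from the QEMU CPU features documentation linked at the end of this message rather than from this text, and the helper names `parse_qemu_version` and `cpu_flag` are hypothetical; this is not necessarily how the actual change is written.

```python
import re


def parse_qemu_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from `qemu-system-aarch64 --version` output."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def cpu_flag(qemu_version: tuple) -> str:
    """Return the value for QEMU's -cpu option (hypothetical helper).

    Assumption: 'pauth-impdef' is the CPU property that selects the
    Implementation Defined algorithm, and it only exists from 6.0.0 on.
    """
    base = "max"
    if qemu_version >= (6, 0, 0):
        return f"{base},pauth-impdef=on"
    # Older QEMU does not know the property, so leave the default (QARMA).
    return base


if __name__ == "__main__":
    version = parse_qemu_version("QEMU emulator version 7.0.0")
    print("-cpu", cpu_flag(version))
```

Gating on the version keeps older QEMU builds working, since they would presumably reject a CPU property they do not recognize.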

ARCH=arm64 defconfig:

Benchmark 1: QARMA
  Time (mean ± σ):     10.381 s ±  0.048 s    [User: 8.469 s, System: 0.142 s]
  Range (min … max):   10.317 s … 10.478 s    50 runs

Benchmark 2: Implementation Defined
  Time (mean ± σ):      7.051 s ±  0.015 s    [User: 5.125 s, System: 0.130 s]
  Range (min … max):    7.014 s …  7.083 s    50 runs

Summary
  'Implementation Defined' ran
    1.47 ± 0.01 times faster than 'QARMA'

ARCH=arm64 defconfig + KASAN_SW_TAGS + the KUnit tests:

Benchmark 1: QARMA
  Time (mean ± σ):     185.997 s ±  2.778 s    [User: 184.043 s, System: 0.593 s]
  Range (min … max):   182.816 s … 190.463 s    10 runs

Benchmark 2: Implementation Defined
  Time (mean ± σ):     29.618 s ±  0.301 s    [User: 26.951 s, System: 0.500 s]
  Range (min … max):   29.185 s … 30.103 s    10 runs

Summary
  'Implementation Defined' ran
    6.28 ± 0.11 times faster than 'QARMA'
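
The two Summary figures can be cross-checked from the reported means and standard deviations. The sketch below assumes the ± on the ratio is ordinary first-order error propagation for a quotient; that assumption is mine and is not stated in the benchmark output.

```python
import math


def speedup(slow_mean, slow_sd, fast_mean, fast_sd):
    """Return (ratio, uncertainty) for slow_mean / fast_mean.

    Assumes independent measurements and first-order propagation:
    relative error of a quotient = sqrt(sum of squared relative errors).
    """
    ratio = slow_mean / fast_mean
    rel_err = math.sqrt((slow_sd / slow_mean) ** 2 + (fast_sd / fast_mean) ** 2)
    return ratio, ratio * rel_err


# Means and standard deviations copied from the benchmark output above.
runs = {
    "defconfig": (10.381, 0.048, 7.051, 0.015),
    "defconfig + KASAN_SW_TAGS + KUnit": (185.997, 2.778, 29.618, 0.301),
}

for label, numbers in runs.items():
    ratio, err = speedup(*numbers)
    print(f"{label}: {ratio:.2f} ± {err:.2f}")
```

Under that assumption the computed ratios round to the same 1.47 ± 0.01 and 6.28 ± 0.11 shown in the summaries.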

This should help avoid spurious timeouts in CI, where the VMs can be
quite slow.

Aside from the benchmarks above, this change is visible in dmesg:

[    0.000000] CPU features: detected: Address authentication (architected QARMA5 algorithm)

vs.

[    0.000000] CPU features: detected: Address authentication (IMP DEF algorithm)
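
To confirm from a boot log which algorithm a guest actually used, the two dmesg lines above can be matched mechanically. This is a minimal sketch; the function name `pauth_algorithm` is hypothetical, and the pattern is based only on the two lines quoted here.

```python
import re

# Matches the "Address authentication (...)" line shown in the dmesg excerpts.
PAUTH_RE = re.compile(
    r"CPU features: detected: Address authentication \((.+) algorithm\)"
)


def pauth_algorithm(dmesg: str):
    """Return the algorithm name reported in a boot log, or None if absent."""
    for line in dmesg.splitlines():
        match = PAUTH_RE.search(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    log = (
        "[    0.000000] CPU features: detected: "
        "Address authentication (IMP DEF algorithm)"
    )
    print(pauth_algorithm(log))
```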

Link: https://lore.kernel.org/YlgVa+AP0g4IYvzN@lakrids/
Link: https://gitlab.com/qemu-project/qemu/-/blob/v7.0.0/docs/system/arm/cpu-features.rst

Signed-off-by: Nathan Chancellor <nathan@kernel.org>

@nickdesaulniers nickdesaulniers left a comment

interesting find! ⏩ 🏃

@nickdesaulniers nickdesaulniers requested a review from broonie April 28, 2022 21:31
@nathanchance nathanchance merged commit 49c96d4 into ClangBuiltLinux:main Apr 28, 2022
@nathanchance

Thanks for the review as always!

@nathanchance nathanchance deleted the pauth-impdef branch April 28, 2022 21:35