Skip to content

Conversation

@smemery
Copy link
Contributor

@smemery smemery commented Nov 7, 2025

This fix improves performance in the AES key schedule generation by eliminating an unnecessary object and unnecessary mask in the inverse key schedule.

The micro:org.openjdk.bench.javax.crypto.AESReinit benchmark results are improved by 6.96% for arm64 and 7.79% for x86_64.

Thank you @jnimeh for catching the unnecessary byte mask!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8371450: AES performance improvements for key schedule generation (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28188/head:pull/28188
$ git checkout pull/28188

Update a local copy of the PR:
$ git checkout pull/28188
$ git pull https://git.openjdk.org/jdk.git pull/28188/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28188

View PR using the GUI difftool:
$ git pr show -t 28188

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28188.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 7, 2025

👋 Welcome back smemery! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 7, 2025

@smemery This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the security security-dev@openjdk.org label Nov 7, 2025
@openjdk
Copy link

openjdk bot commented Nov 7, 2025

@smemery The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 7, 2025
@mlbridge
Copy link

mlbridge bot commented Nov 7, 2025

Webrevs

| ((SBOX[(b1 & 0xF0) >> 4][b1 & 0x0F] & 0xFF) << 16)
| ((SBOX[(b2 & 0xF0) >> 4][b2 & 0x0F] & 0xFF) << 8)
| (SBOX[(b3 & 0xF0) >> 4][b3 & 0x0F] & 0xFF);
| (SBOX[(word & 0xF0) >> 4][word & 0x0F] & 0xFF);
Copy link
Contributor

@ferakocz ferakocz Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no need for any of these bytes. Every index can be computed as "(word >> offset) & 0x0F". Actually, if you define SBOX as a 1-dim array, you can index into it with "(word >> offset) & 0xFF".

Copy link
Contributor Author

@smemery smemery Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. The byte assignments were to avoid three redundant shift operations.

int widx = i * WB;
int idx = kLen - widx;

dw[idx] = TMI0[w[widx] >>> 24] ^ TMI1[(w[widx] >> 16) & 0xFF]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there would be any benefit to putting w[widx] through w[widx+3] on local int variables? In some cases I've seen where that increases register pressure and can lead to some perf benefits. I'm not sure if this is one of those cases but it seems like you'd only need to reference memory once instead of 4 times per assignment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe my original changes here utilize a "MergeStore" technique that the compiler optimizes. I've asked @minborg to see if I got this right. To verify the optimization here, I used the separate local int variable technique and saw a 0.7% decrease in benchmark performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, your experiment answers my question. Unless @minborg says otherwise I'm fine with what you have. You're already seeing improvements so that's great.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 8, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rfr Pull request is ready for review security security-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

3 participants