Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AArch64: Inline StringUTF16.compress([CI[BII)I #20218

Closed
wants to merge 1 commit into from

Conversation

knn-k
Copy link
Contributor

@knn-k knn-k commented Sep 24, 2024

This commit recognizes StringUTF16.compress([CI[BII)I, and generates inlined code for AArch64.

@knn-k
Copy link
Contributor Author

knn-k commented Sep 24, 2024

Jenkins test sanity alinux64,amac jdk17

@knn-k
Copy link
Contributor Author

knn-k commented Sep 24, 2024

Jenkins test sanity alinux64,amac jdk11

@knn-k
Copy link
Contributor Author

knn-k commented Sep 24, 2024

Jenkins test sanity alinux64,amac jdk17

@knn-k
Copy link
Contributor Author

knn-k commented Sep 24, 2024

Jenkins test sanity amac jdk11

@knn-k knn-k marked this pull request as ready for review September 24, 2024 22:46
@knn-k knn-k requested a review from dsouzai as a code owner September 24, 2024 22:46
@knn-k
Copy link
Contributor Author

knn-k commented Sep 25, 2024

This method is used only in Java 17+.
Should I add #if JAVA_SPEC_VERSION >= 17 to the code?

@0xdaryl 0xdaryl self-assigned this Sep 25, 2024
@knn-k
Copy link
Contributor Author

knn-k commented Sep 25, 2024

The Java method copies elements of a char[] array to a byte[] array.
If any one of the array elements is larger than 255, the method fails and returns 0.
https://github.com/ibmruntimes/openj9-openjdk-jdk17/blob/764b7257fe90012db56000a556f2b835542a6063/src/java.base/share/classes/java/lang/StringUTF16.java#L176-L190

The loop in the vectorized code loads 16 elements from the char[] array at once, using two SIMD registers. It then collects the upper bytes of the char elements using the uzp2 instruction, and checks the result to see if there is any element larger than 255 or not. If no element exceeds 255, the code collects the lower bytes using the uzp1 instruction, and stores the result to the byte[] array.
When there are 15 elements or less remaining, the code copies the array by 8-, 4-, 2-, and 1-elements using a single SIMD register. The instructions for collecting the upper and lower bytes are trn2 and xtn, respectively, instead of uzp2 and uzp1.

@r30shah
Copy link
Contributor

r30shah commented Sep 25, 2024

For Z we can borrow some logic / refactor code from [1] with some changes to handle return value, which was used to accelerate implEncode* intrinsics [2].

@knn-k Let me know if you want to take a crack at this / need some help from Z team.

[1]. https://github.com/eclipse/omr/blob/9b3e65e264a66efca05c8ab283a81686ef03c876/compiler/z/codegen/OMRTreeEvaluator.cpp#L15390
[2]. https://github.com/ibmruntimes/openj9-openjdk-jdk17/blob/764b7257fe90012db56000a556f2b835542a6063/src/java.base/share/classes/java/lang/StringCoding.java#L47-L58

@knn-k knn-k marked this pull request as draft October 4, 2024 05:18
This commit recognizes StringUTF16.compress([CI[BII)I, and generates
inlined code for AArch64.
@knn-k
Copy link
Contributor Author

knn-k commented Oct 4, 2024

Jenkins test sanity alinux64,amac jdk11,jdk17

@knn-k
Copy link
Contributor Author

knn-k commented Oct 4, 2024

I updated the code:

  • Improved the performance by changing the checks of the failure condition
  • Added comments
  • Added #if JAVA_SPEC_VERSION >= 11 - #endif: StringUTF16 class is available in Java 11+, but it is not in Java 8.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 4, 2024

Jenkins compile alinux64 jdk8

@knn-k knn-k marked this pull request as ready for review October 4, 2024 12:34
@0xdaryl
Copy link
Contributor

0xdaryl commented Oct 8, 2024

Because StringUTF16.compress() is a public static method, it can be called from anywhere with any inputs. The Java implementation will throw NPEs and AIOBs if insufficiently sized source/dest arrays are passed in. Unless VP is able to prove otherwise, I think you will need to null and bound check the arrays before entering your intrinsified code. What we've done in other situations like this is to do those checks in the mainline code, and if they fail then branch to an OOL section that calls the original, unintrinsified Java method where the appropriate exception would be thrown.

My other comment is around testing and how you ensured the different sections (i.e., source array lengths) of your intrinsic were exercised. It's worth creating a unit test if one doesn't already exist.

Otherwise, the actual logic itself looks fine.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 9, 2024

I wrote a program and used it for making sure that the StringUTF16.compress() works as expected for various offsets and lengths. I need to rewrite its code for running it as a unit test.

@knn-k
Copy link
Contributor Author

knn-k commented Oct 24, 2024

This commit is superseded by #20406 and eclipse-omr/omr#7499 that implement arraytranslateTRTO255.

@knn-k knn-k closed this Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants