[META]: SIMD adoption in OpenSearch #9423

Open
4 tasks
heemin32 opened this issue Aug 17, 2023 · 14 comments
Labels
enhancement Enhancement or improvement to existing feature or request Meta Meta issue, not directly linked to a PR Roadmap:Cost/Performance/Scale Project-wide roadmap label v2.11.0 Issues and PRs related to version 2.11.0

Comments

@heemin32
Contributor

heemin32 commented Aug 17, 2023

Is your feature request related to a problem? Please describe.
Lucene 9.7.0 introduced support for the incubating Vector API from Java 20, which uses SIMD hardware on x86 (AVX2 or later) and ARM NEON platforms. The feature is disabled by default. To enable it, an OpenSearch user must pass a command-line parameter at launch time and run OpenSearch on JDK 20 or 21. To take advantage of the SIMD optimization, OpenSearch first needs to run on JDK-21 by default with the SIMD modules enabled. This issue tracks the items needed for SIMD enablement and potential areas for improvement.
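
As a rough illustration of the launch-time requirement described above, the sketch below checks whether the running JVM has the incubating `jdk.incubator.vector` module resolved, which is what starting Java with `--add-modules jdk.incubator.vector` does. The class name `VectorModuleCheck` and its messages are hypothetical and not part of OpenSearch or Lucene; the JDK 20/21 range is simply the one named in this issue.

```java
// Hypothetical helper for illustration only; not OpenSearch or Lucene code.
class VectorModuleCheck {

    // JDK incubator module that provides the Vector API.
    static final String VECTOR_MODULE = "jdk.incubator.vector";

    // True when the module is resolved in the boot layer, for example
    // because the JVM was started with --add-modules jdk.incubator.vector.
    static boolean vectorModuleEnabled() {
        return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
    }

    // Builds a short status message from the JDK feature version and
    // whether the incubator module is enabled.
    static String describe(int javaFeatureVersion, boolean moduleEnabled) {
        if (javaFeatureVersion != 20 && javaFeatureVersion != 21) {
            // Assumption: only the JDK range named in this issue is considered.
            return "JDK " + javaFeatureVersion + ": outside the JDK 20/21 range named in this issue";
        }
        if (moduleEnabled) {
            return "JDK " + javaFeatureVersion + ": " + VECTOR_MODULE + " is enabled";
        }
        return "JDK " + javaFeatureVersion + ": " + VECTOR_MODULE
                + " is not enabled; start the JVM with --add-modules " + VECTOR_MODULE;
    }

    public static void main(String[] args) {
        int feature = Runtime.version().feature();
        System.out.println(describe(feature, vectorModuleEnabled()));
    }
}
```

Running it once as `java VectorModuleCheck.java` and once as `java --add-modules jdk.incubator.vector VectorModuleCheck.java` on JDK 20 or 21 should show the two states.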

Additional context

K-NN performance comparison (https://github.com/opensearch-project/k-NN/tree/main/benchmarks/perf-tool)

  • Cluster configuration: 3 leader nodes (c5.xlarge), 3 data nodes (r5.4xlarge)
  • OpenSearch version: 2.9.0
  1. Dataset: SIFT (128 dimensions, 1M docs)

     Metric                    jdk17            jdk20-simd      diff
     total ingest time (ms)    173861.47249     110505.82751    -36.4%
     query latency p50 (ms)    8.9              8.2             -7.8%
     query latency p90 (ms)    10.1             9               -10.9%
     query latency p99 (ms)    11.1             10              -9.9%

  2. Dataset: GIST (960 dimensions, 1M docs)

     Metric                    jdk17            jdk20-simd      diff
     total ingest time (ms)    1114000.82815    919003.77045    -17.5%
     query latency p50 (ms)    32.5             18.2            -44.0%
     query latency p90 (ms)    36.1             20.1            -44.3%
     query latency p99 (ms)    38.8             24.5            -37.3%

  • Other Lucene optimizations that can be consumed with vector modules.
  • Explore other code paths in OpenSearch that could use the Vector API.
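
For readers checking the benchmark numbers, the diff column appears to be the relative change of the jdk20-simd value against the jdk17 value. The sketch below recomputes it for the two total ingest times; the class and method names are hypothetical. The latency rows are shown rounded, so recomputing them from the displayed values may not match the published diff to the last digit.

```java
// Hypothetical helper for illustration only.
class RelativeChange {

    // Relative change of candidate versus baseline, in percent.
    // A negative result means the candidate value is lower (faster).
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    // Rounds to one decimal place, matching the precision of the diff column.
    static double roundToOneDecimal(double value) {
        return Math.round(value * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        // Total ingest times (ms) copied from the benchmark results in this issue.
        double sift = roundToOneDecimal(percentChange(173861.47249, 110505.82751));
        double gist = roundToOneDecimal(percentChange(1114000.82815, 919003.77045));
        System.out.println("SIFT total ingest time diff (%): " + sift);
        System.out.println("GIST total ingest time diff (%): " + gist);
    }
}
```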
@heemin32 heemin32 added enhancement Enhancement or improvement to existing feature or request untriaged labels Aug 17, 2023
@reta
Collaborator

reta commented Aug 18, 2023

@heemin32 I think we should look into switching main / 2.x to JDK-21 (due Sep 19th), since JDK-20 is not an LTS release (or whatever the long-term support release is called)

@heemin32
Contributor Author

heemin32 commented Aug 18, 2023

@vamshin The JDK-21 GA date is the 19th of Sep. The OpenSearch code freeze date for 2.10.0 is the 5th of Sep. That means we might not be able to enable SIMD for OpenSearch 2.10.0.

@reta
Collaborator

reta commented Aug 18, 2023

That means we might not be able to enable SIMD for OpenSearch 2.10.0.

To note here, the bundled JDK for 2.10.0 would still be JDK-17, but users could try JDK-20 instead at their own risk (although we have not run 2.x on JDK-20 yet, it should work out of the box).

@vamshin
Member

vamshin commented Aug 18, 2023

@reta do you see issues if we bundle jdk-20 by default in 2.10 to take advantage of SIMD out of box for k-NN users? It would go through all the regular tests we do for the release

@reta
Collaborator

reta commented Aug 18, 2023

@reta do you see issues if we bundle jdk-20 by default in 2.10 to take advantage of SIMD out of box for k-NN users? It would go through all the regular tests we do for the release

yes, there are at least 3 issues here:

  • JDK-20 is not an LTS release and will officially go away in ~3 weeks
  • 2.x has never been tested with JDK-20 (that should include all the plugins)
  • JDK-20 sadly (at least the Temurin builds) has release gaps (please see https://adoptium.net/temurin/releases/?version=20, specifically ppc64le)

Primarily I think the efforts should be spent on JDK-21 taking into account it is weeks away (not months or years)

@vamshin
Member

vamshin commented Aug 19, 2023

@reta thanks for the details. Looks like we will have to push to 2.11 then

@reta reta added v2.11.0 Issues and PRs related to version 2.11.0 and removed v2.10.0 labels Aug 22, 2023
@sohami sohami changed the title Enabling SIMD in OpenSearch [META]: SIMD adoption in OpenSearch Aug 22, 2023
@sohami
Collaborator

sohami commented Aug 22, 2023

@heemin32 I have repurposed this issue to discuss/explore the generic usage of SIMD in OpenSearch and added KNN related tasks as sub points in description. Let me know if you have any concerns or I am missing anything here

@joshpalis joshpalis added the Meta Meta issue, not directly linked to a PR label Aug 23, 2023
@heemin32
Contributor Author

heemin32 commented Aug 24, 2023

I have done more testing of the k-NN feature, and the results are available in opensearch-project/k-NN#1062

@ketanv3
Contributor

ketanv3 commented Oct 5, 2023

Another use-case in the date_histogram aggregation I've been exploring: #10392

@kkhatua
Member

kkhatua commented Oct 23, 2023

@heemin32 would it be possible to run with JDK21 and update the numbers here, since JDK21 is an LTS release? That'll help us prioritize this.

@heemin32
Contributor Author

@heemin32 would it be possible to run with JDK21 and update the numbers here, since JDK21 is an LTS release? That'll help us prioritize this.

I have no bandwidth as of now, but I will try to get the results with JDK21.

@macohen
Contributor

macohen commented Nov 14, 2023

Note that the Vector API is still incubating in JDK 22: https://openjdk.org/jeps/460. It looks like there may be minor revisions to the API. Would we keep this off by default even when it is out of incubation, or will it be safe to turn on once out of incubation? I think the Vector API itself determines what to do with or without SIMD present on the processor.

@reta
Collaborator

reta commented Nov 14, 2023

Would we keep this off by default even when it is out of incubation or will it be safe to turn on once out of incubation?

Apache Lucene has Vector API support so having it on by default has benefits.

@macohen macohen removed the untriaged label Nov 16, 2023
@macohen
Contributor

macohen commented Nov 16, 2023

Would we keep this off by default even when it is out of incubation or will it be safe to turn on once out of incubation?

Apache Lucene has Vector API support so having it on by default has benefits.

100%; still wanted to call out the slight risk that the Vector API is incubating. I do know and get that Lucene is accepting that risk so we are, too...

@andrross andrross added the Roadmap:Cost/Performance/Scale Project-wide roadmap label label May 31, 2024
9 participants