GPU codec: fall back to CPU graph build on flush when GPU is busy by ChrisHegarty · Pull Request #149373 · elastic/elasticsearch

ChrisHegarty · 2026-05-19T09:37:10Z

During heavy indexing, multiple flush operations can compete for GPU resources simultaneously. Previously, flush would block waiting for a GPU resource to become available, which stalls the indexing thread and can cause cascading latency. This is particularly problematic when the GPU is already saturated with merge or other flush operations — the thread just sits idle waiting for its turn.

I've changed the flush path to use a non-blocking tryAcquire instead of a blocking acquire. If the GPU is busy (all resources locked or insufficient memory), flush now builds the HNSW graph on CPU using Lucene's HnswGraphBuilder. The resulting graph is written in the same Lucene99 format, so it's fully searchable by the standard reader. This means flush never blocks on GPU availability: it always makes progress, though the CPU build may be slower for that particular segment, which at flush time is relatively small.
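For readers who want the control flow spelled out, here is a minimal sketch of the try-then-fall-back pattern described above. It is not the code in this PR: `FlushFallbackSketch`, `GpuPool`, `Handle`, and the `"gpu"`/`"cpu"` return values are hypothetical stand-ins, the GPU resources are modeled as semaphore permits, and returning `null` for "busy" is an assumed convention.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of flush-time fallback; none of these names come from the PR. */
final class FlushFallbackSketch {

    /** A held GPU resource; close() gives it back. Closing twice is not guarded here. */
    interface Handle extends AutoCloseable {
        @Override
        void close();
    }

    /** Stand-in for the GPU resource manager, with resources modeled as permits. */
    static final class GpuPool {
        private final Semaphore permits;

        GpuPool(int resources) {
            this.permits = new Semaphore(resources);
        }

        /** Non-blocking: returns a handle, or null (assumed convention) if the GPU is busy. */
        Handle tryAcquire() {
            if (permits.tryAcquire() == false) {
                return null;
            }
            return permits::release;
        }
    }

    /** Returns which path would build the graph; real code would build and write it. */
    static String flush(GpuPool pool) {
        Handle handle = pool.tryAcquire();
        if (handle == null) {
            return "cpu"; // GPU busy: build on CPU instead of blocking the flush thread
        }
        try (handle) {
            return "gpu"; // resource is released when the try block exits
        }
    }
}
```

The point of the shape is that `flush` has no waiting state at all: every call returns after taking one of the two build paths.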

To support this, I added tryAcquire to the CuVSResourceManager interface. Both acquire and tryAcquire now delegate to a shared doAcquire implementation with a nonBlocking flag, and I've added a reason parameter for diagnostics so we can see in logs which operation is acquiring or waiting for resources.
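As a rough illustration of the delegation described above, the sketch below has `acquire` and `tryAcquire` share one `doAcquire` that takes a `nonBlocking` flag and a `reason`. Only those method and parameter names come from the description; the `Manager` and `Resource` types, the semaphore-backed pool, the in-memory `log` list, and the `null` return on a busy GPU are assumptions made for the example, not the actual CuVSResourceManager API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch; the real interface and its signatures may differ. */
final class SharedAcquireSketch {

    interface Resource extends AutoCloseable {
        @Override
        void close();
    }

    static final class Manager {
        private final Semaphore permits;
        /** Stand-in for a logger, so the diagnostics are easy to inspect. */
        final List<String> log = new CopyOnWriteArrayList<>();

        Manager(int resources) {
            this.permits = new Semaphore(resources);
        }

        /** Blocks until a resource is available. */
        Resource acquire(String reason) throws InterruptedException {
            return doAcquire(false, reason);
        }

        /** Returns immediately; null (assumed convention) means the GPU is busy. */
        Resource tryAcquire(String reason) {
            try {
                return doAcquire(true, reason);
            } catch (InterruptedException e) {
                // The non-blocking branch never waits, so this is unreachable.
                throw new AssertionError(e);
            }
        }

        private Resource doAcquire(boolean nonBlocking, String reason) throws InterruptedException {
            if (nonBlocking) {
                if (permits.tryAcquire() == false) {
                    log.add("busy, not waiting: " + reason);
                    return null;
                }
            } else {
                log.add("waiting: " + reason);
                permits.acquire();
            }
            log.add("acquired: " + reason);
            return permits::release;
        }
    }
}
```

Keeping the branch on `nonBlocking` inside a single method means the logging and bookkeeping around acquisition exist in one place for both callers.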

I've added tests at three levels: unit tests for the tryAcquire mechanics (including a concurrent contention test), a WriteGraphTests class that validates the CPU fallback produces byte-identical output to Lucene, and two mixed-path format tests that exercise both GPU and CPU paths within the same index on GPU nodes via a randomly-failing resource manager.
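To give a feel for what a concurrent contention test can look like, here is a small self-contained sketch. It is not the test in this PR: `ContentionSketch` and `contend` are hypothetical names, and a plain `Semaphore` stands in for the resource manager. Each thread makes one non-blocking attempt and then holds its permit until every thread has tried, so the number of successes is deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical contention check; a Semaphore stands in for the GPU resource manager. */
final class ContentionSketch {

    /** Runs one tryAcquire per thread and returns how many threads got a resource. */
    static int contend(int resources, int threads) {
        Semaphore permits = new Semaphore(resources);
        CountDownLatch allTried = new CountDownLatch(threads);
        AtomicInteger gpuBuilds = new AtomicInteger();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(() -> {
                boolean acquired = permits.tryAcquire();
                if (acquired) {
                    gpuBuilds.incrementAndGet();
                }
                allTried.countDown();
                try {
                    allTried.await(); // hold the permit until every thread has tried
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    if (acquired) {
                        permits.release();
                    }
                }
            });
        }
        executor.shutdown();
        try {
            if (executor.awaitTermination(10, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("contention run timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gpuBuilds.get();
    }
}
```

With 2 resources and 8 threads, exactly 2 threads acquire and the other 6 would take the CPU path; no thread ever waits on the semaphore itself.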

When flushing, use tryAcquire (non-blocking) to attempt GPU resource acquisition. If the GPU is busy, fall back to building the HNSW graph on CPU using HnswGraphBuilder. This avoids blocking flush threads waiting for GPU resources during heavy indexing. Also adds a `reason` parameter to acquire/tryAcquire for improved diagnostics, and refactors both methods to share a common doAcquire implementation.

…-flush

elasticsearchmachine · 2026-05-19T09:37:36Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2026-05-19T09:37:37Z

Hi @ChrisHegarty, I've created a changelog YAML for you.

github-actions · 2026-05-19T09:39:04Z

🔍 Preview links for changed docs

⏳ Building and deploying preview... View progress

This comment will be updated with preview links when the build is complete.

github-actions · 2026-05-19T09:40:55Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

ChrisHegarty added 2 commits May 19, 2026 09:22

Merge remote-tracking branch 'upstream/main' into gpu-cpu-fallback-on…

56a0d3c

…-flush

ChrisHegarty requested review from ldematte and mayya-sharipova May 19, 2026 09:37

ChrisHegarty added the >bug, :Search Relevance/Vectors (Vector search), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch), v9.5.0, v9.3.5, and v9.4.2 labels May 19, 2026

Update docs/changelog/149373.yaml

f2269d8

github-actions Bot deployed to docs-preview May 19, 2026 09:39 View deployment

ChrisHegarty added the test-gpu (Run tests using a GPU) label May 19, 2026

formating

15995ff

github-actions Bot deployed to docs-preview May 19, 2026 10:09 View deployment

formatting

7d61145

github-actions Bot deployed to docs-preview May 19, 2026 10:16 View deployment

formatting

8fe7f11

github-actions Bot deployed to docs-preview May 19, 2026 10:36 View deployment

Merge branch 'main' into gpu-cpu-fallback-on-flush

2621dd4

github-actions Bot deployed to docs-preview May 19, 2026 13:18 View deployment

ChrisHegarty wants to merge 7 commits into elastic:main from ChrisHegarty:gpu-cpu-fallback-on-flush
