GPU codec: fall back to CPU graph build on flush when GPU is busy #149373
ChrisHegarty wants to merge 7 commits into
When flushing, use tryAcquire (non-blocking) to attempt GPU resource acquisition. If the GPU is busy, fall back to building the HNSW graph on CPU using HnswGraphBuilder. This avoids blocking flush threads waiting for GPU resources during heavy indexing. Also adds a `reason` parameter to acquire/tryAcquire for improved diagnostics, and refactors both methods to share a common doAcquire implementation.
During heavy indexing, multiple flush operations can compete for GPU resources simultaneously. Previously, flush would block waiting for a GPU resource to become available, which stalls the indexing thread and can cause cascading latency. This is particularly problematic when the GPU is already saturated with merge or other flush operations — the thread just sits idle waiting for its turn.
I've changed the flush path to use a non-blocking `tryAcquire` instead of a blocking `acquire`. If the GPU is busy (all resources locked or insufficient memory), `flush` now builds the HNSW graph on CPU using Lucene's `HnswGraphBuilder`. The resulting graph is written in the same Lucene99 format, so it's fully searchable by the standard reader. This means `flush` never blocks on GPU availability: it always makes progress, just potentially more slowly for that particular, relatively small, new segment.

To support this, I added `tryAcquire` to the `CuVSResourceManager` interface. Both `acquire` and `tryAcquire` now delegate to a shared `doAcquire` implementation with a `nonBlocking` flag, and I've added a `reason` parameter for diagnostics so we can see in logs which operation is acquiring or waiting for resources.

I've added tests at three levels: unit tests for the `tryAcquire` mechanics (including a concurrent contention test), a `WriteGraphTests` class that validates the CPU fallback produces byte-identical output to Lucene, and two mixed-path format tests that exercise both GPU and CPU paths within the same index on GPU nodes via a randomly-failing resource manager.
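
For readers who want the shape of the change without opening the diff, here is a minimal, self-contained sketch of the pattern described above. It is not the PR's code: `GpuFallbackSketch`, `ResourcePool`, the boolean return values, and the `"gpu"`/`"cpu"` strings are all hypothetical stand-ins, and a plain `java.util.concurrent.Semaphore` replaces the real resource accounting (which also considers GPU memory). Only the names `acquire`, `tryAcquire`, `doAcquire`, the non-blocking flag, and the `reason` parameter come from the description.

```java
import java.util.concurrent.Semaphore;

// Illustrative sketch only: hypothetical names, not the real CuVSResourceManager API.
final class GpuFallbackSketch {

    /** Hypothetical pool guarding a fixed number of GPU resource slots. */
    static final class ResourcePool {
        private final Semaphore slots;

        ResourcePool(int size) {
            this.slots = new Semaphore(size);
        }

        /** Blocking variant: waits until a slot is free (e.g. for merges). */
        boolean acquire(String reason) {
            return doAcquire(reason, false);
        }

        /** Non-blocking variant: returns false at once if no slot is free. */
        boolean tryAcquire(String reason) {
            return doAcquire(reason, true);
        }

        /** Shared implementation; 'reason' is only used for diagnostics. */
        private boolean doAcquire(String reason, boolean nonBlocking) {
            if (nonBlocking) {
                boolean acquired = slots.tryAcquire();
                System.out.println("[" + reason + "] tryAcquire: "
                        + (acquired ? "acquired" : "GPU busy"));
                return acquired;
            }
            System.out.println("[" + reason + "] acquire: waiting for a GPU slot");
            slots.acquireUninterruptibly();
            System.out.println("[" + reason + "] acquire: acquired");
            return true;
        }

        void release() {
            slots.release();
        }
    }

    /** Flush never blocks: it uses the GPU if a slot is free, else the CPU path. */
    static String flush(ResourcePool pool) {
        if (pool.tryAcquire("flush")) {
            try {
                return "gpu"; // placeholder for the GPU graph build
            } finally {
                pool.release();
            }
        }
        return "cpu"; // placeholder for the CPU HnswGraphBuilder fallback
    }

    public static void main(String[] args) {
        ResourcePool pool = new ResourcePool(1);
        System.out.println(flush(pool)); // slot free, so the GPU path is taken
        pool.acquire("merge");           // simulate a merge holding the only slot
        System.out.println(flush(pool)); // GPU busy, so flush falls back to CPU
        pool.release();
        System.out.println(flush(pool)); // slot free again, GPU path is taken
    }
}
```

In this sketch a merge would still call the blocking `acquire`, since it can afford to wait, while `flush` only ever calls `tryAcquire` and therefore always returns promptly on one path or the other.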