feat(cache): deduplicate persisted operations and avoid planning twice#2701
feat(cache): deduplicate persisted operations and avoid planning twice#2701
Conversation
…nt APQ saves for manifest-known operations
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThe PR enhances persisted operations manifest support by enabling APQ behavior verification, improving cache warmup sequencing to ensure asynchronous cache writes are visible before manifest warmup, adding plan cache-hit attribution metrics, and introducing a buffered-channel-based update coalescing mechanism for manifest store callbacks. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2701 +/- ##
==========================================
- Coverage 63.87% 62.86% -1.01%
==========================================
Files 554 249 -305
Lines 70000 26726 -43274
Branches 4665 0 -4665
==========================================
- Hits 44709 16802 -27907
+ Misses 23888 8534 -15354
+ Partials 1403 1390 -13
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router/core/cache_warmup.go`:
- Around line 315-324: The fast-return branch for persisted operations returns a
CacheWarmupOperationPlanResult with empty/placeholder metadata because Parse()
and NormalizeVariables() are skipped; update the branch in the
IsPersistedOperation && Request.Query != "" path (around
isPersistedOperationAlreadyCached()) to ensure k.parsedOperation is parsed and
normalized (call Parse() and NormalizeVariables() or otherwise populate
parsedOperation.Type, parsedOperation.IDString(), and
parsedOperation.Request.OperationName) before constructing and returning the
CacheWarmupOperationPlanResult so OperationType, OperationHash and OperationName
are correct for metrics aggregation.
In `@router/core/operation_processor.go`:
- Around line 1231-1247: The current isPersistedOperationAlreadyCached check
relies on persistedOperationVariableNames which is set early and misses
dimensions like clientName and OperationName, causing incorrect short-circuits;
change the check to use a new post-plan marker keyed by the exact cache key
returned by generatePersistedOperationCacheKey (including clientName and
OperationName) and set that marker only after NormalizeVariables/Validate/plan
complete (e.g., update savePersistedOperationToCache or add a new
savePostPlanPersistedMarker function to record the key in a separate map guarded
by persistedOperationVariableNamesLock or a new lock), then have
isPersistedOperationAlreadyCached consult that post-plan marker instead of
persistedOperationVariableNames.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 33a72393-1d67-43c8-980b-934df6e648bc
📒 Files selected for processing (8)
demo-router.fly.tomlrouter-tests/operations/pql_manifest_test.gorouter-tests/operations/testdata/cache_warmup/json_po_manifest_overlap/graphql.jsonrouter-tests/testenv/testdata/cdn/organization/graph/operations/manifest.jsonrouter/core/cache_warmup.gorouter/core/graph_server.gorouter/core/operation_processor.gorouter/internal/persistedoperation/client.go
… items per second limit
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router-tests/operations/pql_manifest_test.go`:
- Around line 852-873: The test currently only verifies the hash-only request
succeeds after a sleep but doesn't prove APQ wasn't populated; change the test
to assert an APQ-specific observable: after the first MakeGraphQLRequest (using
xEnv.MakeGraphQLRequest) check the APQ store directly (e.g. call a helper like
xEnv.APQStore.Has(hash) or an exposed debug endpoint such as
/debug/persistedQueries to assert the hash is absent), or if no direct check
exists, explicitly clear the APQ cache (e.g. xEnv.ClearAPQCache() or hit an
admin endpoint to evict APQ entries) before sending the hash-only request — this
ensures the second request succeeding proves the manifest served it and not a
persisted-APQ entry.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7e4bef42-fa2d-453d-ba5f-c0b52b721f22
📒 Files selected for processing (1)
router-tests/operations/pql_manifest_test.go
…lready cached operations
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
router-tests/operations/pql_manifest_test.go (1)
835-875:⚠️ Potential issue | 🟠 MajorThis still doesn't verify that APQ save was skipped.
Lines 867-873 pass whether APQ stored
ecf4...or not, because the manifest can resolve that hash by itself after the sleep. As written this proves “the hash is still resolvable,” not “APQ stayed empty.” Assert an APQ-specific observable, or clear the APQ cache before the second request so success can only come from the manifest path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router-tests/operations/pql_manifest_test.go` around lines 835 - 875, The test currently only proves the manifest can resolve the hash, not that APQ wasn't populated; after the first request and before the hash-only request, explicitly verify APQ did not save "ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38" (e.g. call a test helper like xEnv.APQStore.Get / xEnv.APQStore.Has and assert false) or clear the APQ cache (e.g. xEnv.ClearAPQCache() or xEnv.APQStore.Clear()) and then perform the hash-only request using xEnv.MakeGraphQLRequest so success can only come from the manifest path; update the test inside the "APQ skips save for manifest-known operations" t.Run block accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router/core/graph_server.go`:
- Around line 1479-1482: The SetOnUpdate rewarm path skips flushing async
Ristretto writes and creates a context that can be pre-canceled when
manifestWarmup.Timeout is zero; fix by ensuring gm.waitForCaches() is called
before each rewarm callback (same as the initial warmup) and change the context
creation for WarmupCaches so it does not wrap context.Background() with a zero
timeout—use context.WithTimeout only when manifestWarmup.Timeout > 0 (otherwise
pass a plain background context or let WarmupCaches apply its 30s default) so
post-startup rewarm runs wait for cache visibility and aren’t born canceled.
In `@router/internal/persistedoperation/pqlmanifest/store.go`:
- Around line 35-41: SetOnUpdate currently replaces s.updateCh and starts a new
goroutine each call, leaking the previous worker; change it to stop the old
worker before replacing the channel by guarding access with a mutex (or using
existing store mutex), check if s.updateCh != nil and close it (or signal
shutdown) to stop the old goroutine, then create the new buffered channel and
start a single goroutine that ranges it; reference SetOnUpdate, s.updateCh and
the worker goroutine—ensure you close the previous channel safely (avoid
double-close) while protecting s.updateCh updates with a mutex.
---
Duplicate comments:
In `@router-tests/operations/pql_manifest_test.go`:
- Around line 835-875: The test currently only proves the manifest can resolve
the hash, not that APQ wasn't populated; after the first request and before the
hash-only request, explicitly verify APQ did not save
"ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38" (e.g. call a
test helper like xEnv.APQStore.Get / xEnv.APQStore.Has and assert false) or
clear the APQ cache (e.g. xEnv.ClearAPQCache() or xEnv.APQStore.Clear()) and
then perform the hash-only request using xEnv.MakeGraphQLRequest so success can
only come from the manifest path; update the test inside the "APQ skips save for
manifest-known operations" t.Run block accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4c67f548-f6b5-40c7-acab-8e812c999c3f
📒 Files selected for processing (6)
router-tests/operations/pql_manifest_test.gorouter/core/cache_warmup.gorouter/core/graph_server.gorouter/core/operation_processor.gorouter/internal/persistedoperation/pqlmanifest/store.gorouter/internal/persistedoperation/pqlmanifest/store_test.go
✅ Files skipped from review due to trivial changes (1)
- router/core/operation_processor.go
…est-known operations
…proper resource cleanup
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Checklist
Open Source AI Manifesto
This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.