feat(cache): deduplicate persisted operations and avoid planning twice by StarpTech · Pull Request #2701 · wundergraph/cosmo

StarpTech · 2026-03-27T17:36:42Z

Summary by CodeRabbit

New Features
- Enabled persisted operations manifest support with improved cache warmup coordination between multiple cache layers
- Added operation planning time metrics collection
Bug Fixes
- Fixed cache warmup callback execution timing to prevent processing incomplete results
Tests
- Added comprehensive integration tests for manifest warmup scenarios, APQ caching behavior, and operation planning metrics

Checklist

I have discussed my proposed changes in an issue and have received approval to proceed.
I have followed the coding standards of the project.
Tests or benchmarks have been added or updated.
Documentation has been updated on https://github.com/wundergraph/docs-website.
I have read the Contributors Guide.

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.

…nt APQ saves for manifest-known operations

coderabbitai · 2026-03-27T17:36:59Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

The PR enhances persisted operations manifest support by enabling APQ behavior verification, improving cache warmup sequencing to ensure asynchronous cache writes are visible before manifest warmup, adding plan cache-hit attribution metrics, and introducing a buffered-channel-based update coalescing mechanism for manifest store callbacks.

Changes

Cohort / File(s)	Summary
Configuration `demo-router.fly.toml`	Added environment variables to enable persisted operations manifest support and optional logging of unknown operations.
Cache Warmup Core Logic `router/core/cache_warmup.go`, `router/core/graph_server.go`	Enhanced cache warmup result tracking with `PlanCacheHit` field propagation. Introduced `waitForCaches()` to synchronize ristretto caches before manifest warmup, ensuring cache writes are visible during PQL warmup. Updated warmup callback sequencing and plan cache-hit metric attribution.
Operation Processing `router/core/operation_processor.go`	Added clarifying comment on persisted operation cache finalization logic.
Manifest Store Implementation `router/internal/persistedoperation/pqlmanifest/store.go`	Replaced asynchronous callback goroutine model with buffered-channel-driven worker for sequential callback execution and coalescing of rapid update signals.
Test Coverage for Manifest & Cache Warmup `router-tests/operations/pql_manifest_test.go`	Added comprehensive integration tests for APQ GET requests, manifest warmup cache metrics, cache warmup + manifest warmup overlap, APQ persistence behavior with manifest, and OpenTelemetry planning-time metric emission.
Manifest Store Tests `router/internal/persistedoperation/pqlmanifest/store_test.go`	Expanded test suite with coverage for store state queries, callback coalescing semantics, manifest parsing (valid/invalid JSON, versioning, missing fields), and file loading error handling.
Test Data `router-tests/operations/testdata/cache_warmup/json_po_manifest_overlap/graphql.json`, `router-tests/testenv/testdata/cdn/organization/graph/operations/manifest.json`	Added cache warmup test data for manifest overlap validation. Updated manifest operation hash mappings and added new operation with minimal selection set.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main objective: deduplicating persisted operations across cache features to avoid duplicate planning.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-27T17:42:45Z

Codecov Report

❌ Patch coverage is 93.50649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.86%. Comparing base (38a648c) to head (583b9bc).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...r/internal/persistedoperation/pqlmanifest/store.go	77.77%	3 Missing and 1 partial ⚠️
router/core/graph_server.go	97.22%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2701      +/-   ##
==========================================
- Coverage   63.87%   62.86%   -1.01%     
==========================================
  Files         554      249     -305     
  Lines       70000    26726   -43274     
  Branches     4665        0    -4665     
==========================================
- Hits        44709    16802   -27907     
+ Misses      23888     8534   -15354     
+ Partials     1403     1390      -13

Files with missing lines	Coverage Δ
router/core/cache_warmup.go	`84.93% <100.00%> (-3.05%)`	⬇️
router/core/operation_processor.go	`83.77% <ø> (-1.24%)`	⬇️
router/core/router.go	`67.95% <100.00%> (-1.22%)`	⬇️
router/core/router_config.go	`93.75% <ø> (ø)`
router/internal/persistedoperation/apq/client.go	`84.37% <100.00%> (+1.04%)`	⬆️
router/internal/persistedoperation/client.go	`91.52% <100.00%> (+0.45%)`	⬆️
...internal/persistedoperation/pqlmanifest/fetcher.go	`49.29% <ø> (-7.85%)`	⬇️
.../internal/persistedoperation/pqlmanifest/poller.go	`86.36% <100.00%> (ø)`
router/core/graph_server.go	`82.45% <97.22%> (-1.74%)`	⬇️
...r/internal/persistedoperation/pqlmanifest/store.go	`90.66% <77.77%> (+37.33%)`	⬆️

... and 306 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router/core/cache_warmup.go`:
- Around line 315-324: The fast-return branch for persisted operations returns a
CacheWarmupOperationPlanResult with empty/placeholder metadata because Parse()
and NormalizeVariables() are skipped; update the branch in the
IsPersistedOperation && Request.Query != "" path (around
isPersistedOperationAlreadyCached()) to ensure k.parsedOperation is parsed and
normalized (call Parse() and NormalizeVariables() or otherwise populate
parsedOperation.Type, parsedOperation.IDString(), and
parsedOperation.Request.OperationName) before constructing and returning the
CacheWarmupOperationPlanResult so OperationType, OperationHash and OperationName
are correct for metrics aggregation.

In `@router/core/operation_processor.go`:
- Around line 1231-1247: The current isPersistedOperationAlreadyCached check
relies on persistedOperationVariableNames which is set early and misses
dimensions like clientName and OperationName, causing incorrect short-circuits;
change the check to use a new post-plan marker keyed by the exact cache key
returned by generatePersistedOperationCacheKey (including clientName and
OperationName) and set that marker only after NormalizeVariables/Validate/plan
complete (e.g., update savePersistedOperationToCache or add a new
savePostPlanPersistedMarker function to record the key in a separate map guarded
by persistedOperationVariableNamesLock or a new lock), then have
isPersistedOperationAlreadyCached consult that post-plan marker instead of
persistedOperationVariableNames.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33a72393-1d67-43c8-980b-934df6e648bc

📥 Commits

Reviewing files that changed from the base of the PR and between d5e14cc and f917d75.

📒 Files selected for processing (8)

demo-router.fly.toml
router-tests/operations/pql_manifest_test.go
router-tests/operations/testdata/cache_warmup/json_po_manifest_overlap/graphql.json
router-tests/testenv/testdata/cdn/organization/graph/operations/manifest.json
router/core/cache_warmup.go
router/core/graph_server.go
router/core/operation_processor.go
router/internal/persistedoperation/client.go

router/core/cache_warmup.go

router/core/operation_processor.go

… items per second limit

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router-tests/operations/pql_manifest_test.go`:
- Around line 852-873: The test currently only verifies the hash-only request
succeeds after a sleep but doesn't prove APQ wasn't populated; change the test
to assert an APQ-specific observable: after the first MakeGraphQLRequest (using
xEnv.MakeGraphQLRequest) check the APQ store directly (e.g. call a helper like
xEnv.APQStore.Has(hash) or an exposed debug endpoint such as
/debug/persistedQueries to assert the hash is absent), or if no direct check
exists, explicitly clear the APQ cache (e.g. xEnv.ClearAPQCache() or hit an
admin endpoint to evict APQ entries) before sending the hash-only request — this
ensures the second request succeeding proves the manifest served it and not a
persisted-APQ entry.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e4bef42-fa2d-453d-ba5f-c0b52b721f22

📥 Commits

Reviewing files that changed from the base of the PR and between f917d75 and 35e184f.

📒 Files selected for processing (1)

router-tests/operations/pql_manifest_test.go

router-tests/operations/pql_manifest_test.go

…lready cached operations

github-actions · 2026-03-27T18:09:36Z

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-9436eefaf083fb11de4036c8e9a0184e73655234-nonroot

…tems during warmup

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

router-tests/operations/pql_manifest_test.go (1)
835-875: ⚠️ Potential issue | 🟠 Major

This still doesn't verify that APQ save was skipped.

Lines 867-873 pass whether APQ stored ecf4... or not, because the manifest can resolve that hash by itself after the sleep. As written this proves “the hash is still resolvable,” not “APQ stayed empty.” Assert an APQ-specific observable, or clear the APQ cache before the second request so success can only come from the manifest path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router-tests/operations/pql_manifest_test.go` around lines 835 - 875, The
test currently only proves the manifest can resolve the hash, not that APQ
wasn't populated; after the first request and before the hash-only request,
explicitly verify APQ did not save
"ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38" (e.g. call a
test helper like xEnv.APQStore.Get / xEnv.APQStore.Has and assert false) or
clear the APQ cache (e.g. xEnv.ClearAPQCache() or xEnv.APQStore.Clear()) and
then perform the hash-only request using xEnv.MakeGraphQLRequest so success can
only come from the manifest path; update the test inside the "APQ skips save for
manifest-known operations" t.Run block accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router/core/graph_server.go`:
- Around line 1479-1482: The SetOnUpdate rewarm path skips flushing async
Ristretto writes and creates a context that can be pre-canceled when
manifestWarmup.Timeout is zero; fix by ensuring gm.waitForCaches() is called
before each rewarm callback (same as the initial warmup) and change the context
creation for WarmupCaches so it does not wrap context.Background() with a zero
timeout—use context.WithTimeout only when manifestWarmup.Timeout > 0 (otherwise
pass a plain background context or let WarmupCaches apply its 30s default) so
post-startup rewarm runs wait for cache visibility and aren’t born canceled.

In `@router/internal/persistedoperation/pqlmanifest/store.go`:
- Around line 35-41: SetOnUpdate currently replaces s.updateCh and starts a new
goroutine each call, leaking the previous worker; change it to stop the old
worker before replacing the channel by guarding access with a mutex (or using
existing store mutex), check if s.updateCh != nil and close it (or signal
shutdown) to stop the old goroutine, then create the new buffered channel and
start a single goroutine that ranges it; reference SetOnUpdate, s.updateCh and
the worker goroutine—ensure you close the previous channel safely (avoid
double-close) while protecting s.updateCh updates with a mutex.

---

Duplicate comments:
In `@router-tests/operations/pql_manifest_test.go`:
- Around line 835-875: The test currently only proves the manifest can resolve
the hash, not that APQ wasn't populated; after the first request and before the
hash-only request, explicitly verify APQ did not save
"ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38" (e.g. call a
test helper like xEnv.APQStore.Get / xEnv.APQStore.Has and assert false) or
clear the APQ cache (e.g. xEnv.ClearAPQCache() or xEnv.APQStore.Clear()) and
then perform the hash-only request using xEnv.MakeGraphQLRequest so success can
only come from the manifest path; update the test inside the "APQ skips save for
manifest-known operations" t.Run block accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c67f548-f6b5-40c7-acab-8e812c999c3f

📥 Commits

Reviewing files that changed from the base of the PR and between 6aaf7c6 and 0fce1e7.

📒 Files selected for processing (6)

router-tests/operations/pql_manifest_test.go
router/core/cache_warmup.go
router/core/graph_server.go
router/core/operation_processor.go
router/internal/persistedoperation/pqlmanifest/store.go
router/internal/persistedoperation/pqlmanifest/store_test.go

✅ Files skipped from review due to trivial changes (1)

router/core/operation_processor.go

router/core/graph_server.go

router/internal/persistedoperation/pqlmanifest/store.go

…est-known operations

…proper resource cleanup

…zation

…ed clarity

feat(cache): deduplicate persisted operations during warmup and preve…

f917d75

…nt APQ saves for manifest-known operations

StarpTech requested review from a team, Noroth, SkArchon, devsergiy, endigma and jensneuse as code owners March 27, 2026 17:36

StarpTech requested review from Aenimus and JivusAyrus March 27, 2026 17:36

github-actions bot added the router label Mar 27, 2026

StarpTech changed the title ~~feat(cache): deduplicate persisted operations during warmup and preve…~~ feat(cache): deduplicate persisted operations across features APQ, cache warmer and manifest Mar 27, 2026

coderabbitai bot reviewed Mar 27, 2026

View reviewed changes

router/core/cache_warmup.go Outdated Show resolved Hide resolved

router/core/operation_processor.go Outdated Show resolved Hide resolved

feat(pql): update warmup configuration to reduce worker count and add…

35e184f

… items per second limit

coderabbitai bot reviewed Mar 27, 2026

View reviewed changes

router-tests/operations/pql_manifest_test.go Show resolved Hide resolved

Noroth approved these changes Mar 27, 2026

View reviewed changes

feat(cache): update cache warmup logic to prevent false metrics for a…

6aaf7c6

…lready cached operations

StarpTech added 2 commits March 28, 2026 01:43

feat(cache): implement waitForCaches to ensure visibility of cached i…

0fce1e7

…tems during warmup

feat(apq): remove duplicate save logic for manifest-known operations

3c20983

StarpTech changed the title ~~feat(cache): deduplicate persisted operations across features APQ, cache warmer and manifest~~ feat(cache): deduplicate persisted operations and avoid planning twice Mar 28, 2026

coderabbitai bot reviewed Mar 28, 2026

View reviewed changes

router/core/graph_server.go Show resolved Hide resolved

router/internal/persistedoperation/pqlmanifest/store.go Outdated Show resolved Hide resolved

StarpTech added 4 commits March 28, 2026 02:10

feat(apq): add IsDistributed method and optimize save logic for manif…

f751bba

…est-known operations

feat(store): implement Close method to stop update worker and ensure …

720acdb

…proper resource cleanup

feat(pql): refactor PQL store integration and improve poller initiali…

1cd5fb8

…zation

feat(tests): update poller tests to use new store instance for improv…

583b9bc

…ed clarity

StarpTech merged commit 932f436 into main Mar 29, 2026
37 checks passed

StarpTech deleted the dustin/improve-pql-tests-for-apq-ENG-9107 branch March 29, 2026 00:52

This was referenced Apr 2, 2026

feat(pqlmanifest): implement S3 polling support for PQL manifest #2726

Merged

feat(manifest): support PO and PQL manifest for feature flags #2739

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cache): deduplicate persisted operations and avoid planning twice#2701

feat(cache): deduplicate persisted operations and avoid planning twice#2701
StarpTech merged 9 commits intomainfrom
dustin/improve-pql-tests-for-apq-ENG-9107

StarpTech commented Mar 27, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 27, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning)

Uh oh!

codecov bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

github-actions bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

StarpTech commented Mar 27, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Checklist

Open Source AI Manifesto

Uh oh!

coderabbitai bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

codecov bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Router-nonroot image scan passed

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

StarpTech commented Mar 27, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 27, 2026 •

edited

Loading

codecov bot commented Mar 27, 2026 •

edited

Loading

github-actions bot commented Mar 27, 2026 •

edited

Loading