Skip to content

feat(cache): deduplicate persisted operations and avoid planning twice#2701

Merged
StarpTech merged 9 commits intomainfrom
dustin/improve-pql-tests-for-apq-ENG-9107
Mar 29, 2026
Merged

feat(cache): deduplicate persisted operations and avoid planning twice#2701
StarpTech merged 9 commits intomainfrom
dustin/improve-pql-tests-for-apq-ENG-9107

Conversation

@StarpTech
Copy link
Copy Markdown
Contributor

@StarpTech StarpTech commented Mar 27, 2026

Summary by CodeRabbit

  • New Features

    • Enabled persisted operations manifest support with improved cache warmup coordination between multiple cache layers
    • Added operation planning time metrics collection
  • Bug Fixes

    • Fixed cache warmup callback execution timing to prevent processing incomplete results
  • Tests

    • Added comprehensive integration tests for manifest warmup scenarios, APQ caching behavior, and operation planning metrics

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/docs-website.
  • I have read the Contributors Guide.

Open Source AI Manifesto

This project follows the principles of the Open Source AI Manifesto. Please ensure your contribution aligns with its principles.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

The PR enhances persisted operations manifest support by enabling APQ behavior verification, improving cache warmup sequencing to ensure asynchronous cache writes are visible before manifest warmup, adding plan cache-hit attribution metrics, and introducing a buffered-channel-based update coalescing mechanism for manifest store callbacks.

Changes

Cohort / File(s) Summary
Configuration
demo-router.fly.toml
Added environment variables to enable persisted operations manifest support and optional logging of unknown operations.
Cache Warmup Core Logic
router/core/cache_warmup.go, router/core/graph_server.go
Enhanced cache warmup result tracking with PlanCacheHit field propagation. Introduced waitForCaches() to synchronize ristretto caches before manifest warmup, ensuring cache writes are visible during PQL warmup. Updated warmup callback sequencing and plan cache-hit metric attribution.
Operation Processing
router/core/operation_processor.go
Added clarifying comment on persisted operation cache finalization logic.
Manifest Store Implementation
router/internal/persistedoperation/pqlmanifest/store.go
Replaced asynchronous callback goroutine model with buffered-channel-driven worker for sequential callback execution and coalescing of rapid update signals.
Test Coverage for Manifest & Cache Warmup
router-tests/operations/pql_manifest_test.go
Added comprehensive integration tests for APQ GET requests, manifest warmup cache metrics, cache warmup + manifest warmup overlap, APQ persistence behavior with manifest, and OpenTelemetry planning-time metric emission.
Manifest Store Tests
router/internal/persistedoperation/pqlmanifest/store_test.go
Expanded test suite with coverage for store state queries, callback coalescing semantics, manifest parsing (valid/invalid JSON, versioning, missing fields), and file loading error handling.
Test Data
router-tests/operations/testdata/cache_warmup/json_po_manifest_overlap/graphql.json, router-tests/testenv/testdata/cdn/organization/graph/operations/manifest.json
Added cache warmup test data for manifest overlap validation. Updated manifest operation hash mappings and added new operation with minimal selection set.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main objective: deduplicating persisted operations across cache features to avoid duplicate planning.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@StarpTech StarpTech changed the title feat(cache): deduplicate persisted operations during warmup and preve… feat(cache): deduplicate persisted operations across features APQ, cache warmer and manifest Mar 27, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 93.50649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.86%. Comparing base (38a648c) to head (583b9bc).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...r/internal/persistedoperation/pqlmanifest/store.go 77.77% 3 Missing and 1 partial ⚠️
router/core/graph_server.go 97.22% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2701      +/-   ##
==========================================
- Coverage   63.87%   62.86%   -1.01%     
==========================================
  Files         554      249     -305     
  Lines       70000    26726   -43274     
  Branches     4665        0    -4665     
==========================================
- Hits        44709    16802   -27907     
+ Misses      23888     8534   -15354     
+ Partials     1403     1390      -13     
Files with missing lines Coverage Δ
router/core/cache_warmup.go 84.93% <100.00%> (-3.05%) ⬇️
router/core/operation_processor.go 83.77% <ø> (-1.24%) ⬇️
router/core/router.go 67.95% <100.00%> (-1.22%) ⬇️
router/core/router_config.go 93.75% <ø> (ø)
router/internal/persistedoperation/apq/client.go 84.37% <100.00%> (+1.04%) ⬆️
router/internal/persistedoperation/client.go 91.52% <100.00%> (+0.45%) ⬆️
...internal/persistedoperation/pqlmanifest/fetcher.go 49.29% <ø> (-7.85%) ⬇️
.../internal/persistedoperation/pqlmanifest/poller.go 86.36% <100.00%> (ø)
router/core/graph_server.go 82.45% <97.22%> (-1.74%) ⬇️
...r/internal/persistedoperation/pqlmanifest/store.go 90.66% <77.77%> (+37.33%) ⬆️

... and 306 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router/core/cache_warmup.go`:
- Around line 315-324: The fast-return branch for persisted operations returns a
CacheWarmupOperationPlanResult with empty/placeholder metadata because Parse()
and NormalizeVariables() are skipped; update the branch in the
IsPersistedOperation && Request.Query != "" path (around
isPersistedOperationAlreadyCached()) to ensure k.parsedOperation is parsed and
normalized (call Parse() and NormalizeVariables() or otherwise populate
parsedOperation.Type, parsedOperation.IDString(), and
parsedOperation.Request.OperationName) before constructing and returning the
CacheWarmupOperationPlanResult so OperationType, OperationHash and OperationName
are correct for metrics aggregation.

In `@router/core/operation_processor.go`:
- Around line 1231-1247: The current isPersistedOperationAlreadyCached check
relies on persistedOperationVariableNames which is set early and misses
dimensions like clientName and OperationName, causing incorrect short-circuits;
change the check to use a new post-plan marker keyed by the exact cache key
returned by generatePersistedOperationCacheKey (including clientName and
OperationName) and set that marker only after NormalizeVariables/Validate/plan
complete (e.g., update savePersistedOperationToCache or add a new
savePostPlanPersistedMarker function to record the key in a separate map guarded
by persistedOperationVariableNamesLock or a new lock), then have
isPersistedOperationAlreadyCached consult that post-plan marker instead of
persistedOperationVariableNames.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 33a72393-1d67-43c8-980b-934df6e648bc

📥 Commits

Reviewing files that changed from the base of the PR and between d5e14cc and f917d75.

📒 Files selected for processing (8)
  • demo-router.fly.toml
  • router-tests/operations/pql_manifest_test.go
  • router-tests/operations/testdata/cache_warmup/json_po_manifest_overlap/graphql.json
  • router-tests/testenv/testdata/cdn/organization/graph/operations/manifest.json
  • router/core/cache_warmup.go
  • router/core/graph_server.go
  • router/core/operation_processor.go
  • router/internal/persistedoperation/client.go

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router-tests/operations/pql_manifest_test.go`:
- Around line 852-873: The test currently only verifies the hash-only request
succeeds after a sleep but doesn't prove APQ wasn't populated; change the test
to assert an APQ-specific observable: after the first MakeGraphQLRequest (using
xEnv.MakeGraphQLRequest) check the APQ store directly (e.g. call a helper like
xEnv.APQStore.Has(hash) or an exposed debug endpoint such as
/debug/persistedQueries to assert the hash is absent), or if no direct check
exists, explicitly clear the APQ cache (e.g. xEnv.ClearAPQCache() or hit an
admin endpoint to evict APQ entries) before sending the hash-only request — this
ensures the second request succeeding proves the manifest served it and not a
persisted-APQ entry.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e4bef42-fa2d-453d-ba5f-c0b52b721f22

📥 Commits

Reviewing files that changed from the base of the PR and between f917d75 and 35e184f.

📒 Files selected for processing (1)
  • router-tests/operations/pql_manifest_test.go

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 27, 2026

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-9436eefaf083fb11de4036c8e9a0184e73655234-nonroot

@StarpTech StarpTech changed the title feat(cache): deduplicate persisted operations across features APQ, cache warmer and manifest feat(cache): deduplicate persisted operations and avoid planning twice Mar 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
router-tests/operations/pql_manifest_test.go (1)

835-875: ⚠️ Potential issue | 🟠 Major

This still doesn't verify that APQ save was skipped.

Lines 867-873 pass whether APQ stored ecf4... or not, because the manifest can resolve that hash by itself after the sleep. As written this proves “the hash is still resolvable,” not “APQ stayed empty.” Assert an APQ-specific observable, or clear the APQ cache before the second request so success can only come from the manifest path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router-tests/operations/pql_manifest_test.go` around lines 835 - 875, The
test currently only proves the manifest can resolve the hash, not that APQ
wasn't populated; after the first request and before the hash-only request,
explicitly verify APQ did not save
"ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38" (e.g. call a
test helper like xEnv.APQStore.Get / xEnv.APQStore.Has and assert false) or
clear the APQ cache (e.g. xEnv.ClearAPQCache() or xEnv.APQStore.Clear()) and
then perform the hash-only request using xEnv.MakeGraphQLRequest so success can
only come from the manifest path; update the test inside the "APQ skips save for
manifest-known operations" t.Run block accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router/core/graph_server.go`:
- Around line 1479-1482: The SetOnUpdate rewarm path skips flushing async
Ristretto writes and creates a context that can be pre-canceled when
manifestWarmup.Timeout is zero; fix by ensuring gm.waitForCaches() is called
before each rewarm callback (same as the initial warmup) and change the context
creation for WarmupCaches so it does not wrap context.Background() with a zero
timeout—use context.WithTimeout only when manifestWarmup.Timeout > 0 (otherwise
pass a plain background context or let WarmupCaches apply its 30s default) so
post-startup rewarm runs wait for cache visibility and aren’t born canceled.

In `@router/internal/persistedoperation/pqlmanifest/store.go`:
- Around line 35-41: SetOnUpdate currently replaces s.updateCh and starts a new
goroutine each call, leaking the previous worker; change it to stop the old
worker before replacing the channel by guarding access with a mutex (or using
existing store mutex), check if s.updateCh != nil and close it (or signal
shutdown) to stop the old goroutine, then create the new buffered channel and
start a single goroutine that ranges it; reference SetOnUpdate, s.updateCh and
the worker goroutine—ensure you close the previous channel safely (avoid
double-close) while protecting s.updateCh updates with a mutex.

---

Duplicate comments:
In `@router-tests/operations/pql_manifest_test.go`:
- Around line 835-875: The test currently only proves the manifest can resolve
the hash, not that APQ wasn't populated; after the first request and before the
hash-only request, explicitly verify APQ did not save
"ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38" (e.g. call a
test helper like xEnv.APQStore.Get / xEnv.APQStore.Has and assert false) or
clear the APQ cache (e.g. xEnv.ClearAPQCache() or xEnv.APQStore.Clear()) and
then perform the hash-only request using xEnv.MakeGraphQLRequest so success can
only come from the manifest path; update the test inside the "APQ skips save for
manifest-known operations" t.Run block accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c67f548-f6b5-40c7-acab-8e812c999c3f

📥 Commits

Reviewing files that changed from the base of the PR and between 6aaf7c6 and 0fce1e7.

📒 Files selected for processing (6)
  • router-tests/operations/pql_manifest_test.go
  • router/core/cache_warmup.go
  • router/core/graph_server.go
  • router/core/operation_processor.go
  • router/internal/persistedoperation/pqlmanifest/store.go
  • router/internal/persistedoperation/pqlmanifest/store_test.go
✅ Files skipped from review due to trivial changes (1)
  • router/core/operation_processor.go

@StarpTech StarpTech merged commit 932f436 into main Mar 29, 2026
37 checks passed
@StarpTech StarpTech deleted the dustin/improve-pql-tests-for-apq-ENG-9107 branch March 29, 2026 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants