Skip to content

Conversation

@alepane21
Copy link
Contributor

@alepane21 alepane21 commented Oct 30, 2025

We want to expose query planning time to make easier and more precise to track query planner's performance regressions.

Summary by CodeRabbit

  • New Features

    • Query plan results now include per-operation timing metrics (parse, normalize, validate, plan) and aggregated totals; timings are captured for successes and failures and included in result objects.
  • Tests

    • Tests and benchmarks updated to accept timing data and to compare structured plan results instead of raw JSON strings.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@coderabbitai
Copy link

coderabbitai bot commented Oct 30, 2025

Walkthrough

Instruments the planning pipeline with per-operation timing via a new public OperationTimes type; updates planner function signatures to return and propagate timing data through parse, normalize, validate, and plan phases (including error paths); exposes per-result timings in the plan generator and adjusts tests/benchmarks.

Changes

Cohort / File(s) Summary
Core Planning Instrumentation
router/core/plan_generator.go
Adds public OperationTimes (ParseTime, NormalizeTime, ValidateTime, PlanTime) with TotalTime() and Merge(). Updates PlanOperation, ParseAndPrepareOperation, PrepareOperation, and PlanPreparedOperation signatures to return OperationTimes; measures and accumulates timings around each phase and returns timings on success and error paths.
Planner Tests & Benchmarks
router/core/plan_generator_test.go, router/internal/planningbenchmark/benchmark_test.go
Updates call sites to handle the additional OperationTimes return value from ParseAndPrepareOperation and PlanPreparedOperation (captured or ignored), preserving existing control flow and error handling.
Plan Generator Result & Generator
router/pkg/plan_generator/plan_generator.go
Adds Timings core.OperationTimes to QueryPlanResult and populates it from Planner.PlanOperation so per-file timing data is included in results.
Plan Generator Tests Refactor
router/pkg/plan_generator/plan_generator_test.go
Replaces raw JSON string equality checks with unmarshalling into QueryPlanResults and per-entry comparisons of Plan, Error, and Warning; minor import formatting change.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify that OperationTimes values are correctly captured, merged, and returned on all success and error paths in router/core/plan_generator.go.
  • Confirm QueryPlanResult.Timings is populated consistently and that JSON marshaling still behaves as expected.
  • Check updated tests/benchmarks properly ignore or validate timing data where intended.

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: expose timing of planning in the report' directly and clearly describes the main change: adding timing information to planning reports for tracking performance.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Oct 30, 2025

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-1f43bba1fe504a2709bc39544e26508f95949d2c-nonroot

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 14, 2025
@alepane21 alepane21 changed the title chore: initial logic to track timing feat: expose timing of planning in the report Nov 14, 2025
…ry-planner-regression-ci-aware-of-query-plan-timings
@alepane21 alepane21 marked this pull request as ready for review November 14, 2025 09:39
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
router/core/plan_generator.go (2)

57-77: OperationTimes struct and helpers are sound; consider value receiver for TotalTime

The shape of OperationTimes and Merge is clear and composable. Since OperationTimes is small and immutable in practice, TotalTime could use a value receiver for consistency with Merge and to avoid accidental nil-pointer calls on *OperationTimes. This is optional and non‑blocking.

-func (ot *OperationTimes) TotalTime() time.Duration {
-	return ot.ParseTime + ot.NormalizeTime + ot.ValidateTime + ot.PostProcessTime + ot.PlanTime
-}
+func (ot OperationTimes) TotalTime() time.Duration {
+	return ot.ParseTime + ot.NormalizeTime + ot.ValidateTime + ot.PostProcessTime + ot.PlanTime
+}

148-171: Simplify timing capture in PrepareOperation to avoid duplicated assignments

NormalizeTime and ValidateTime are each assigned twice (once in the error branch and once after the if). This is correct but a bit noisy and easy to get wrong on future edits. You can measure once, then branch on the error:

-	opTimes := OperationTimes{}
-
-	start := time.Now()
-	if err := pl.normalizeOperation(operation, operationName); err != nil {
-		opTimes.NormalizeTime = time.Since(start)
-		return nil, opTimes, &PlannerOperationValidationError{err: err}
-	}
-	opTimes.NormalizeTime = time.Since(start)
-
-	start = time.Now()
-	if err := pl.validateOperation(operation); err != nil {
-		opTimes.ValidateTime = time.Since(start)
-		return nil, opTimes, &PlannerOperationValidationError{err: err}
-	}
-	opTimes.ValidateTime = time.Since(start)
+	opTimes := OperationTimes{}
+
+	start := time.Now()
+	err := pl.normalizeOperation(operation, operationName)
+	opTimes.NormalizeTime = time.Since(start)
+	if err != nil {
+		return nil, opTimes, &PlannerOperationValidationError{err: err}
+	}
+
+	start = time.Now()
+	err = pl.validateOperation(operation)
+	opTimes.ValidateTime = time.Since(start)
+	if err != nil {
+		return nil, opTimes, &PlannerOperationValidationError{err: err}
+	}

Semantics stay the same, but the timing code is easier to follow and less error‑prone.

router/pkg/plan_generator/plan_generator_test.go (3)

214-222: Check json.Unmarshal errors and lengths when comparing results vs expected

The switch from raw string equality to JSON struct comparison is the right move now that Timings adds extra fields. However, the current code ignores json.Unmarshal errors and does not assert that resultsStruct.Plans and resultsExpectedStruct.Plans have the same length, which can let malformed JSON or mismatched plan counts slip through silently.

-		resultsStruct := QueryPlanResults{}
-		json.Unmarshal(results, &resultsStruct)
-		resultsExpectedStruct := QueryPlanResults{}
-		json.Unmarshal(resultsExpected, &resultsExpectedStruct)
-		for i := range resultsStruct.Plans {
+		var resultsStruct QueryPlanResults
+		require.NoError(t, json.Unmarshal(results, &resultsStruct))
+		var resultsExpectedStruct QueryPlanResults
+		require.NoError(t, json.Unmarshal(resultsExpected, &resultsExpectedStruct))
+		require.Len(t, resultsStruct.Plans, len(resultsExpectedStruct.Plans))
+		for i := range resultsStruct.Plans {
 			assert.Equal(t, resultsStruct.Plans[i].Plan, resultsExpectedStruct.Plans[i].Plan)
 			assert.Equal(t, resultsStruct.Plans[i].Error, resultsExpectedStruct.Plans[i].Error)
 			assert.Equal(t, resultsStruct.Plans[i].Warning, resultsExpectedStruct.Plans[i].Warning)
 		}

(Use require here since malformed JSON should fail the test immediately.)


253-261: Apply the same JSON error/length checks in warning/report test

Same pattern as above: unmarshal errors and plan count mismatches should fail the test, not silently pass.

-		resultsStruct := QueryPlanResults{}
-		json.Unmarshal(results, &resultsStruct)
-		resultsExpectedStruct := QueryPlanResults{}
-		json.Unmarshal(resultsExpected, &resultsExpectedStruct)
-		for i := range resultsStruct.Plans {
+		var resultsStruct QueryPlanResults
+		require.NoError(t, json.Unmarshal(results, &resultsStruct))
+		var resultsExpectedStruct QueryPlanResults
+		require.NoError(t, json.Unmarshal(resultsExpected, &resultsExpectedStruct))
+		require.Len(t, resultsStruct.Plans, len(resultsExpectedStruct.Plans))
+		for i := range resultsStruct.Plans {
 			assert.Equal(t, resultsStruct.Plans[i].Plan, resultsExpectedStruct.Plans[i].Plan)
 			assert.Equal(t, resultsStruct.Plans[i].Error, resultsExpectedStruct.Plans[i].Error)
 			assert.Equal(t, resultsStruct.Plans[i].Warning, resultsExpectedStruct.Plans[i].Warning)
 		}

290-298: Also harden per-file JSON comparison in OutputFiles + warnings test

Same concern: unmarshal errors are ignored, and slice lengths are not checked. Tighten this to avoid tests silently succeeding on malformed output.

-				resultsStruct := QueryPlanResults{}
-				json.Unmarshal(queryPlan, &resultsStruct)
-				resultsExpectedStruct := QueryPlanResults{}
-				json.Unmarshal(expected, &resultsExpectedStruct)
-				for i := range resultsStruct.Plans {
+				var resultsStruct QueryPlanResults
+				require.NoError(t, json.Unmarshal(queryPlan, &resultsStruct))
+				var resultsExpectedStruct QueryPlanResults
+				require.NoError(t, json.Unmarshal(expected, &resultsExpectedStruct))
+				require.Len(t, resultsStruct.Plans, len(resultsExpectedStruct.Plans))
+				for i := range resultsStruct.Plans {
 					assert.Equal(t, resultsStruct.Plans[i].Plan, resultsExpectedStruct.Plans[i].Plan)
 					assert.Equal(t, resultsStruct.Plans[i].Error, resultsExpectedStruct.Plans[i].Error)
 					assert.Equal(t, resultsStruct.Plans[i].Warning, resultsExpectedStruct.Plans[i].Warning)
 				}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac085ab and 0e94185.

📒 Files selected for processing (5)
  • router/core/plan_generator.go (5 hunks)
  • router/core/plan_generator_test.go (1 hunks)
  • router/internal/planningbenchmark/benchmark_test.go (2 hunks)
  • router/pkg/plan_generator/plan_generator.go (2 hunks)
  • router/pkg/plan_generator/plan_generator_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/pkg/plan_generator/plan_generator.go
  • router/internal/planningbenchmark/benchmark_test.go
  • router/core/plan_generator.go
  • router/pkg/plan_generator/plan_generator_test.go
  • router/core/plan_generator_test.go
📚 Learning: 2025-08-14T16:47:03.744Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2137
File: router/pkg/pubsub/redis/adapter.go:114-116
Timestamp: 2025-08-14T16:47:03.744Z
Learning: In the Cosmo router codebase, EventMetricStore fields are never nil - the system uses the null object pattern with NewNoopEventMetricStore() when metrics are disabled, eliminating the need for nil checks before calling EventMetricStore methods.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router/core/plan_generator.go
  • router/pkg/plan_generator/plan_generator_test.go
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.

Applied to files:

  • router/pkg/plan_generator/plan_generator_test.go
🧬 Code graph analysis (2)
router/pkg/plan_generator/plan_generator.go (1)
router/core/plan_generator.go (1)
  • OperationTimes (57-63)
router/pkg/plan_generator/plan_generator_test.go (1)
router/pkg/plan_generator/plan_generator.go (1)
  • QueryPlanResults (39-42)
🔇 Additional comments (8)
router/core/plan_generator.go (1)

205-241: PlanPreparedOperation timing layout looks consistent

The separation between PlanTime (planner call) and PostProcessTime is clear, and error/ panic paths correctly propagate any accumulated opTimes without panicking the caller. This aligns well with the timing model introduced in OperationTimes.

router/pkg/plan_generator/plan_generator_test.go (2)

13-14: Importing core for PlanOutputFormat is appropriate

Using core.PlanOutputFormat* in tests keeps them aligned with the core API and avoids duplicating constants.


341-373: OutputFormat tests correctly distinguish JSON vs text output

Using core.PlanOutputFormatJSON and core.PlanOutputFormatText in these tests verifies both raw JSON mode and textual plan mode:

  • JSON mode: content is either JSON or a Warning: prefix.
  • Text mode: files keep .graphql suffix and content is non‑JSON textual plans or warnings.

This matches the intended semantics of the new OutputFormat parameter.

Also applies to: 375-405

router/core/plan_generator_test.go (1)

31-33: Test correctly updated for new PlanPreparedOperation signature

Capturing the extra OperationTimes return value with _ and continuing to assert on err keeps the panic‑handling behavior test intact while aligning with the new (planNode, OperationTimes, error) signature.

router/internal/planningbenchmark/benchmark_test.go (2)

39-45: Benchmark test adapts correctly to new ParseAndPrepareOperation/PlanPreparedOperation signatures

Using _ to ignore the new OperationTimes while preserving existing error checks and timing log (time.Since(start)) is correct and non‑breaking.


70-79: Benchmark loop correctly ignores timings while using new signatures

The benchmark now calls ParseAndPrepareOperation and PlanPreparedOperation with the additional return value ignored. The core benchmarking logic (alloc reporting, bytes measurement, error checking) remains intact.

router/pkg/plan_generator/plan_generator.go (2)

45-50: Adding Timings to QueryPlanResult is a clean extension

Adding Timings core.OperationTimes with omitempty cleanly exposes per‑operation timings to consumers of the JSON report without breaking existing fields. This matches the PR’s stated objective.


149-154: Correctly propagating opTimes into QueryPlanResult

Capturing opTimes from planner.PlanOperation and assigning it to QueryPlanResult.Timings ensures timing data is available regardless of whether the plan is written to individual files or the aggregate report. Once PlanOperation is adjusted to preserve timings on validation errors, this will give complete coverage of operation timings for both success and warning/error entries.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
router/core/plan_generator.go (1)

107-110: Fix PlanTime double counting.

The outer timing wrapper at lines 107-110 overwrites PlanTime with the total duration of PlanPreparedOperation, which already measured PlanTime and PostProcessTime separately. This causes TotalTime() to double-count PostProcessTime.

Remove the outer timing wrapper and rely solely on the timing measurements inside PlanPreparedOperation.

Apply this diff:

-	start := time.Now()
 	rawPlan, opTimes2, err := pl.PlanPreparedOperation(operation)
 	opTimes = opTimes.Merge(opTimes2)
-	opTimes.PlanTime = time.Since(start)
 	if err != nil {
 		return "", opTimes, fmt.Errorf("failed to plan operation: %w", err)
 	}
🧹 Nitpick comments (1)
router/core/plan_generator.go (1)

65-67: Consider value receiver for consistency.

TotalTime() uses a pointer receiver but doesn't mutate state. For consistency with the value receiver used in Merge() and to avoid unnecessary pointer dereferences, consider using a value receiver.

Apply this diff:

-func (ot *OperationTimes) TotalTime() time.Duration {
+func (ot OperationTimes) TotalTime() time.Duration {
 	return ot.ParseTime + ot.NormalizeTime + ot.ValidateTime + ot.PostProcessTime + ot.PlanTime
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e94185 and 74a46ae.

📒 Files selected for processing (2)
  • router/core/plan_generator.go (5 hunks)
  • router/pkg/plan_generator/plan_generator_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/pkg/plan_generator/plan_generator_test.go
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-14T16:47:03.744Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2137
File: router/pkg/pubsub/redis/adapter.go:114-116
Timestamp: 2025-08-14T16:47:03.744Z
Learning: In the Cosmo router codebase, EventMetricStore fields are never nil - the system uses the null object pattern with NewNoopEventMetricStore() when metrics are disabled, eliminating the need for nil checks before calling EventMetricStore methods.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router/core/plan_generator.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
router/core/plan_generator.go (3)

130-145: LGTM! Timing preserved on error paths.

The implementation correctly measures parse time and preserves collected timings on both parse and prepare errors, which enables tracking even for operations that fail validation.


148-171: LGTM! Clean phase-by-phase timing.

The function correctly measures normalization and validation times separately, and preserves accumulated timings on error paths. The early return for missing operation names appropriately returns an empty OperationTimes since no work has been performed yet.


205-241: LGTM! Accurate plan and post-process timing.

The function correctly measures planning and post-processing as separate phases and preserves timing information through all error and success paths. The distinction between PlanTime and PostProcessTime enables fine-grained performance analysis.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/plan_generator.go (1)

57-77: LGTM! Clean timing instrumentation design.

The OperationTimes struct and its helper methods are well-structured for capturing and aggregating per-phase durations. The Merge() method correctly combines non-overlapping timings from different pipeline stages, and TotalTime() provides a convenient aggregate.

Optionally, consider adding doc comments to the exported type and methods for discoverability:

+// OperationTimes captures timing measurements for each phase of GraphQL operation planning.
 type OperationTimes struct {
 	ParseTime       time.Duration
 	NormalizeTime   time.Duration
 	ValidateTime    time.Duration
 	PostProcessTime time.Duration
 	PlanTime        time.Duration
 }

+// TotalTime returns the sum of all recorded phase durations.
 func (ot *OperationTimes) TotalTime() time.Duration {
 	return ot.ParseTime + ot.NormalizeTime + ot.ValidateTime + ot.PostProcessTime + ot.PlanTime
 }

+// Merge combines timing data from another OperationTimes by summing corresponding fields.
 func (ot OperationTimes) Merge(other OperationTimes) OperationTimes {
 	return OperationTimes{
 		ParseTime:       ot.ParseTime + other.ParseTime,
 		NormalizeTime:   ot.NormalizeTime + other.NormalizeTime,
 		ValidateTime:    ot.ValidateTime + other.ValidateTime,
 		PostProcessTime: ot.PostProcessTime + other.PostProcessTime,
 		PlanTime:        ot.PlanTime + other.PlanTime,
 	}
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74a46ae and e178089.

📒 Files selected for processing (1)
  • router/core/plan_generator.go (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-14T16:47:03.744Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2137
File: router/pkg/pubsub/redis/adapter.go:114-116
Timestamp: 2025-08-14T16:47:03.744Z
Learning: In the Cosmo router codebase, EventMetricStore fields are never nil - the system uses the null object pattern with NewNoopEventMetricStore() when metrics are disabled, eliminating the need for nil checks before calling EventMetricStore methods.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router/core/plan_generator.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
router/core/plan_generator.go (4)

101-125: LGTM! Timing aggregation is correct.

The function properly merges timings from the parse/prepare and plan phases, and all error paths now preserve the accumulated opTimes. The previous double-counting issue (outer wrapper overwriting PlanTime) has been correctly resolved.


128-143: LGTM! Parse timing correctly captured and propagated.

The function accurately measures parse time and merges it with downstream normalization and validation timings from PrepareOperation. Both error paths correctly preserve the timing data collected up to the point of failure.


146-169: LGTM! Normalization and validation timing instrumentation is sound.

The function correctly measures and records both normalization and validation durations, and all error paths preserve the accumulated timing state. The early return for a missing operation name appropriately returns an empty OperationTimes{} since no timed work has occurred.


203-239: LGTM! Plan and postprocess timings are accurately instrumented.

The function correctly measures planning and post-processing as separate phases and returns the accumulated opTimes on all code paths, including errors. The timing measurements are appropriately placed to capture the actual work performed by the planner and postprocessor.

@github-actions github-actions bot removed the Stale label Nov 15, 2025
Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router/core/plan_generator.go (2)

10-75: OperationTimes layout and helpers look good; consider value receiver for TotalTime

The OperationTimes struct and Merge helper provide a clear, non-allocating way to aggregate phase timings and match the intended parse/normalize/validate/plan breakdown. One small polish: TotalTime doesn’t mutate the receiver, so you could switch it to a value receiver for consistency with Merge and to avoid any (unlikely) nil-pointer surprises:

-func (ot *OperationTimes) TotalTime() time.Duration {
+func (ot OperationTimes) TotalTime() time.Duration {
 	return ot.ParseTime + ot.NormalizeTime + ot.ValidateTime + ot.PlanTime
}

This is purely stylistic; current behavior is correct.


201-237: PlanPreparedOperation timing behavior is correct; minor readability tweak possible

PlanPreparedOperation now:

  • Returns opTimes on all paths (including operation name not found and report.HasErrors()).
  • Measures PlanTime for planner-only failures (assignment before the HasErrors check).
  • On success, includes postprocessing in PlanTime by measuring from the same start through post.Process, without any outer wrapper in PlanOperation, so there’s no double-counting.

Functionally this is correct and matches the comment about including postprocessing in planning time. If you want to make the control flow slightly clearer, you could assign PlanTime only in the error and success branches, e.g.:

-	start := time.Now()
-	preparedPlan := pl.planner.Plan(operation, pl.definition, string(operationName), &report, plan.IncludeQueryPlanInResponse())
-	opTimes.PlanTime = time.Since(start)
-	if report.HasErrors() {
-		return nil, opTimes, errors.New(report.Error())
-	}
-
-	post := postprocess.NewProcessor()
-	post.Process(preparedPlan)
-	// measure postprocessing time as part of planning time
-	opTimes.PlanTime = time.Since(start)
+	start := time.Now()
+	preparedPlan := pl.planner.Plan(operation, pl.definition, string(operationName), &report, plan.IncludeQueryPlanInResponse())
+	if report.HasErrors() {
+		opTimes.PlanTime = time.Since(start)
+		return nil, opTimes, errors.New(report.Error())
+	}
+
+	post := postprocess.NewProcessor()
+	post.Process(preparedPlan)
+	// measure postprocessing time as part of planning time
+	opTimes.PlanTime = time.Since(start)

But this is a clarity improvement only; the current implementation is logically correct.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e178089 and bc29ef7.

📒 Files selected for processing (1)
  • router/core/plan_generator.go (5 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-14T16:47:03.744Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2137
File: router/pkg/pubsub/redis/adapter.go:114-116
Timestamp: 2025-08-14T16:47:03.744Z
Learning: In the Cosmo router codebase, EventMetricStore fields are never nil - the system uses the null object pattern with NewNoopEventMetricStore() when metrics are disabled, eliminating the need for nil checks before calling EventMetricStore methods.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/core/plan_generator.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router/core/plan_generator.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: check-labels-and-engine-changes
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
router/core/plan_generator.go (3)

98-123: PlanOperation now correctly preserves and returns accumulated timings

The updated PlanOperation cleanly threads OperationTimes through parse/prepare and plan phases, and importantly returns the collected opTimes on all error paths (including parse/validation failures and marshal errors). This aligns with the goal of exposing timings even when planning fails and avoids the earlier double-counting issue by relying solely on PlanPreparedOperation for plan timing.


125-141: ParseAndPrepareOperation timing and error propagation are sound

ParseAndPrepareOperation now measures parse time once around parseOperation, attaches it to the returned opTimes (including in error cases), and delegates normalization/validation timing to PrepareOperation. On validation-like errors wrapped in PlannerOperationValidationError, the caller still receives accurate ParseTime plus any normalization/validation timings, which is exactly what you want for regression tracking.


144-167: PrepareOperation correctly captures normalize/validate timings and wraps validation errors

The normalization and validation steps are now individually timed and stored in NormalizeTime and ValidateTime, with errors consistently wrapped in PlannerOperationValidationError. When called via ParseAndPrepareOperation, the parse duration is then injected on top, yielding a complete timing picture for validation failures. Behavior for direct callers of PrepareOperation remains intuitive (no ParseTime populated).

@alepane21 alepane21 merged commit d792fe5 into main Nov 19, 2025
28 checks passed
@alepane21 alepane21 deleted the ale/eng-8412-make-query-planner-regression-ci-aware-of-query-plan-timings branch November 19, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants