feat: retrieve operation body from gqlmetrics instead of traces #2235

endigma · 2025-09-24T15:28:31Z

Summary by CodeRabbit

New Features
- Added a tracing option to include original and normalized GraphQL operation content as span attributes (disabled by default; configurable via telemetry/tracing). New attribute keys exposed for richer trace metadata.
Studio
- Trace Details: separately fetches operation content, adds dedicated loading/error states, and shows side-by-side viewers for operation and variables with a “Play in Playground” link and validation/truncation handling.
Tests
- Added telemetry tests validating presence/absence of operation content attributes when the new tracing flag is toggled.
Chores
- Collector pipeline updated to strip operation body attributes by default; config/schema and defaults updated to surface the new flag.

Checklist

I have discussed my proposed changes in an issue and have received approval to proceed.
I have followed the coding standards of the project.
Tests or benchmarks have been added or updated.
Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
I have read the Contributors Guide.

coderabbitai · 2025-09-24T15:28:39Z

Walkthrough

Adds an OperationContentAttributes flag to control emitting GraphQL operation original/normalized body content into traces; wires the flag through config, trace config, router, GraphQL prehandler, and tests; introduces new OTEL attribute keys; updates Studio to fetch/display operation content; adds an OTEL processor to delete those attributes; consolidates a null/empty check.

Changes

Cohort / File(s)	Summary
Tracing config plumbing `router/pkg/config/config.go`, `router/pkg/config/config.schema.json`, `router/pkg/trace/config.go`, `router/core/router.go`, `router/core/graph_server.go`, `router/core/graphql_prehandler.go`, `router-tests/testenv/testenv.go`, `router/pkg/config/testdata/config_defaults.json`, `router/pkg/config/testdata/config_full.json`	Add `OperationContentAttributes` to config structs/schema/defaults and to trace `Config`; propagate the flag into router trace config, GraphQL prehandler options, and test env; PreHandler gains a flag to conditionally emit operation original/normalized content into parse/normalize spans.
Telemetry attribute keys `router/pkg/otel/attributes.go`	Remove `WgOperationContent`; add `WgOperationOriginalContent`, `WgOperationNormalizedContent` and numerous new attribute keys (engine cache/hits, router/span/cluster, subgraph errors, feature flags, resolver wait times, normalization/validation cache hits, complexity metrics, batching, HTTP upload file count, etc.).
Telemetry tests `router-tests/telemetry/telemetry_test.go`	Add `TestOperationBodyAttributes` and extend tests to toggle `OperationContentAttributes`; assert presence/absence and exact values of original/normalized operation content attributes depending on configuration.
Studio trace details UI `studio/src/components/analytics/trace-details.tsx`	Introduce separate fetch for operation content by operationHash; add loading/error/retry states; validate/pretty-print operation and variables; conditionally render code viewers, playground link, and EmptyState on failure.
OTEL collector config `otelcollector/otel-config.yaml`	Add `attributes/delete_operation_bodies` processor to delete `wg.operation.normalized_content` and `wg.operation.original_content`, and insert it into the traces pipeline.
Controlplane analytics util `controlplane/src/core/bufservices/analytics/getOperationContent.ts`	Consolidate null/empty-result checks into a single guard (`!Array.isArray(result)
Dev/debug config `router/debug.config.yaml`	Enable tracing and set `operation_content_attributes: true` in debug config; add exporter endpoints and timeouts.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(router): block/disable persisted operations #2181 — touches router/core/graphql_prehandler.go and handleOperation logic; likely overlaps with prehandler changes adding operation-body tracing attributes.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The pull request title clearly and concisely captures the primary change, which is shifting the retrieval of GraphQL operation bodies from trace data to the gqlmetrics service, and it omits unnecessary detail while remaining specific.

✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 725fcec and ff3b808.

📒 Files selected for processing (15)

controlplane/src/core/bufservices/analytics/getOperationContent.ts (1 hunks)
otelcollector/otel-config.yaml (2 hunks)
router-tests/telemetry/telemetry_test.go (28 hunks)
router-tests/testenv/testenv.go (2 hunks)
router/core/graph_server.go (1 hunks)
router/core/graphql_prehandler.go (6 hunks)
router/core/router.go (1 hunks)
router/debug.config.yaml (1 hunks)
router/pkg/config/config.go (1 hunks)
router/pkg/config/config.schema.json (1 hunks)
router/pkg/config/testdata/config_defaults.json (1 hunks)
router/pkg/config/testdata/config_full.json (1 hunks)
router/pkg/otel/attributes.go (1 hunks)
router/pkg/trace/config.go (3 hunks)
studio/src/components/analytics/trace-details.tsx (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (9)

router/pkg/config/config.go
router-tests/testenv/testenv.go
router/debug.config.yaml
router/pkg/config/testdata/config_full.json
router/pkg/config/config.schema.json
controlplane/src/core/bufservices/analytics/getOperationContent.ts
router/core/graph_server.go
otelcollector/otel-config.yaml
router/core/router.go

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: endigma
PR: wundergraph/cosmo#2235
File: otelcollector/otel-config.yaml:22-27
Timestamp: 2025-09-29T11:31:58.714Z
Learning: The otelcollector/otel-config.yaml configuration is specifically for routing telemetry to the studio, which doesn't display operation body attributes. The attributes/delete_operation_bodies processor intentionally removes wg.operation.normalized_content and wg.operation.original_content to optimize storage since the studio doesn't use them. Users wanting to preserve operation body attributes should configure tracing to send directly to external systems like Tempo or Jaeger.

📚 Learning: 2025-10-01T20:39:16.113Z

Learnt from: SkArchon
PR: wundergraph/cosmo#2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

router-tests/telemetry/telemetry_test.go

🧬 Code graph analysis (4)

router/pkg/trace/config.go (1)

router/pkg/config/config.go (1)

CustomAttribute (43-47)

router/core/graphql_prehandler.go (1)

router/pkg/otel/attributes.go (2)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

router-tests/telemetry/telemetry_test.go (4)

router/pkg/otel/attributes.go (2)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

router-tests/testenv/testenv.go (1)

Config (286-343)

router/pkg/trace/tracetest/tracetest.go (1)

NewInMemoryExporter (11-17)

router/pkg/trace/config.go (1)

Config (51-74)

studio/src/components/analytics/trace-details.tsx (6)

studio/src/components/layout/graph-layout.tsx (1)

GraphContext (55-57)

controlplane/src/core/bufservices/analytics/getOperationContent.ts (1)

getOperationContent (14-74)

connect/src/wg/cosmo/platform/v1/platform-PlatformService_connectquery.ts (1)

getOperationContent (918-927)

studio/src/lib/constants.ts (1)

docsBaseURL (1-1)

studio/src/components/code-viewer.tsx (1)

CodeViewer (82-176)

studio/src/lib/schema-helpers.ts (1)

extractVariablesFromGraphQL (162-205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: build_test
GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
GitHub Check: integration_test (./events)
GitHub Check: integration_test (./telemetry)
GitHub Check: build_test
GitHub Check: Analyze (go)

🔇 Additional comments (14)

studio/src/components/analytics/trace-details.tsx (8)

1-25: LGTM!

The imports are well-organized and support the dual-source data fetching pattern (trace + operation content), along with GraphQL/JSON formatting via prettier plugins.

27-40: LGTM!

Component setup correctly extracts query parameters and initializes state with appropriate defaults (empty strings with single quotes).

41-56: LGTM!

Trace data fetching is properly configured with a 10-second polling interval and correct parameter passing.

58-59: LGTM!

Operation hash derivation correctly finds the first span with an operationHash attribute. The result properly handles the case where no hash exists (undefined), which is guarded by the enabled condition in the subsequent query.

61-76: LGTM!

Operation content fetching is correctly guarded by the presence of operationHash (via enabled: !!operationHash). This aligns with the PR's objective to retrieve operation body from gqlmetrics instead of traces.

78-124: LGTM!

The effect correctly processes content and variables only when both data sources are available. The validity check using prettier formatting is a good approach to detect truncation.

Note: The code assumes operationContentData.operationContent is always defined (based on the controlplane implementation which returns an empty string for errors), which is safe.

126-138: LGTM!

Loading and error handling for trace data is properly implemented with a full-screen loader and a comprehensive error state that includes retry functionality.

140-234: No issues with playground parameter handling

Operation and variables are properly encoded on link creation and decoded before use; they’re only passed into the GraphiQL editor and fetcher, eliminating XSS or injection risks.

router/core/graphql_prehandler.go (5)

72-74: LGTM!

The new OperationContentAttributes field is properly added to the options struct and follows the existing pattern for boolean configuration flags.

155-157: LGTM!

The field is correctly wired from options to the handler instance in the constructor, maintaining consistency with other configuration flags.

529-548: LGTM! Switch statement refactor improves code structure.

The switch statement maintains identical behavior to the previous if/else logic while providing better structure for potential future HTTP method handling. The change only affects GET and POST methods, which are the only methods expected at this stage (as confirmed by the past review comment).

640-643: LGTM! Original operation content correctly set on parse span.

The original operation content is properly gated behind the operationContentAttributes flag and set on the parse span using the correct OTEL attribute key. The placement is appropriate—immediately after starting the parse span and before parsing begins.

905-908: LGTM! Normalized operation content correctly set on normalize span.

The normalized operation content is properly gated behind the operationContentAttributes flag and set on the normalize span using the correct OTEL attribute key. The placement is appropriate—after normalization completes and the normalized representation is available.

router/pkg/otel/attributes.go (1)

10-13: LGTM! Semantic improvement over single content attribute.

The split from WgOperationContent into WgOperationNormalizedContent and WgOperationOriginalContent clearly distinguishes between the normalized GraphQL operation and the original operation received. The naming follows OpenTelemetry conventions (snake_case, wg. namespace prefix).

Based on learnings: The otelcollector configuration intentionally removes these attributes to optimize storage since studio doesn't display them. Users wanting to preserve operation body attributes should configure tracing to send directly to external systems like Tempo or Jaeger.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2025-09-24T15:31:00Z

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-3def83d9356dad5e3caa98b416f31a933728fd9c

studio/src/components/analytics/trace-details.tsx

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

router/core/graphql_prehandler.go (2)

529-548: Missing default in method switch allows unsupported HTTP methods to slip through

Requests with methods other than GET/POST won’t be unmarshaled and may fail later in confusing ways. Return 405 early for unsupported methods.

Apply this diff:

 switch req.Method {
 case http.MethodGet:
   if err := operationKit.UnmarshalOperationFromURL(req.URL); err != nil {
     return &httpGraphqlError{
       message:    fmt.Sprintf("invalid GET request: %s", err),
       statusCode: http.StatusBadRequest,
     }
   }
 case http.MethodPost:
   if err := operationKit.UnmarshalOperationFromBody(httpOperation.body); err != nil {
     return &httpGraphqlError{
       message:    fmt.Sprintf("invalid request body: %s", err),
       statusCode: http.StatusBadRequest,
     }
   }
   // If we have files, we need to set them on the parsed operation
   if len(httpOperation.files) > 0 {
     requestContext.operation.files = httpOperation.files
   }
+default:
+  return &httpGraphqlError{
+    message:    fmt.Sprintf("method %s not allowed, only GET and POST are supported", req.Method),
+    statusCode: http.StatusMethodNotAllowed,
+  }
 }

910-914: Bug: setting attributes after span.End()

engineNormalizeSpan.SetAttributes(...) is called after engineNormalizeSpan.End(), which is a no-op or error in most SDKs. Move attribute setting before End() (or drop it as it’s already set on the “Load Persisted Operation” span).

Apply this diff:

-    engineNormalizeSpan.End()
-
-    if operationKit.parsedOperation.IsPersistedOperation {
-        engineNormalizeSpan.SetAttributes(otel.WgEnginePersistedOperationCacheHit.Bool(operationKit.parsedOperation.PersistedOperationCacheHit))
-    }
+    if operationKit.parsedOperation.IsPersistedOperation {
+        engineNormalizeSpan.SetAttributes(otel.WgEnginePersistedOperationCacheHit.Bool(operationKit.parsedOperation.PersistedOperationCacheHit))
+    }
+    engineNormalizeSpan.End()

🧹 Nitpick comments (2)

router/core/graphql_prehandler.go (2)
640-644: Consider truncating large operation bodies before setting as span attributes

OTel exporters often drop or cap oversized attribute values. Truncate to a safe bound (e.g., 16KB) to reduce drop risk and memory churn.

Example change:
-    engineParseSpan.SetAttributes(otel.WgOperationOriginalContent.String(operationKit.parsedOperation.Request.Query))
+    engineParseSpan.SetAttributes(otel.WgOperationOriginalContent.String(truncateAttrValue(operationKit.parsedOperation.Request.Query, 16*1024)))
Helper (place near the bottom of this file):
// truncateAttrValue returns s truncated to at most limit bytes.
// If limit <= 0 or s is shorter, it returns s unchanged.
func truncateAttrValue(s string, limit int) string {
	if limit <= 0 || len(s) <= limit {
		return s
	}
	return s[:limit]
}
905-909: Apply the same truncation for normalized content attribute

Normalized queries can also be large; apply the same bound.
-    engineNormalizeSpan.SetAttributes(otel.WgOperationNormalizedContent.String(operationKit.parsedOperation.NormalizedRepresentation))
+    engineNormalizeSpan.SetAttributes(otel.WgOperationNormalizedContent.String(truncateAttrValue(operationKit.parsedOperation.NormalizedRepresentation, 16*1024)))

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 146f4ac and ba3d95e.

📒 Files selected for processing (11)

controlplane/src/core/bufservices/analytics/getOperationContent.ts (1 hunks)
router-tests/telemetry/telemetry_test.go (27 hunks)
router-tests/testenv/testenv.go (2 hunks)
router/core/graph_server.go (1 hunks)
router/core/graphql_prehandler.go (6 hunks)
router/core/router.go (1 hunks)
router/pkg/config/config.go (1 hunks)
router/pkg/config/config.schema.json (1 hunks)
router/pkg/otel/attributes.go (1 hunks)
router/pkg/trace/config.go (3 hunks)
studio/src/components/analytics/trace-details.tsx (5 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-09-02T12:52:27.677Z

Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

router/core/graphql_prehandler.go
router/core/graph_server.go

🧬 Code graph analysis (6)

router/core/router.go (1)

router/pkg/config/config.go (2)

ResponseTraceHeader (68-71)

Tracing (73-85)

router-tests/testenv/testenv.go (1)

router/pkg/config/config.go (4)

TracingExporter (59-66)

PropagationConfig (87-93)

TracingGlobalFeatures (54-57)

ResponseTraceHeader (68-71)

router/core/graphql_prehandler.go (1)

router/pkg/otel/attributes.go (2)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

studio/src/components/analytics/trace-details.tsx (5)

studio/src/components/layout/graph-layout.tsx (1)

GraphContext (55-57)

controlplane/src/core/bufservices/analytics/getOperationContent.ts (1)

getOperationContent (14-74)

connect/src/wg/cosmo/platform/v1/platform-PlatformService_connectquery.ts (1)

getOperationContent (918-927)

studio/src/lib/constants.ts (1)

docsBaseURL (1-1)

studio/src/lib/schema-helpers.ts (1)

extractVariablesFromGraphQL (162-205)

router/pkg/trace/config.go (1)

router/pkg/config/config.go (1)

CustomAttribute (43-47)

router-tests/telemetry/telemetry_test.go (5)

router/pkg/otel/attributes.go (9)

WgRouterVersion (20-20)

WgFederatedGraphID (22-22)

WgRouterConfigVersion (21-21)

WgClientName (18-18)

WgClientVersion (19-19)

WgOperationProtocol (16-16)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

WgFeatureFlag (36-36)

router-tests/testenv/testenv.go (4)

Run (107-124)

Config (286-343)

Environment (1731-1767)

GraphQLRequest (1907-1915)

router/pkg/trace/config.go (1)

Config (51-74)

router/pkg/trace/tracetest/tracetest.go (1)

NewInMemoryExporter (11-17)

router/core/operation_processor.go (1)

GraphQLRequest (181-186)

🪛 GitHub Check: CodeQL

studio/src/components/analytics/trace-details.tsx

[warning] 195-199: Client-side URL redirect
Untrusted URL redirection depends on a user-provided value.

[warning] 213-217: Client-side URL redirect
Untrusted URL redirection depends on a user-provided value.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: build_test
GitHub Check: integration_test (./telemetry)
GitHub Check: integration_test (./events)
GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
GitHub Check: build_test
GitHub Check: build_push_image
GitHub Check: Analyze (go)

🔇 Additional comments (7)

controlplane/src/core/bufservices/analytics/getOperationContent.ts (1)

57-64: Consolidated guard looks solid.

The unified array/length check keeps the control flow tight and protects the later result[0] access without changing semantics. Nicely done.
studio/src/components/analytics/trace-details.tsx (2)
143-158: Avoid showing error UI before we can fetch operation content

When operationHash is missing we never trigger getOperationContent, yet this block still renders the “Could not retrieve operation contents” error because we treat undefined !== EnumStatusCode.OK as failure. For many traces (older router versions, disabled body attributes, subgraph spans, etc.) this will surface an incorrect error message to users. Please guard the entire section behind operationHash (and only evaluate the status once data exists) so the UI stays quiet when the hash is unavailable instead of showing a false error.
-      {operationContentIsFetching ? (
+      {operationHash ? (
+        operationContentIsFetching ? (
           <div className="mt-4 flex h-72 items-center justify-center rounded-md border">
             <Loader />
           </div>
-      ) : (
-        <>
-          {operationContentError || operationContentData?.response?.code !== EnumStatusCode.OK ? (
+        ) : (
+          <>
+            {operationContentError ||
+            (operationContentData && operationContentData.response?.code !== EnumStatusCode.OK) ? (
               <EmptyState
                 className="order-2 mt-4 h-72 border lg:order-last"
                 icon={<ExclamationTriangleIcon />}
                 title="Could not retrieve operation contents"
                 description={
                   operationContentData?.response?.details || operationContentError?.message || 'Please try again'
                 }
                 actions={<Button onClick={() => operationContentRefetch()}>Retry</Button>}
               />
-          ) : (
-            <>
+            ) : (
+              <>
                 …
-            </>
-          )}
-        </>
-      )}
+              </>
+            )}
+          </>
+        )
+      ) : null}
78-123: Reset truncation state when content changes

isTruncated flips to true once we detect invalid/truncated payloads, but we never reset it. After a retry that returns valid content the warning icon still shows, which is misleading. Reset the flag whenever we clear content, and set it based on the latest validation result so the UI reflects the current payload.
   useEffect(() => {
-    if (!traceData || !operationContentData) {
-      setVariables('');
-      setContent('');
-      return;
-    }
+    if (!traceData || !operationContentData) {
+      setVariables('');
+      setContent('');
+      setTruncated(false);
+      return;
+    }
 
     …
-    const content = operationContentData.operationContent;
-    setContent(content);
-
-    const routerSpan = traceData.spans.find((span) => !!span.attributes?.operationVariables);
-
-    if (routerSpan) {
-      const variables = routerSpan.attributes?.operationVariables;
-
-      setVariables(variables);
-
-      checkValidity(content, variables).then((isValid) => {
-        if (!isValid) {
-          setTruncated(true);
-        }
-      });
-    }
+    const content = operationContentData.operationContent;
+    setContent(content);
+
+    const routerSpan = traceData.spans.find((span) => !!span.attributes?.operationVariables);
+
+    setVariables('');
+    setTruncated(false);
+
+    if (routerSpan) {
+      const variables = routerSpan.attributes?.operationVariables ?? '';
+
+      setVariables(variables);
+
+      checkValidity(content, variables).then((isValid) => {
+        setTruncated(!isValid);
+      });
+    }
   }, [traceData, operationContentData]);
router-tests/telemetry/telemetry_test.go (1)

10471-10562: Great coverage on the new flag behavior. These subtests nicely prove the span attributes only appear when the feature is enabled and stay absent otherwise. Solid guard against regressions.

router/core/graphql_prehandler.go (3)

72-74: Config flag addition looks good

New PreHandlerOptions flag is clear and scoped. Ensure the public config/schema default remains false and docs mention privacy implications.

108-109: Internal flag wiring on PreHandler

Field addition is straightforward and matches options.

156-157: Constructor wiring LGTM

Option is correctly propagated into the handler.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba3d95e and 5e56476.

📒 Files selected for processing (1)

otelcollector/otel-config.yaml (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)

GitHub Check: build-router
GitHub Check: build_push_image
GitHub Check: build_push_image (nonroot)
GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
GitHub Check: image_scan (nonroot)
GitHub Check: integration_test (./telemetry)
GitHub Check: image_scan
GitHub Check: build_test
GitHub Check: Analyze (go)
GitHub Check: build_test
GitHub Check: build_push_image
GitHub Check: Analyze (javascript-typescript)
GitHub Check: build_test
GitHub Check: build_push_image
GitHub Check: build_push_image
GitHub Check: build_test

otelcollector/otel-config.yaml

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

router-tests/telemetry/telemetry_test.go (4)
2989-2989: Avoid exact string matches for normalized operation content

Normalization formatting can legitimately change (whitespace, argument canonicalization). Prefer asserting “attribute present and non-empty” (or a minimal containment) instead of exact equality to reduce flakes.

Example change applied to the new test for illustration:
-			require.Equal(t, "Operation - Normalize", sn[2].Name())
-			require.Contains(t, sn[2].Attributes(), otel.WgOperationNormalizedContent.String("{employees {id}}"))
+			require.Equal(t, "Operation - Normalize", sn[2].Name())
+			found := false
+			for _, a := range sn[2].Attributes() {
+				if a.Key == otel.WgOperationNormalizedContent {
+					found = true
+					require.NotEmpty(t, a.Value.AsString(), "normalized content should not be empty when enabled")
+					break
+				}
+			}
+			require.True(t, found, "wg.operation.normalized_content attribute missing")
If desired, we can apply the same pattern to the other call sites listed above.

Also applies to: 3025-3025, 3747-3747, 4202-4202, 10498-10500

2980-2980: Reduce reliance on hard-coded attribute counts

Asserting exact counts for span attributes is brittle; new telemetry keys or upstream SDK changes will break tests without functional regressions. Assert presence/absence of the required keys instead (using attribute.NewSet.HasValue) and drop the strict Len checks where not essential.

Also applies to: 3013-3013, 3258-3258, 3702-3702, 3735-3735, 3980-3980, 4157-4157, 4190-4190, 4435-4435, 6347-6347, 6352-6352, 6383-6383, 7132-7132, 8806-8806, 8810-8810, 8837-8837

10521-10524: Use attribute sets for “absence” checks for clarity

Looping with NotEqual works, but attribute.NewSet.HasValue reads cleaner and avoids per-attribute assertions.
-			for _, attr := range sn[1].Attributes() {
-				require.NotEqual(t, otel.WgOperationOriginalContent, attr.Key, "WgOperationOriginalContent should not be present when EnableOperationBodyAttributes is false")
-			}
+			set := attribute.NewSet(sn[1].Attributes()...)
+			require.False(t, set.HasValue(otel.WgOperationOriginalContent), "WgOperationOriginalContent should not be present when EnableOperationBodyAttributes is false")
@@
-			for _, attr := range sn[2].Attributes() {
-				require.NotEqual(t, otel.WgOperationNormalizedContent, attr.Key, "WgOperationNormalizedContent should not be present when EnableOperationBodyAttributes is false")
-			}
+			set = attribute.NewSet(sn[2].Attributes()...)
+			require.False(t, set.HasValue(otel.WgOperationNormalizedContent), "WgOperationNormalizedContent should not be present when EnableOperationBodyAttributes is false")
@@
-			for _, attr := range sn[1].Attributes() {
-				require.NotEqual(t, otel.WgOperationOriginalContent, attr.Key, "WgOperationOriginalContent should not be present by default")
-			}
+			set = attribute.NewSet(sn[1].Attributes()...)
+			require.False(t, set.HasValue(otel.WgOperationOriginalContent), "WgOperationOriginalContent should not be present by default")
@@
-			for _, attr := range sn[2].Attributes() {
-				require.NotEqual(t, otel.WgOperationNormalizedContent, attr.Key, "WgOperationNormalizedContent should not be present by default")
-			}
+			set = attribute.NewSet(sn[2].Attributes()...)
+			require.False(t, set.HasValue(otel.WgOperationNormalizedContent), "WgOperationNormalizedContent should not be present by default")
Also applies to: 10527-10530, 10553-10555, 10558-10560

10471-10563: New TestOperationBodyAttributes suite: good coverage; minor robustness tweaks advisable

Great triad (enabled/disabled/default). Applying the earlier suggestions (drop exact normalized string equality, prefer HasValue for absence) will make this suite resilient.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7786146 and 4d60e0d.

📒 Files selected for processing (2)

router-tests/telemetry/telemetry_test.go (28 hunks)
router/debug.config.yaml (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

router-tests/telemetry/telemetry_test.go (4)

router/pkg/otel/attributes.go (2)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

router-tests/testenv/testenv.go (2)

Config (286-343)

GraphQLRequest (1907-1915)

router/pkg/trace/tracetest/tracetest.go (1)

NewInMemoryExporter (11-17)

router/pkg/trace/config.go (1)

Config (51-74)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)

GitHub Check: build_push_image
GitHub Check: build_test
GitHub Check: build_push_image
GitHub Check: image_scan
GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
GitHub Check: build_test
GitHub Check: image_scan (nonroot)
GitHub Check: integration_test (./events)
GitHub Check: build_push_image
GitHub Check: build_push_image
GitHub Check: integration_test (./telemetry)
GitHub Check: build_push_image (nonroot)
GitHub Check: build_test
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (go)

🔇 Additional comments (2)

router-tests/telemetry/telemetry_test.go (2)

2927-2929: Good wiring: gated operation body attributes covered across paths

Using EnableOperationBodyAttributes in representative test suites is clear and exercises both presence/absence paths. LGTM.

Also applies to: 3636-3639, 4043-4046, 6327-6330, 7115-7117, 8764-8767

9967-9967: Helpful assertion message

Adding context to require.NoError improves failure diagnostics. LGTM.

Also applies to: 10059-10059

router/core/graphql_prehandler.go

router/pkg/config/config.go

router/debug.config.yaml

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

router/core/graphql_prehandler.go (1)
529-548: Handle non-GET/POST methods explicitly.
Without a default branch, HEAD/PUT/etc. skip both unmarshalling paths, leaving the kit uninitialized until we later parse/normalize, which surfaces confusing “error parsing variables” responses. Short-circuit with a 405 so unsupported verbs are rejected predictably and in line with GraphQL-over-HTTP (GET/POST only).
 	switch req.Method {
 	case http.MethodGet:
 		if err := operationKit.UnmarshalOperationFromURL(req.URL); err != nil {
 			return &httpGraphqlError{
 				message:    fmt.Sprintf("invalid GET request: %s", err),
 				statusCode: http.StatusBadRequest,
 			}
 		}
 	case http.MethodPost:
 		if err := operationKit.UnmarshalOperationFromBody(httpOperation.body); err != nil {
 			return &httpGraphqlError{
 				message:    fmt.Sprintf("invalid request body: %s", err),
 				statusCode: http.StatusBadRequest,
 			}
 		}
 		// If we have files, we need to set them on the parsed operation
 		if len(httpOperation.files) > 0 {
 			requestContext.operation.files = httpOperation.files
 		}
+	default:
+		return &httpGraphqlError{
+			message:    fmt.Sprintf("unsupported HTTP method %s", req.Method),
+			statusCode: http.StatusMethodNotAllowed,
+		}
 	}

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d60e0d and 725fcec.

📒 Files selected for processing (15)

controlplane/src/core/bufservices/analytics/getOperationContent.ts (1 hunks)
otelcollector/otel-config.yaml (2 hunks)
router-tests/telemetry/telemetry_test.go (28 hunks)
router-tests/testenv/testenv.go (2 hunks)
router/core/graph_server.go (1 hunks)
router/core/graphql_prehandler.go (6 hunks)
router/core/router.go (1 hunks)
router/debug.config.yaml (1 hunks)
router/pkg/config/config.go (1 hunks)
router/pkg/config/config.schema.json (1 hunks)
router/pkg/config/testdata/config_defaults.json (1 hunks)
router/pkg/config/testdata/config_full.json (1 hunks)
router/pkg/otel/attributes.go (1 hunks)
router/pkg/trace/config.go (3 hunks)
studio/src/components/analytics/trace-details.tsx (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (6)

router/pkg/config/testdata/config_full.json
router-tests/testenv/testenv.go
router/core/router.go
router/pkg/config/testdata/config_defaults.json
router/pkg/config/config.go
otelcollector/otel-config.yaml

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: endigma
PR: wundergraph/cosmo#2235
File: otelcollector/otel-config.yaml:22-27
Timestamp: 2025-09-29T11:31:58.714Z
Learning: The otelcollector/otel-config.yaml configuration is specifically for routing telemetry to the studio, which doesn't display operation body attributes. The attributes/delete_operation_bodies processor intentionally removes wg.operation.normalized_content and wg.operation.original_content to optimize storage since the studio doesn't use them. Users wanting to preserve operation body attributes should configure tracing to send directly to external systems like Tempo or Jaeger.

📚 Learning: 2025-10-01T20:39:16.113Z

Learnt from: SkArchon
PR: wundergraph/cosmo#2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

router-tests/telemetry/telemetry_test.go

🧬 Code graph analysis (4)

router/pkg/trace/config.go (1)

router/pkg/config/config.go (1)

CustomAttribute (43-47)

router/core/graphql_prehandler.go (1)

router/pkg/otel/attributes.go (2)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

router-tests/telemetry/telemetry_test.go (4)

router/pkg/otel/attributes.go (2)

WgOperationOriginalContent (13-13)

WgOperationNormalizedContent (12-12)

router-tests/testenv/testenv.go (3)

Run (107-124)

Config (286-343)

GraphQLRequest (1907-1915)

router/pkg/trace/tracetest/tracetest.go (1)

NewInMemoryExporter (11-17)

router/pkg/trace/config.go (1)

Config (51-74)

studio/src/components/analytics/trace-details.tsx (5)

studio/src/components/layout/graph-layout.tsx (1)

GraphContext (55-57)

connect/src/wg/cosmo/platform/v1/platform-PlatformService_connectquery.ts (1)

getOperationContent (918-927)

studio/src/lib/constants.ts (1)

docsBaseURL (1-1)

studio/src/components/code-viewer.tsx (1)

CodeViewer (82-176)

studio/src/lib/schema-helpers.ts (1)

extractVariablesFromGraphQL (162-205)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)

GitHub Check: build-router
GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
GitHub Check: integration_test (./telemetry)
GitHub Check: integration_test (./events)
GitHub Check: build_push_image
GitHub Check: build_test
GitHub Check: image_scan (nonroot)
GitHub Check: build_push_image (nonroot)
GitHub Check: image_scan
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Analyze (go)
GitHub Check: build_push_image
GitHub Check: build_push_image
GitHub Check: build_test
GitHub Check: build_test
GitHub Check: build_push_image
GitHub Check: build_test

studio/src/components/analytics/trace-details.tsx

router/pkg/config/config.schema.json

studio/src/components/analytics/trace-details.tsx

StarpTech

LGTM

how did this ever get through?

github-actions bot added studio router labels Sep 24, 2025

endigma force-pushed the jesse/eng-7530-different-queries-different-spans branch from 71b2aa1 to f62a624 Compare September 24, 2025 16:04

github-actions bot added the controlplane label Sep 29, 2025

github-advanced-security bot found potential problems Sep 29, 2025

View reviewed changes

studio/src/components/analytics/trace-details.tsx Dismissed Show dismissed Hide dismissed

studio/src/components/analytics/trace-details.tsx Dismissed Show dismissed Hide dismissed

endigma force-pushed the jesse/eng-7530-different-queries-different-spans branch from b0273e8 to ba3d95e Compare September 29, 2025 09:36

endigma marked this pull request as ready for review September 29, 2025 09:38

endigma requested review from JivusAyrus, Noroth, StarpTech, devsergiy, jensneuse, thisisnithin and wilsonrivera as code owners September 29, 2025 09:38

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

endigma requested a review from miklosbarabas as a code owner September 29, 2025 11:10

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

otelcollector/otel-config.yaml Show resolved Hide resolved

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

JivusAyrus approved these changes Oct 1, 2025

View reviewed changes

Noroth reviewed Oct 1, 2025

View reviewed changes

router/core/graphql_prehandler.go Show resolved Hide resolved

router/pkg/config/config.go Outdated Show resolved Hide resolved

router/debug.config.yaml Show resolved Hide resolved

wilsonrivera approved these changes Oct 1, 2025

View reviewed changes

endigma force-pushed the jesse/eng-7530-different-queries-different-spans branch from 4d60e0d to 725fcec Compare October 2, 2025 09:09

coderabbitai bot reviewed Oct 2, 2025

View reviewed changes

studio/src/components/analytics/trace-details.tsx Show resolved Hide resolved

Noroth approved these changes Oct 2, 2025

View reviewed changes

StarpTech requested changes Oct 2, 2025

View reviewed changes

router/pkg/config/config.schema.json Outdated Show resolved Hide resolved

studio/src/components/analytics/trace-details.tsx Show resolved Hide resolved

StarpTech approved these changes Oct 3, 2025

View reviewed changes

endigma added 2 commits October 3, 2025 10:32

feat: retrieve operation body from gqlmetrics instead of traces

d40b26f

tests: update telemetry tests to reflect new behaviour

46f8889

endigma added 7 commits October 3, 2025 10:32

chore: linter suggestion

c2dd2d5

feat: add config flag to control operation body attribute in traces

d38d11e

chore: cleanup+graceful degredation

d0ccb88

chore: exclude unused attributes from clickhouse in otel config

5e7a641

tests: update fixtures

77c91eb

test: fix broken test from another PR

021079f

how did this ever get through?

chore: rename option to OperationContentAttributes

5e952f8

endigma force-pushed the jesse/eng-7530-different-queries-different-spans branch from a0129ca to 5e952f8 Compare October 3, 2025 09:32

chore: update snapshots

ff3b808

endigma merged commit 52cecf4 into main Oct 3, 2025
34 checks passed

endigma deleted the jesse/eng-7530-different-queries-different-spans branch October 3, 2025 10:07

endigma mentioned this pull request Oct 3, 2025

docs: update tracing configuration with new operation content attributes option wundergraph/cosmo-docs#176

Merged

This was referenced Nov 13, 2025

feat: operations overview #2331

Merged

fix: cache vars and remapping normalization #2318

Merged

feat: retrieve operation body from gqlmetrics instead of traces #2235

feat: retrieve operation body from gqlmetrics instead of traces #2235

Uh oh!

Conversation

endigma commented Sep 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Checklist

Uh oh!

coderabbitai bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Pre-merge checks and finishing touches

Uh oh!

github-actions bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Router image scan passed

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

StarpTech left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

endigma commented Sep 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 24, 2025 •

edited

Loading

github-actions bot commented Sep 24, 2025 •

edited

Loading