-
Notifications
You must be signed in to change notification settings - Fork 201
fix(router): flush errors on sse streams #2377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(router): flush errors on sse streams #2377
Conversation
WalkthroughMoved an early HTTP response body close in an SSE subscription test, added an SSE subtest asserting error propagation when a subgraph returns 403, added an HTTP flush branch in GraphQLHandler.WriteError, and ensured subscription error writes flush and mark completion in executeSubscription. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Router image scan passed✅ No security vulnerabilities found in image: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2377 +/- ##
===========================================
+ Coverage 42.75% 61.55% +18.79%
===========================================
Files 1030 229 -801
Lines 142648 23816 -118832
Branches 8660 0 -8660
===========================================
- Hits 60984 14659 -46325
+ Misses 79997 7919 -72078
+ Partials 1667 1238 -429 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Does Codecov recognize that tests have been added to the router-tests package? Seems like not. |
…k-returns-errors-on-sse-streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
router-tests/http_subscriptions_test.go (1)
202-268: Consider verifying stream completion after the error.The successful SSE test explicitly checks for an "event: complete" message (line 186), but this error test ends after asserting the two blank lines without verifying whether a completion event is sent or whether the stream closes cleanly with no further messages. Consider adding an assertion to confirm the expected stream termination behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
router-tests/http_subscriptions_test.go(2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
Applied to files:
router-tests/http_subscriptions_test.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.
Applied to files:
router-tests/http_subscriptions_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router-tests/http_subscriptions_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.
Applied to files:
router-tests/http_subscriptions_test.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Applied to files:
router-tests/http_subscriptions_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image
- GitHub Check: image_scan
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
router-tests/http_subscriptions_test.go (3)
139-139: Good resource management practice.Moving the defer immediately after acquiring the response ensures proper cleanup even if subsequent assertions fail.
202-268: Test effectively validates error propagation over SSE streams.The test structure mirrors the successful SSE test appropriately and correctly simulates a subgraph error scenario. The middleware configuration and SSE parsing logic are sound.
260-266: SSE error responses end with two blank lines instead of one, creating an asymmetry with successful messages.The test at lines 260-266 expects two consecutive blank lines after the error message. In contrast, successful SSE messages (lines 171, 181, 197), heartbeat events, and complete events all terminate with a single blank line. The error path writes
event: next\ndata:followed byrequestErrors.WriteResponse(w), thenFlush(), while successful responses go throughHttpFlushWriterwith explicit\n\nseparation. Verify whether this two-blank-line termination for errors is intentional per the GraphQL-core SSE specification or if the error response path should align with the single-blank-line pattern.
alepane21
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
Note: Did some local tests to see if it works, but only with one simultaneous client. Gonna be sure and test with the Apollo client with some simultaneous connections. |
…k-returns-errors-on-sse-streams
|
I did some tests with a custom script using apollo client with both graphql-ws and graphql-sse as connection clients. The error seems to be propagated correctly to clients on sse, now matching what ws returns. Here are log outputs: ws-client sse-client The only difference is that sse-client keeps reconnecting after the error. This seems to stem from the sse-client itself and has been the case before this change. It was just silent, no log output in that case, because the router never returned data on the connection prior to closing it. I suspect the sse-client retries because we never sent a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
router/core/websocket.go (1)
1050-1067: Consider also cleaning upsubscriptionsstate when subscription setup failsIn the
SubscriptionResponsePlanerror path you now emit the error, flush, and send acomplete, which is good for client semantics. However, this path doesn’t remove the entry fromh.subscriptions, so operations that fail duringAsyncResolveGraphQLSubscriptionwill leave behind a mapping frommsg.IDtosubscriptionID.By contrast,
WebSocketConnectionHandler.Completedeletes the subscription ID before sendingcomplete. To avoid accumulating stale entries on long‑lived connections, consider also deleting the subscription here (e.g.,h.subscriptions.Delete(registration.msg.ID)), or refactoring through a helper that both sends error/complete and cleans up the state while preserving the desired flush/complete ordering.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
router/core/websocket.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Applied to files:
router/core/websocket.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_test
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./events)
- GitHub Check: build_push_image
- GitHub Check: Analyze (go)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
…k-returns-errors-on-sse-streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
router/core/graphql_handler.go (1)
435-441: Consider logging flush errors instead of discarding them.Both the websocket and HTTP flush error results are currently discarded. While this is consistent with the existing pattern, logging these errors could help diagnose issues where clients don't receive error responses due to flush failures (e.g., broken connections, write timeouts).
Apply this diff to log flush errors for both paths:
if wsRw, ok := w.(*websocketResponseWriter); ok { - _ = wsRw.Flush() + if err := wsRw.Flush(); err != nil { + requestLogger.Debug("Failed to flush websocket error response", zap.Error(err)) + } } if httpRw, ok := w.(*HttpFlushWriter); ok { - _ = httpRw.Flush() + if err := httpRw.Flush(); err != nil { + requestLogger.Debug("Failed to flush HTTP error response", zap.Error(err)) + } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
router/core/graphql_handler.go(1 hunks)router/core/websocket.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/core/websocket.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Applied to files:
router/core/graphql_handler.go
🧬 Code graph analysis (1)
router/core/graphql_handler.go (1)
router/core/flushwriter.go (1)
HttpFlushWriter(34-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build_test
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_test
- GitHub Check: build_push_image
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/core/graphql_handler.go (1)
438-441: LGTM! Flush implementation correctly mirrors websocket pattern.The added flush for
HttpFlushWriterensures that error responses are immediately sent to SSE and multipart subscription clients, fixing the issue where errors were buffered and not delivered. The implementation is consistent with the existing websocket flush pattern.
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.
Checklist
When we write errors to a connection backed by
HttpFlushWriter(used on subscriptions via SSE/Multipart) we do not flush the connection as oposed to a websocket connection, where this actually happens. I fixed it by simply checking if we are dealing with a flush writer and flush if so. Added a test to back this up.