
Streaming/SSE routes are reported as critically slow endpoints #156

@virti0


Summary

Long-lived streaming HTTP routes such as Server-Sent Events can be reported as critically slow endpoints because Traceway records the full request lifetime as endpoint duration. For SSE this duration is not request processing latency; it is the expected connection lifetime.

Example pattern from a Go/Gin application using go.tracewayapp.com/tracewaygin:

  • Route: GET /stream/events
  • Transport: text/event-stream
  • Behavior: sends an initial connection comment/event and periodic heartbeats, then closes the stream before a maximum stream age so the client reconnects
  • Observed Traceway result: endpoint appears critically slow, with P99 measured in minutes
  • Expected interpretation: healthy streaming connection lifetime, not a slow request

Why this matters

Streaming routes can dominate endpoint latency rankings, Apdex/impact scoring, and slow-request alerts even when they are behaving correctly. Operators still want to observe these routes, but they need different semantics from normal request/response endpoints.

Ignoring the route entirely is not ideal because the route can still fail, leak subscribers, stop heartbeating, or return 5xx. What is needed is a way to keep visibility while excluding or separately classifying stream lifetime from normal endpoint latency/SLO calculations.

Current API/docs found

The Gin middleware docs say Traceway captures duration for every request and mention route filtering/passthrough configuration:

https://docs.tracewayapp.com/client/gin-middleware

The installed tracewaygin package supports ignoring exact route templates via WithIgnoredPaths(paths ...string), but that removes the route from recording entirely rather than classifying it as a stream.

I could not find docs or API for:

  • marking a route as streaming/SSE/WebSocket-like
  • excluding selected routes from endpoint latency/Apdex/slow-endpoint rankings while still recording status/errors
  • route-specific latency threshold overrides from the Go Gin SDK
  • dashboard-side endpoint filters for long-lived streams

The project README also mentions “per-endpoint slow-threshold override”, but I could not find how to configure it for SDK-captured endpoints or use it specifically for streaming routes.

Suggested solution

One or more of these would solve the problem:

1. SDK route classification

router.Use(tracewaygin.New(
    connectionString,
    tracewaygin.WithStreamingRoutes("GET /stream/events"),
))
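One possible shape for the proposed option is sketched below. It uses hypothetical stand-in types (`config`, `Option`, `newConfig`, `isStreaming`) rather than the real tracewaygin internals. Each entry is a `"METHOD /route-template"` string and is matched against the route template (what Gin's `c.FullPath()` returns), not the raw URL. A parameterized stream route would therefore match every concrete path.

```go
package main

import (
	"fmt"
	"strings"
)

// config and Option are hypothetical stand-ins for SDK middleware settings;
// they are not the actual tracewaygin types.
type config struct {
	streaming map[string]bool // keys are "METHOD /route-template"
}

type Option func(*config)

// WithStreamingRoutes mirrors the option proposed in this issue (hypothetical).
// Entries look like "GET /stream/events"; malformed entries are skipped here,
// though a real SDK might prefer to reject them.
func WithStreamingRoutes(routes ...string) Option {
	return func(c *config) {
		for _, r := range routes {
			method, path, ok := strings.Cut(strings.TrimSpace(r), " ")
			if !ok {
				continue
			}
			c.streaming[strings.ToUpper(method)+" "+strings.TrimSpace(path)] = true
		}
	}
}

func newConfig(opts ...Option) *config {
	c := &config{streaming: map[string]bool{}}
	for _, o := range opts {
		o(c)
	}
	return c
}

// isStreaming reports whether a request on this route template is a long-lived stream.
func (c *config) isStreaming(method, routeTemplate string) bool {
	return c.streaming[method+" "+routeTemplate]
}

func main() {
	cfg := newConfig(WithStreamingRoutes("GET /stream/events"))
	fmt.Println(cfg.isStreaming("GET", "/stream/events"))  // true
	fmt.Println(cfg.isStreaming("POST", "/stream/events")) // false
	fmt.Println(cfg.isStreaming("GET", "/api/users/:id"))  // false
}
```

Keying on method plus template keeps a `POST` on the same path classified as an ordinary request.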

Streaming routes would still record status code, errors, body size, and connection lifetime, but would be excluded from (or reported separately in) normal endpoint latency, P99, and Apdex alerting.

2. SDK route predicate

router.Use(tracewaygin.New(
    connectionString,
    tracewaygin.WithRouteClassifier(func(c *gin.Context) tracewaygin.RouteClass {
        if c.FullPath() == "/stream/events" {
            return tracewaygin.RouteClassStreaming
        }
        return tracewaygin.RouteClassRequestResponse
    }),
))

3. Dashboard-side endpoint settings

Allow an endpoint to be marked as Streaming/SSE and excluded from slow-endpoint latency calculations while keeping its errors visible.

4. Documented workaround

If the intended answer is WithIgnoredPaths, document the tradeoff clearly: it prevents false latency alerts but also removes visibility for that route.

Acceptance criteria

  • Users can keep SSE/streaming routes observable without them polluting normal endpoint latency rankings.
  • P99/slow endpoint alerts can distinguish “connection lifetime” from “request processing latency”.
  • Gin middleware docs mention how to handle SSE/streaming endpoints.

Metadata

Labels: done (Implemented)
