-
Notifications
You must be signed in to change notification settings - Fork 820
QueryFrontend: Return server side performance information on server-timing header #3685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a7015b0
to
f625d77
Compare
server-timing
header
server-timing
headere639cbc
to
8148525
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alanprot for working on this. I see a value in sending downstream this information.
However, I believe the ServiceStats
is a bit over engineered for this. The Cortex philosophy is to start small and let the design evolve when new features (or changes) are introduced. In this PR you're basically introducing a framework to send generic data down to the client, while we only need to send 1 header. I would suggest to start small, remove ServiceStats
and just inject the querier_wall_time
with httpsnoop
in the middleware. We can reconsider it and evolve the design if and when we'll need to send other information downstream.
8f514a7
to
61d81d6
Compare
e8ca8d5
to
55e0a7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good work! The new design is more lightweight and LGTM. I left a couple of questions / comments, but no hard blocker. Thanks!
integration/query_frontend_test.go
Outdated
@@ -302,6 +302,13 @@ func runQueryFrontendTest(t *testing.T, cfg queryFrontendTestConfig) { | |||
assert.Equal(t, model.Time(1595846750806), matrix[0].Values[2].Timestamp) | |||
} | |||
|
|||
// No need to repeat the test on missing metric name for each user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// No need to repeat the test on missing metric name for each user. | |
// No need to repeat the test on Server-Timing header for each user. |
pkg/frontend/transport/handler.go
Outdated
@@ -88,7 +88,7 @@ func (f *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
if f.cfg.QueryStatsEnabled { | |||
var ctx context.Context | |||
stats, ctx = querier_stats.ContextWithEmptyStats(r.Context()) | |||
r = r.WithContext(ctx) | |||
*r = *r.WithContext(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pstibrany Do you think it's safe? I do understand why we need to do it (given the current implementation) but I've mixed feelings about manipulating the original request. Not sure it's something we've already done in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very brittle... it may easily break if we insert another middleware between this handler, and ServerTimingMiddleware.
We could move generation of server timing header into this handler.
Alternatively we could move stats logging into the same middleware as server-timing header.
func ServerTimingMiddleware(next http.Handler, statsEnabled bool) http.Handler { | ||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
|
||
if !statsEnabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing it here, why don't we skip wrapping with ServerTimingMiddleware()
at all when query stats are not enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
}) | ||
} | ||
|
||
func writeTimingHeader(startTime time.Time, headerWritten *bool, headers http.Header, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a best practice, I would avoid manipulating an input parameter (headerWritten
in this case). You can move the if headerWritten
logic out side this function, before calling writeTimingHeader()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, make it a method of some type which will hold headerWritten
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
var ( | ||
headerWritten = false | ||
startTime = time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're already tracking the response time in transport.Handler
. I'm wondering we should just store it in the query stats struct, so that the two values (the one logged and the one reported via HTTP header) perfectly match.
parts := make([]string, 0) | ||
stats := querier_stats.FromContext(r.Context()) | ||
if stats != nil { | ||
parts = append(parts, statsValue("querier_wall_time", stats.WallTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the expected unit for the duration in the Server-Timing
specs? I'm asking cause we return nanoseconds and I'm wondering how it gets displayed by the browser dev tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is formatted to millseconds in statsValue
, which seems to be what spec is using. (Although spec is not very explicit about it, and just mentions 'ms' in examples. Weird.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we should never directly read stats.WallTime
but use stats.LoadWallTime()
.
}, | ||
Write: func(next httpsnoop.WriteFunc) httpsnoop.WriteFunc { | ||
return func(p []byte) (int, error) { | ||
n, err := next(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? If next
is called first, it may start writing response, and then it will be too late to write timing header afterwards. (Unless we use Trailer
, but I don't see that in this code)
w = httpsnoop.Wrap(w, hooks) | ||
next.ServeHTTP(w, r) | ||
|
||
writeTimingHeader(startTime, &headerWritten, w.Header(), r) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing header after serving request seems to be too late. Under what conditions would one of WriteHeader
or Write
hooks not write the header already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would happen if Write
or WriteHeader
is not called on any handler in the chain.
parts := make([]string, 0) | ||
stats := querier_stats.FromContext(r.Context()) | ||
if stats != nil { | ||
parts = append(parts, statsValue("querier_wall_time", stats.WallTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is formatted to millseconds in statsValue
, which seems to be what spec is using. (Although spec is not very explicit about it, and just mentions 'ms' in examples. Weird.).
} | ||
|
||
func statsValue(name string, d time.Duration) string { | ||
return name + ";dur=" + strconv.FormatInt(d.Milliseconds(), 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Spec allows value to be float, not just integer, if you want to use higher precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
}) | ||
} | ||
|
||
func writeTimingHeader(startTime time.Time, headerWritten *bool, headers http.Header, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, make it a method of some type which will hold headerWritten
value.
pkg/frontend/transport/handler.go
Outdated
@@ -88,7 +88,7 @@ func (f *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
if f.cfg.QueryStatsEnabled { | |||
var ctx context.Context | |||
stats, ctx = querier_stats.ContextWithEmptyStats(r.Context()) | |||
r = r.WithContext(ctx) | |||
*r = *r.WithContext(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems very brittle... it may easily break if we insert another middleware between this handler, and ServerTimingMiddleware.
We could move generation of server timing header into this handler.
Alternatively we could move stats logging into the same middleware as server-timing header.
55e0a7e
to
0849a44
Compare
b75a6de
to
beced59
Compare
…er-timing header) Signed-off-by: alanprot <approtas@amazon.com>
beced59
to
257df6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for addressing our feedback! The changeset now looks way more clean!
What this PR does:
server-timing
header with querier wall time information on query requests when-frontend.query-stats-enabled=true
server-timing
header withrequest_duration
on any request on query-frontendWe may want to create a config to enable this feature.
Result on chrome:
Which issue(s) this PR fixes:
Enhancement of #3595
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]