-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
query-frontend: Extract org id from from request headers #3245
Conversation
fe91614
to
f3b47a7
Compare
Thanks for the work. The CI failure is not related. |
@neilfordyce can you explain when is this needed/useful? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small nits. Overall it looks very good! Would you mind also documenting this?
cmd/thanos/query_frontend.go
Outdated
func extractOrgId( | ||
conf *queryFrontendConfig, | ||
r *http.Request, | ||
) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small nit, we can inline this.
cmd/thanos/query_frontend.go
Outdated
@@ -82,6 +83,10 @@ func registerQueryFrontend(app *extkingpin.App) { | |||
cmd.Flag("query-frontend.log-queries-longer-than", "Log queries that are slower than the specified duration. "+ | |||
"Set to 0 to disable. Set to < 0 to enable on all queries.").Default("0").DurationVar(&cfg.CortexFrontendConfig.LogQueriesLongerThan) | |||
|
|||
cmd.Flag("query-frontend.org-id-header", "Request header names used to set the org id field in the slow query log (repeated flag). "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking about having a default org-id-header X-Scope-OrgId
and users can also specify other headers as well. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sounds like a sensible default 👍. Will that help with compatibility with Cortex things perhaps?
@krasi-georgiev - absolutely, apologies for the missing context. We find it helpful to identify users and services who are creating (slow) queries. In that way, we can find the source of slow queries and alter the queries, e.g. if they are continually causing memory issues for queriers. It's a bit of a stopgap to help reduce problems large queries cause with high memory use; but more generally, it's useful to be able to see who is running what queries. |
Why not use some of the standard HTTP headers to identify the source that creates these slow queries? |
We definitely could do that, but the best header to use would likely vary depending on the client. E.g. most people interacting with our Thanos setup are logged in Grafana users, so it's handy to identify them via X-Grafana-User. Other users of Thanos might not be so interested in the specific user (or they might be anonymous in Grafana) so they could fallback to another standard header. |
I thought origin might be enough, but it appears like Grafana doesn't forward the origin when it's configured to proxy requests through the data source proxy. This is the complete set of headers I can see from a request coming from the Prometheus data source. x-forwarded-for might be helpful, but an IP address isn't as helpful as a user name. If I change the datasource to go direct from the Browser there is an origin header, so we'd be able to see slow requests were coming from Grafana. But again probably not as helpful as seeing the user name in the logs. |
hm, yeah I see what you mean. But isn't an IP a more unique ID here? It is not unlikely that many grafana installations will use the admin username which will cause the same log label from different sources. |
I see host is present in both modes here |
btw could this be an issue with Grafana not adding the origin header? |
We only allow logged in users in our Grafana setup, so the x-grafana-user header is always set and gives us exactly what we want. IP address could be helpful in some setups, but for others it might only reveal the address of a NAT which would be the same for all users. My thought was, different headers will make sense in different contexts. This might be why Cortex have the user id header fixed to X-Scope-OrgId, then leaves it up to a reverse proxy to allow users to fill out that header whatever logic they want from the request. |
Got it - using IP or Origin header will always be the same when the request comes from the same grafana instance. Using the grafana username will indeed give you a better understanding of which use is sending these heavy queries. |
@@ -82,6 +83,10 @@ func registerQueryFrontend(app *extkingpin.App) { | |||
cmd.Flag("query-frontend.log-queries-longer-than", "Log queries that are slower than the specified duration. "+ | |||
"Set to 0 to disable. Set to < 0 to enable on all queries.").Default("0").DurationVar(&cfg.CortexFrontendConfig.LogQueriesLongerThan) | |||
|
|||
cmd.Flag("query-frontend.org-id-header", "Request header names used to set the org id field in the slow query log (repeated flag). "+ | |||
"If multiple headers match the request, the first matching arg specified will take precedence. "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this bit? Can you add test for this case to make it a bit more clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yeah this sounded clearer in my head; I'll try to make the behaviour clearer in the help string. The idea is that if --query-frontend.org-id-header
is used multiple times, we try to extract a header from the request in order the command line args were specified.
There are test cases to check the precedence behaviour in query_frontend_test.go, e.g. we check that when both --query-frontend.org-id-header=x-server-id
and query-frontend.org-id-header=x-grafana-user
are specified that org id is set to the value of the x-server-id header. However, happy to split that into a few test functions with different names if it would be clearer what's being tested?
Also, I noticed that this file has changed a bit since writing it and after rebasing it seems bit out of place. Considering moving the new logic into pkg/queryfrontend. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, now looking at the test it is clear.
I think the test location is fine, but if you prefer can move it.
@@ -161,6 +161,12 @@ Flags: | |||
Log queries that are slower than the specified | |||
duration. Set to 0 to disable. Set to < 0 to | |||
enable on all queries. | |||
--query-frontend.org-id-header=<http-header-name> ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yeya24 - thanks for the feedback. Regarding docs, I'm considering adding something to explain the new flag in the slow-query-log
section of query-frontend.md. And probably some godoc to explain the new function. Is that the kind of thing you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply add this description to the flag and I think it would be clear 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. I think it's already like that. This description is the one generated from the flag in cmd/thanos/query_frontend.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I mean you can add something like This ord id will be added to the slow-query-log
to the flag description directly and then regenerate the doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made an update to the flag description. Hope it's clearer, but open to suggestions for a better way to describe it.
LGTM |
Sorry for the late response. I think this feature is nice to have. @neilfordyce Can you please fix the conflicts? Then we can merge 🥇 |
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
8910028
to
a3dd82f
Compare
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
a3dd82f
to
35dae03
Compare
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
9cbc917
to
1b505ae
Compare
Thanks for the bump. This slipped my mind. Sure thing, I've rebased and fixed the conflicts. The test failure seems to be for unrelated reasons. |
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇 LGTM. Thanks for your contribution.
) * query-frontend: Extract org id from from request headers Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net> * docs: regenerate docs Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net> * query-frontend: gofmt Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net> * query-frontend: inline function header Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net> * docs: rebuild query-frontend docs Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net> * docs: clarify usage of org-id-header flag Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net> Signed-off-by: Oghenebrume50 <raphlbrume@gmail.com>
Signed-off-by: Neil Fordyce neilfordyce@skyscanner.net
Changes
query-frontend.org-id-header
flag to specify HTTP header(s) to populate slow query log org_id field: Fixes query-frontend: populate org_id in slow query logs #3177Verification
Executed query-frontend pointed at a local querier with this configuration
thanos query-frontend --http-address=127.0.0.1:9093 --query-frontend.downstream-url=http://localhost:9090 --query-frontend.log-queries-longer-than="-1s" --query-frontend.org-id-header="x-grafana-user" --query-frontend.org-id-header="x-scope-orgid"
Ran query with expected headers
grafana-username shows in org_id field of query frontend logs
Further testing summary