Be able to correlate timeouts in reverse-proxy layer in front of Synapse (tag specific request headers) #13786
Conversation
Be able to correlate timeouts in reverse-proxy layer in front of Synapse (tag specific request headers). Fix #13685
# Tag any headers that we need to extract from the request. This
# is useful to specify any headers that a reverse-proxy in front
# of Synapse may be sending to correlate and match up something
# in that layer to a Synapse trace. ex. when Cloudflare times
# out it gives a `cf-ray` header which we can also tag here to
# find the trace.
for header_name in self.hs.config.tracing.request_headers_to_tag:
    headers = request.requestHeaders.getRawHeaders(header_name)
    if headers is not None:
        parent_span.set_tag(
            SynapseTags.REQUEST_HEADER_PREFIX + header_name,
            str(headers[0]),
        )
I've tested that this works locally, but I'm not sure if the `cf-ray` header that we want to use on matrix.org is actually a thing.

Based on the following documentation, and the fact that we already see it in our haproxy logs, I assume this will be viable:

"Add the `CF-Ray` header to your origin web server logs to match requests proxied to Cloudflare to requests in your server logs."
if headers is not None:
    parent_span.set_tag(
        SynapseTags.REQUEST_HEADER_PREFIX + header_name,
        str(headers[0]),
Why the call to `str`? If `header_name` is a string, then `headers[0]` will be a string. See:
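For reference, a minimal sketch of the behaviour being described (the `CF-Ray` value is made up):

```python
# Minimal sketch: twisted.web.http_headers.Headers.getRawHeaders() returns
# values whose type matches the type of the name you pass in, so the extra
# str() call above should be redundant when header_name is a str.
from twisted.web.http_headers import Headers

headers = Headers({"CF-Ray": ["230b030023ae2822-SJC"]})  # illustrative value

print(headers.getRawHeaders("CF-Ray"))   # ['230b030023ae2822-SJC']  (str values)
print(headers.getRawHeaders(b"CF-Ray"))  # [b'230b030023ae2822-SJC'] (bytes values)
```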
headers = request.requestHeaders.getRawHeaders(header_name)
if headers is not None:
    parent_span.set_tag(
        SynapseTags.REQUEST_HEADER_PREFIX + header_name,
I'm not really familiar with opentracing tags. Is `request_header.cf-ray` an OK thing to do, or do we need to normalize that to something like `request_header.cf_ray` or whatever?
The tag can be whatever, and since it's the name of a header, I think it's easiest to follow if we use the actual header name. This is similar to what we do with function arguments, except those happen to conform to Python snake casing already.

These are the OpenTelemetry docs, but here are some naming conventions: https://opentelemetry.io/docs/reference/specification/common/attribute-naming/
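To make the resulting tag names concrete, here is a small self-contained sketch of what the loop above would produce when `cf-ray` is configured. A plain dict stands in for the tracing span, the prefix mirrors `SynapseTags.REQUEST_HEADER_PREFIX` as discussed, and the header values are made up:

```python
# Self-contained sketch of the tagging loop: a plain dict stands in for the
# tracing span. The prefix follows the `request_header.` naming discussed
# above; header values are illustrative.
REQUEST_HEADER_PREFIX = "request_header."

request_headers_to_tag = ["cf-ray"]  # what the new config option would contain
incoming_headers = {
    "cf-ray": ["230b030023ae2822-SJC"],
    "user-agent": ["curl/7.85.0"],
}

span_tags = {}
for header_name in request_headers_to_tag:
    values = incoming_headers.get(header_name)
    if values is not None:
        span_tags[REQUEST_HEADER_PREFIX + header_name] = values[0]

print(span_tags)  # {'request_header.cf-ray': '230b030023ae2822-SJC'}
```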
* `request_headers_to_tag`: A list of headers to extract from the request and
  add to the top-level servlet tracing span as tags. Useful when you're using
  a reverse proxy service like Cloudflare to protect your Synapse instance, in
  order to correlate and match up requests that timed out at the Cloudflare
  layer to the Synapse traces.
Not to throw a wrench in these plans, but if other reverse proxies do something similar, I wonder if it would make more sense to have a header which could be the source for `SynapseRequest.request_id` (instead of generating a new one):

Lines 174 to 175 in 9772e36:

def get_request_id(self) -> str:
    return "%s-%i" % (self.get_method(), self.request_seq)
Would that be clearer instead of having two separate IDs?
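To illustrate the suggestion above (this is a rough sketch of the idea only, not the code that later went into #13801), the request ID could prefer a reverse-proxy-supplied header and fall back to the existing method/sequence scheme:

```python
# Rough sketch of the idea, not the actual #13801 implementation: prefer a
# reverse-proxy-supplied header (e.g. cf-ray) as the request ID and fall back
# to the existing "<METHOD>-<seq>" scheme when it is absent.
from typing import Dict, List, Optional


class FakeRequest:
    """Stand-in for SynapseRequest, with just enough for this sketch."""

    def __init__(self, method: str, seq: int, headers: Dict[str, List[str]]) -> None:
        self.method = method
        self.request_seq = seq
        self._headers = headers

    def get_request_id(self, id_header: Optional[str] = None) -> str:
        if id_header is not None:
            values = self._headers.get(id_header)
            if values:
                return values[0]
        return "%s-%i" % (self.method, self.request_seq)


req = FakeRequest("GET", 42, {"cf-ray": ["230b030023ae2822-SJC"]})
print(req.get_request_id())          # GET-42
print(req.get_request_id("cf-ray"))  # 230b030023ae2822-SJC
```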
👍 Seems reasonable to me and a little cleaner. I'll create a new PR with this solution ⏩
I don't know if this would make it harder to track things in the logs or anything, but seems like it would be a lot cleaner.
Tried this out in #13801
Losing the simplicity of the incrementing integer to reference in logs as you scroll kinda sucks though. I'm leaning back to this option I think 🤔
> Losing the simplicity of the incrementing integer to reference in logs as you scroll kinda sucks though. I'm leaning back to this option I think 🤔
Is that really used though? I usually just copy and paste for grep. Once the number gets beyond a few digits it is useless anyway IMO.
I'm not the one having to log dive and debug big Synapse instances, so I don't know. In tests, we will still have the same sequential request IDs since this config won't be defined. We're not changing any default either, so it's basically opt-in.
If anyone has a strong desire for the old way, we can always revert as well. If you're happy with #13801, we can move forward with that ⏩
Sounds like we should ask folks real quick then!
Thanks for asking in #synapse-dev, https://matrix.to/#/!vcyiEtMVHIhWXcJAfl:sw1v.org/$kMyUMCOU4fZpH6AOUPOIqGyZADn38ebt2kn2wo9cAXU?via=matrix.org&via=element.io&via=beeper.com ❤️
Discussion being tracked in #13801 (comment)
Closing as it seems we're settling on the alternative #13801
Add `opentracing.request_headers_to_tag` config to specify which request headers to extract and tag traces with, in order to correlate timeouts in reverse-proxy layers in front of Synapse with traces. Fix #13685

Alternative solution: #13801
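As a usage illustration, the option would hold a list of header names under the `opentracing` section of homeserver.yaml. The helper below is hypothetical (not Synapse's actual config-parsing code) and the YAML shape is assumed from the option name:

```python
# Hypothetical helper, not Synapse's config code: read a list-of-strings
# `request_headers_to_tag` option from an already-loaded homeserver.yaml dict.
# Assumed YAML shape:
#
#   opentracing:
#     request_headers_to_tag:
#       - cf-ray
from typing import Any, Dict, List


def read_request_headers_to_tag(config: Dict[str, Any]) -> List[str]:
    opentracing_config = config.get("opentracing") or {}
    headers = opentracing_config.get("request_headers_to_tag", [])
    if not isinstance(headers, list) or not all(isinstance(h, str) for h in headers):
        raise ValueError("opentracing.request_headers_to_tag must be a list of strings")
    return headers


print(read_request_headers_to_tag({"opentracing": {"request_headers_to_tag": ["cf-ray"]}}))
# ['cf-ray']
```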
Pull Request Checklist
* Pull request includes a changelog file. The entry should be a short description of your change which makes sense to users (e.g. not "Moved X method from `EventStore` to `EventWorkerStore`.") and use markdown where necessary, mostly for `code blocks`.
* Pull request includes a sign off
* Code style is correct (run the linters)