-
Notifications
You must be signed in to change notification settings - Fork 363
Description
Synapse takes forever to send large responses. It takes us longer to send just the response than it does for us to process and encode_json_response
in some cases.
Examples
98s to process the request and encode_json_response
but the request isn't finished sending until 484s (8 minutes) which is 6.5 minutes of dead-time. The response size is 36MB
Jaeger trace: 4238bdbadd9f3077.json

59s to process and finishes after 199s. The response size is 36MB
Jaeger trace: 2149cc5e59306446.json

I've come across this before and it's not a new thing. For example in #13620 (original issue), I described as the "mystery gap at the end after we encode the JSON response (encode_json_response
)" but never encountered it being this egregious.
It can also happen for small requests. 2s to process and finishes after 5s. The response size is 120KB

Investigation
@kegsay pointed out _write_bytes_to_request
which runs after encode_json_response
and has comments like "Write until there's backpressure telling us to stop." that definitely hint at some areas of interest.
synapse/synapse/http/server.py
Lines 869 to 873 in 03937a1
with start_active_span("encode_json_response"): | |
span = active_span() | |
json_str = await defer_to_thread(request.reactor, encode, span) | |
_write_bytes_to_request(request, json_str) |
The JSON serialization is done in a background thread because it can block the reactor for many seconds. This part seems normal and fast (no problem).
But we also use _ByteProducer
to send the bytes down to the client. Using a producer ensures we can send down all of the bytes to the client without hitting a 60s timeout (see context in comments below)
synapse/synapse/http/server.py
Lines 883 to 889 in d40bc27
# The problem with dumping all of the response into the `Request` object at | |
# once (via `Request.write`) is that doing so starts the timeout for the | |
# next request to be received: so if it takes longer than 60s to stream back | |
# the response to the client, the client never gets it. | |
# | |
# The correct solution is to use a Producer; then the timeout is only | |
# started once all of the content is sent over the TCP connection. |
This logic was added in:
- Iteratively encode JSON responses matrix-org/synapse#8013
- Switch the JSON byte producer from a pull to a push producer matrix-org/synapse#8116
Some extra time is expected as we're working with the reactor instead of blocking it but it seems like something isn't tuned optimally (chunk size, starting/stopping too much, etc)