
CompressionLayer does not produce compressed data in streaming #292

Open
Geal opened this issue Aug 26, 2022 · 7 comments
Labels
  • C-bug (Category: This is a bug.)
  • E-hard (Call for participation: Experience needed to fix: Hard / a lot)
  • E-help-wanted (Call for participation: Help is requested to fix this issue.)

Comments

Geal commented Aug 26, 2022

Bug Report

Version

0.3.4

Platform

Linux 5.18.10-76051810-generic, but probably not relevant

Crates

tower-http, async-compression

Description

We noticed in apollographql/router#1572 that CompressionLayer waits until the entire response body has been compressed before sending anything to the client. This is due to the Encoder behaviour in async-compression: Nullus157/async-compression#154.

How we observed this:

  • The router produces a multipart response, where we want the client to receive parts as soon as they are produced. The parts are produced as a Stream of Bytes that is wrapped in an axum StreamBody
  • In one test, the first part was produced immediately and the next one a few seconds later
  • If the response is not compressed, the parts arrive as soon as they are produced
  • If the response is compressed (changing the compression algorithm makes no difference), we receive nothing at first, and after a few seconds both parts arrive at the same time

The issue comes from this part:

https://github.com/Nemo157/async-compression/blob/ada65c660bcea83dc6a0c3d6149e5fbcd039f739/src/tokio/bufread/generic/encoder.rs#L63-L74

When the underlying stream returns Poll::Pending, ready! propagates it directly, so no data is produced. I patched this in our router to produce data whenever we see a Pending, but that's not a proper solution.

Instead, the CompressionLayer should be able to direct the Encoder to produce data based on conditions such as how much data is available, or how long it has been since the last chunk was sent.
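
For illustration, a minimal sketch of the setup described above (the route path and handler name are made up; it is written against current axum 0.7 / tower-http APIs with Body::from_stream rather than the older StreamBody from the report, and uses the async-stream crate for brevity):

use std::{convert::Infallible, time::Duration};

use axum::{body::Body, response::Response, routing::get, Router};
use bytes::Bytes;
use tower_http::compression::CompressionLayer;

// Streaming handler: the second part is produced a few seconds after the first.
async fn stream_parts() -> Response {
    let parts = async_stream::stream! {
        yield Ok::<Bytes, Infallible>(Bytes::from_static(b"part 1\n"));
        tokio::time::sleep(Duration::from_secs(3)).await;
        yield Ok(Bytes::from_static(b"part 2\n"));
    };
    Response::new(Body::from_stream(parts))
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/stream", get(stream_parts))
        // With this layer, a client that sends accept-encoding receives both
        // parts together once the stream ends; without it, each part arrives
        // as soon as it is produced.
        .layer(CompressionLayer::new());

    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}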

@davidpdrsn added the C-bug and E-help-wanted labels Aug 26, 2022
@davidpdrsn
Member

Thanks for the report! Compressing streaming responses is hard, and I don't know a lot about how to do it or whether it requires changes in async-compression.

I don't have time to work on this, but PRs are much appreciated!

Geal commented Oct 12, 2022

@davidpdrsn in Nullus157/async-compression#154 we're moving towards adding a poll_flush method to the encoders, but I'm having difficulty wrapping my head around WrapBody. It seems very generic, used for both compression and decompression, and since it uses ReaderStream I do not see where a call to poll_flush would fit.

Would it be acceptable to have something else than WrapBody for the compression side?

I'm also wondering whether this issue appears on the decompression side as well, considering Nullus157/async-compression#123.

@davidpdrsn
Member

> Would it be acceptable to have something else than WrapBody for the compression side?

Yeah, the whole WrapBody thing is pretty complex. It might not be pulling its weight. If you want to take a stab at refactoring things to make poll_flush accessible, then please do!

@davidpdrsn added the E-hard label Dec 2, 2022

kevincox commented Sep 5, 2024

In my case I only have a few endpoints where streaming is important. So I have worked around this by adding a middleware outside of the compression layer that strips the accept-encoding header.

tower::ServiceBuilder::new()
    // Runs before CompressionLayer: with no accept-encoding header on the
    // request, the layer leaves this endpoint's response uncompressed.
    .map_request(|mut req: axum::extract::Request| {
        if req.uri().path() == "/my-streaming-endpoint" {
            req.headers_mut().remove("accept-encoding");
        }
        req
    })
    .layer(tower_http::compression::CompressionLayer::new())

Unfortunately you can't do this with compress_when, as it only has access to the response, so you can't decide by path.

One option would be a marker extension on the response object. Maybe that could be included in DefaultPredicate as an official workaround until streaming compression is sorted out.
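
For example, a sketch of that idea against tower-http's Predicate trait (SkipCompression, SkipMarked, and the handler-side insert are hypothetical names, not an existing API):

use tower_http::compression::{predicate::Predicate, CompressionLayer};

// Hypothetical marker a handler inserts into its response extensions when the
// body must stream through without being buffered by the compression encoder.
#[derive(Clone, Copy)]
struct SkipCompression;

// Predicate that refuses to compress any response carrying the marker.
#[derive(Clone, Copy)]
struct SkipMarked;

impl Predicate for SkipMarked {
    fn should_compress<B>(&self, response: &http::Response<B>) -> bool
    where
        B: http_body::Body,
    {
        response.extensions().get::<SkipCompression>().is_none()
    }
}

// Only responses without the marker get compressed. In practice you would
// probably combine this with DefaultPredicate via Predicate::and.
fn compression() -> CompressionLayer<SkipMarked> {
    CompressionLayer::new().compress_when(SkipMarked)
}

A streaming handler would then call response.extensions_mut().insert(SkipCompression) before returning.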


How to do streaming compression is definitely quite hard. However, I think the goal is pretty clear: more or less, we want to flush the compression whenever the send buffer is empty. Basically, it is better to send lightly-compressed data than nothing.

It isn't quite that simple, because things like proxies in the middle may be buffering for the client, so our send buffer may empty quickly even if there is a buffer somewhere between us and the client. But I suspect some basic heuristic like "if there is no new data for 10ms, flush" is good enough for 99% of cases. It allows good compression for responses where the server is streaming data quickly, while intermittent responses get each chunk flushed fairly quickly.

The ideal solution would also include some sort of explicit flush, so that the generator can indicate when something should urgently reach the end user. But I think auto-flush is still likely a good default.
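
As a rough sketch of that idle-flush heuristic (not tower-http code; Chunk and with_idle_flush are made-up names, and it assumes the tokio-stream and async-stream crates), the body stream could be wrapped so that an explicit flush marker is emitted whenever the source goes quiet:

use std::time::Duration;

use bytes::Bytes;
use tokio_stream::{Stream, StreamExt};

// What a flush-aware encoder would consume: either more data to compress,
// or a request to flush whatever it has buffered so far.
enum Chunk {
    Data(Bytes),
    Flush,
}

// Wrap a body stream so that a Flush marker is emitted when no new data has
// arrived for `idle` (e.g. 10ms), at most once per quiet period.
fn with_idle_flush<S>(mut source: S, idle: Duration) -> impl Stream<Item = Chunk>
where
    S: Stream<Item = Bytes> + Unpin,
{
    async_stream::stream! {
        let mut flushed = true; // nothing buffered yet
        loop {
            match tokio::time::timeout(idle, source.next()).await {
                Ok(Some(bytes)) => {
                    flushed = false;
                    yield Chunk::Data(bytes);
                }
                Ok(None) => break,       // source is done
                Err(_) if !flushed => {
                    flushed = true;
                    yield Chunk::Flush;  // idle: flush what we have
                }
                Err(_) => {}             // already flushed, keep waiting
            }
        }
    }
}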

@adriangb

One option would be to let users choose when to flush the compression buffer. Two heuristics that come to mind are number of bytes and number of marker characters (e.g. "flush at a newline every 1MB").
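
A rough sketch of that kind of policy (hypothetical names, not an existing tower-http hook); it could drive the same kind of flush signal as the sketch above:

use bytes::Bytes;

// Hypothetical policy: once at least `threshold` bytes have been fed to the
// encoder since the last flush, flush at the next chunk containing a newline.
struct NewlineEveryThreshold {
    threshold: usize, // e.g. 1 MB
    since_flush: usize,
}

impl NewlineEveryThreshold {
    fn should_flush(&mut self, chunk: &Bytes) -> bool {
        self.since_flush += chunk.len();
        if self.since_flush >= self.threshold && chunk.contains(&b'\n') {
            self.since_flush = 0;
            true
        } else {
            false
        }
    }
}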

@kevincox

That wouldn't really be sufficient for my use cases. In my case I am doing a slow operation and occasionally outputting a status update. That can be a very small number of bytes, and I don't really want to inject a marker just to make flushing work.

@adriangb

Maybe there's a more advanced version, but I think having simple defaults for common cases makes sense, and they could be implementations of a trait or hook.
