fix: govern copy memory + fix passthrough-copy ClientResponse.read crash by ServerSideHannes · Pull Request #97 · ServerSideHannes/s3proxy-python

ServerSideHannes · 2026-06-30T16:19:33Z

What

Two fixes in the server-side copy path, both root causes of the
concurrent-backup failures.

1. Govern copy memory (the OOM fix)

A CopyObject / UploadPartCopy carries no request body, so the request-level
memory limiter reserved ~nothing for it — yet each copy decrypts the source
and re-encrypts it in RAM. Under a Scylla dedup flood, dozens of these ran at
once with nothing throttling them → pod blew past its memory cap → kernel
OOM-kill (exit 137) → backup failures.

Both copy paths (_copy_encrypted and _streaming_copy_part) are now wrapped in
concurrency.reserve_memory(copy_pipeline_peak(size)), so copies are throttled
by the same budget as uploads. When the budget is full they wait instead of
piling up. New crypto.copy_pipeline_peak() reports the real peak:

streamed copies (> 32MB): 4 × MAX_BUFFER_SIZE — pipeline-bounded, independent of object size
small buffered copies: ~3 × the object size (floored at one buffer)

2. Fix the passthrough-copy crash

_iter_copy_source called body.read(8MB), but body is an aiohttp
ClientResponse whose .read() takes no size argument → every such copy
500'd with TypeError: ClientResponse.read() takes 1 positional argument but 2 were given. Changed to body.content.read(8MB) (stream via the StreamReader).
Seen in prod:

File ".../copy.py", line 325, in _iter_copy_source
    chunk = await body.read(crypto.MAX_BUFFER_SIZE)
TypeError: ClientResponse.read() takes 1 positional argument but 2 were given

Proof (local repro, 256MiB cap, 48MiB governor budget)

Same 64-concurrent CopyObject flood over 8×64MB encrypted objects:

	WITHOUT fix	WITH fix
container	`Running=false`	`Running=true`
kernel OOM flag	`OOMKilled=true`	`OOMKilled=false`
exit code	137 (OOM kill)	0
copies succeeded	0 / 64 (conn refused/closed)	64 / 64
peak RSS	hit 256MiB cap → killed	~195MiB, reclaims

Tests

New tests/unit/test_copy_memory_governing.py: pins copy_pipeline_peak sizing
and proves reserve_memory bounds concurrent copies to the budget (≤2 at once
for a 64MB budget / 32MB peak; active never overruns; all released).
tests/conftest.py mock MockS3Response now exposes .content (mirrors real
aiohttp), so the streaming-copy tests exercise the fixed content.read(n) path.
Full unit suite green (452 passed); ruff check + ruff format --check clean.

Server-side copies (CopyObject / UploadPartCopy) carry no request body, so the request-level memory limiter reserved ~nothing for them -- yet each copy decrypts the source and re-encrypts it in RAM. Under a Scylla dedup flood, dozens ran concurrently with nothing throttling them and the pod was OOMKilled (exit 137) -> backup failures. Gate both copy paths through concurrency.reserve_memory(copy_pipeline_peak) so copies are bounded by the same budget as uploads. copy_pipeline_peak returns the real peak: streamed copies ~4x MAX_BUFFER_SIZE (size-independent), small buffered copies ~3x the object. Also fix _iter_copy_source: it called body.read(8MB) but body is an aiohttp ClientResponse whose read() takes no size arg, so every passthrough copy 500'd with TypeError. Stream via body.content.read(n). Verified locally: 64-concurrent copy flood at a 256MiB cap OOM-killed the pod (exit 137, 0/64 ok) before; after, peak ~195MiB and 64/64 copies succeed.

ServerSideHannes merged commit fab3774 into main Jun 30, 2026
4 checks passed

ServerSideHannes deleted the fix/govern-copy-memory branch June 30, 2026 16:24

This was referenced Jun 30, 2026

fix: hold GET memory reservation for the whole streaming-response lifetime #98

Merged

fix(chart): cap per-pod backend concurrency at the frontproxy (maxconn) #99

Merged

fix: cap UploadPartCopy pump chunk at MAX_BUFFER_SIZE (concurrent-backup OOM) #101

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: govern copy memory + fix passthrough-copy ClientResponse.read crash#97

fix: govern copy memory + fix passthrough-copy ClientResponse.read crash#97
ServerSideHannes merged 1 commit into
mainfrom
fix/govern-copy-memory

ServerSideHannes commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ServerSideHannes commented Jun 30, 2026

What

1. Govern copy memory (the OOM fix)

2. Fix the passthrough-copy crash

Proof (local repro, 256MiB cap, 48MiB governor budget)

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant