Releases: ServerSideHannes/s3proxy-python
2026.7.3
Features
- Server-side passthrough for same-credential encrypted COPY (#107). Scylla Manager's backup copies each SSTable to its snapshot-tagged key (CopyObject, MetadataDirective=COPY, same bucket). Every encrypted copy previously went through _copy_encrypted (GET source from upstream → decrypt → re-encrypt → PUT dest), turning a metadata-only "rename" into a full download + re-upload. Measured live, this drove ~750 MB/s up and ~730 MB/s down to Hetzner (bytes_encrypted ≈ bytes_decrypted), saturating the CPU/AES-bound fleet and triggering a PutObject 503 SlowDown storm that stalled the daily backup at ~16%.
The encryption is not key-bound (GCM AAD is None, the DEK is random and stored in the isec metadata / multipart sidecar, nonces are embedded in the ciphertext), so a byte-identical copy that keeps the same wrapped-DEK metadata decrypts under any key name. handle_copy_object now issues a native server-side CopyObject when the directive is COPY, the source was wrapped by the calling credential (re-key would be a no-op), and the ciphertext is ≤ 5 GiB; multipart objects also get their .meta sidecar server-side-copied. Cross-credential copies, REPLACE, and >5 GiB objects still take the decrypt/re-encrypt path.
Verified against real Hetzner: CopyObject COPY on a 1.68 GB encrypted SSTable preserves isec/isec-kid metadata byte-for-byte.
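The eligibility rule described in this entry can be sketched as a small predicate. This is a minimal illustration, not the project's code: the function name can_passthrough_copy, its parameters, and the constant name are assumptions. Only the three conditions (directive COPY, source wrapped by the calling credential, ciphertext ≤ 5 GiB) come from the note itself.

```python
MAX_SERVER_SIDE_COPY = 5 * 1024**3  # 5 GiB ceiling stated in the release note


def can_passthrough_copy(directive: str, source_kid: str, caller_kid: str,
                         ciphertext_size: int) -> bool:
    """Return True when a native server-side CopyObject may be used.

    Hypothetical helper: source_kid / caller_kid stand in for whatever
    identifies the credential that wrapped the DEK.
    """
    if directive != "COPY":
        return False  # REPLACE still takes the decrypt/re-encrypt path
    if source_kid != caller_kid:
        return False  # cross-credential copy needs a real re-key
    return ciphertext_size <= MAX_SERVER_SIDE_COPY
```

Anything that fails the predicate would fall back to the existing decrypt/re-encrypt path, so the fast path stays an optimization rather than a behavior change.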
Chart and image published as 2026.7.3.
2026.7.2
Fixes
- frontproxy: add option httpclose to stop HAProxy's built-in 400 Bad request on aborted multipart uploads (#106). When a backend pod is killed mid-body (KEDA churn + option redispatch), the response is sent before the request body is drained; on a kept-alive client connection the leftover body bytes were parsed as the next request → 400. Closing per response prevents reuse of a poisoned connection.
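Why leftover body bytes turn into a 400 can be seen in a toy model of a kept-alive connection. This is a simplified sketch, not HAProxy's parser: looks_like_request_line is a hypothetical helper, and the request bytes are made up for illustration.

```python
def looks_like_request_line(buf: bytes) -> bool:
    """Rough check for 'METHOD target HTTP/x.y' at the start of buf."""
    line = buf.split(b"\r\n", 1)[0]
    parts = line.split(b" ")
    return (len(parts) == 3
            and parts[0].isalpha()
            and parts[2].startswith(b"HTTP/"))


# One connection carrying an upload followed by a second request.
head = b"PUT /bucket/key?partNumber=1 HTTP/1.1\r\nContent-Length: 8\r\n\r\n"
body = b"\x00\x17\xfe\x9a\x01\x02\x03\x04"
next_request = b"GET /bucket/other HTTP/1.1\r\n\r\n"
stream = head + body + next_request

# Normal case: the body is fully drained before the next request is parsed.
drained = stream[len(head) + len(body):]

# Aborted case: the response went out after only 3 body bytes were consumed,
# so the next parse starts in the middle of the old body.
poisoned = stream[len(head) + 3:]
```

In this model looks_like_request_line(drained) holds while looks_like_request_line(poisoned) does not, which is the situation that surfaced as a 400. Closing the connection after each response means the poisoned tail is never parsed at all.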
Chart and image published as 2026.7.2.
2026.7.1
Fixes
- COPY of large multipart-encrypted objects failed with InvalidTag (#104). _iter_multipart_plaintext decrypted each whole client part as a single AES-GCM seal, but a client part expands into multiple internal parts, each a sequence of independent frames. Any source whose parts held more than one frame (internal parts >8MB, e.g. ScyllaDB backups) failed to copy. The reader now walks internal parts → frames and decrypts one frame at a time, matching the GET path. Also bounds copy source-read peak memory to O(frame).
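The corrected read path can be pictured as two nested loops behind a generator, so that only one decrypted frame is produced at a time. The sketch below rests on assumptions: the function names, the fixed-size framing, and the injected decrypt_frame callable are hypothetical, and the real frame layout is not described in these notes.

```python
from typing import Callable, Iterable, Iterator


def iter_frames(internal_part: bytes, frame_size: int) -> Iterator[bytes]:
    """Split one internal part into its independently sealed frames."""
    for offset in range(0, len(internal_part), frame_size):
        yield internal_part[offset:offset + frame_size]


def iter_plaintext(internal_parts: Iterable[bytes],
                   decrypt_frame: Callable[[bytes], bytes],
                   frame_size: int) -> Iterator[bytes]:
    """Walk internal parts -> frames and decrypt one frame at a time."""
    for part in internal_parts:
        for frame in iter_frames(part, frame_size):
            # Each frame is authenticated on its own; treating a whole
            # part as a single seal is what failed with InvalidTag.
            yield decrypt_frame(frame)
```

Because the generator yields after every frame, a consumer that writes each chunk onward before asking for the next never holds more than one plaintext frame, which is the O(frame) bound the fix describes.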
2026.6.16
feat: memory debug mode (RSS vs tracked heap + top allocations) (#100)
Diagnostic to pin the s3proxy OOM root cause. Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when unset. Every interval logs real RSS vs Python-tracked heap vs untracked gap vs governor active bytes, then the top live allocations by call site.
One dump settles which world the OOM is in:
- large untracked gap -> C-level transport buffers (uvicorn/httptools), fix at HTTP/LB layer
- small gap -> Python, top list names the exact line
Usage: extraConfig { S3PROXY_MEMORY_DEBUG: "1" } + raise pod memory to ~1-2Gi so it survives to dump; read MEMORY_DEBUG / MEMORY_DEBUG_TOP under real backup load; revert.
No behavior change unless enabled.
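The RSS-versus-tracked-heap comparison can be approximated with the standard library alone. This is a rough sketch of the idea, not the shipped diagnostic: read_rss_bytes and memory_report are hypothetical names, the /proc lookup is Linux-only, and the governor's active bytes are left out.

```python
import tracemalloc
from typing import Optional


def read_rss_bytes() -> Optional[int]:
    """Resident set size from /proc (Linux only); None where unavailable."""
    try:
        with open("/proc/self/status") as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) * 1024  # value is in kB
    except OSError:
        pass
    return None


def memory_report(top_n: int = 5) -> dict:
    """Compare real RSS with the Python-tracked heap and list top call sites.

    tracemalloc.start() must have been called before this function runs.
    """
    tracked, _peak = tracemalloc.get_traced_memory()
    rss = read_rss_bytes()
    stats = tracemalloc.take_snapshot().statistics("lineno")[:top_n]
    return {
        "rss": rss,
        "tracked": tracked,
        "untracked_gap": None if rss is None else rss - tracked,
        "top": [(str(s.traceback[0]), s.size) for s in stats],
    }
```

A large untracked_gap points away from Python allocations, while a small one makes the top list the place to look, mirroring the two cases listed above.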
2026.6.15
fix(chart): cap per-pod backend concurrency at the frontproxy (maxconn) (#99)
Stops the upload-side concurrent-backup OOM (dominant cause on 2026.6.14). uvicorn buffers each in-flight request body off the socket before the app's memory limiter runs, so a backup flood piles up bodies in the HTTP server's C-level buffers (governor reads ~64MB while RSS hits 512Mi+ -> OOMKilled). That memory is ungovernable from the app layer.
Fix: haproxy now caps in-flight requests PER pod (maxconn, default 40) and queues the excess (timeout queue) instead of overrunning a pod. Chart values: frontproxy.maxConnPerPod, frontproxy.timeouts.queue.
Verified locally at prod config (512Mi/64MB, 2026.6.14 app): direct 128x16MB PUT flood OOM-killed the pod (exit 137); via haproxy maxconn 40 -> 256/256 ok, peak 335MiB, no OOM. haproxy queues rather than rejects, so clients see success.
Completes the OOM fix set: 2026.6.13 (#97 copy), 2026.6.14 (#98 streaming-GET), 2026.6.15 (#99 upload concurrency cap).
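The queue-instead-of-reject behavior of a per-pod cap can be modeled in a few lines of asyncio. This is an analogy, not the HAProxy mechanism itself: run_flood is a hypothetical helper, the semaphore stands in for maxconn, and the queue timeout is not modeled.

```python
import asyncio


async def run_flood(total: int, max_in_flight: int) -> tuple[int, int]:
    """Send `total` fake requests through a gate of size `max_in_flight`.

    Returns (completed, peak_in_flight).
    """
    gate = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0
    completed = 0

    async def handle_request() -> None:
        nonlocal in_flight, peak, completed
        async with gate:  # excess callers wait here instead of failing
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for receiving the body
            in_flight -= 1
            completed += 1

    await asyncio.gather(*(handle_request() for _ in range(total)))
    return completed, peak
```

Calling asyncio.run(run_flood(256, 40)) completes every request while never admitting more than 40 at once, which is why clients see success rather than errors when the cap is hit.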
2026.6.14
fix: hold GET memory reservation for the whole streaming-response lifetime (#98)
The dominant concurrent-backup OOM. Streaming GET responses released their memory reservation before the body was sent, so concurrent downloads ran ungoverned — each holding an 8MB decrypted frame in the send buffer (N×8MB → OOMKill, exit 137) while the limiter read ~budget.
Fix: hold the reservation for the whole stream lifetime (admission control); drop the now-redundant per-frame acquires.
Verified at prod config (512Mi/64MB): the 90-concurrent multipart GET flood that OOM-killed the pod (0/180) now completes 180/180 at ~325MiB; realistic upload+GET mix 106/106 at ~305MiB. Profiler: tracked memory 812MB→112MB, live frames 90→11.
Stacks on 2026.6.13 (#97, copy crash + copy-OOM).
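The difference between releasing early and holding the reservation for the stream's lifetime comes down to where the release sits relative to the body iteration. The sketch below is illustrative only: MemoryLimiter and stream_body are hypothetical names, and the limiter is a toy byte counter rather than the project's governor.

```python
import asyncio
from typing import AsyncIterator, Iterable


class MemoryLimiter:
    """Toy byte budget; create it inside a running event loop."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.active = 0
        self._cond = asyncio.Condition()

    async def reserve(self, n: int) -> None:
        async with self._cond:
            # Admission control: wait until the reservation fits the budget.
            await self._cond.wait_for(lambda: self.active + n <= self.budget)
            self.active += n

    async def release(self, n: int) -> None:
        async with self._cond:
            self.active -= n
            self._cond.notify_all()


async def stream_body(limiter: MemoryLimiter, frames: Iterable[bytes],
                      frame_size: int) -> AsyncIterator[bytes]:
    """Hold one frame-sized reservation until the last frame is yielded."""
    await limiter.reserve(frame_size)
    try:
        for frame in frames:
            yield frame
    finally:
        # Released only after the body is exhausted, not before sending it.
        await limiter.release(frame_size)
```

With the release in the finally block, the limiter's active count reflects every stream that still holds a frame. In the flood described above, 90 concurrent streams at 8MB each would be about 720MB of frames, well past a 512Mi pod, which is what admission control prevents.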
2026.6.13
fix: govern copy memory + fix passthrough-copy ClientResponse.read crash (#97)
- Gate server-side copies (CopyObject / UploadPartCopy) through the memory limiter so a Scylla dedup flood can't OOM the pod (was: ungoverned concurrent decrypt+re-encrypt → exit 137).
- Fix _iter_copy_source: body.content.read(n) instead of body.read(n) (aiohttp ClientResponse.read() takes no size arg → every passthrough copy 500'd with TypeError).
Verified locally: 64-concurrent copy flood at a 256MiB cap OOM-killed the pod before (0/64 ok), now peaks ~195MiB with 64/64 ok.
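The second fix hinges on which aiohttp object accepts a size. A minimal sketch of a chunked source reader follows; the function body and the default chunk size are assumptions rather than the project's code, and the response is typed loosely so the example does not require aiohttp to be installed.

```python
from typing import Any, AsyncIterator


async def iter_copy_source(resp: Any,
                           chunk_size: int = 1 << 20) -> AsyncIterator[bytes]:
    """Yield the upstream body in bounded chunks.

    `resp` is expected to behave like aiohttp.ClientResponse.
    """
    while True:
        # resp.content is the StreamReader, whose read(n) accepts a size.
        # resp.read() takes no size argument and returns the whole body.
        chunk = await resp.content.read(chunk_size)
        if not chunk:
            break  # empty bytes signals end of stream
        yield chunk
```

Reading through resp.content also keeps each read bounded by chunk_size, which fits the goal of governing copy memory.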
2026.6.12
Make the s3proxy container's startup/liveness/readiness probes configurable via .Values (defaults unchanged). Lets a deployment raise the liveness timeout so a busy single-event-loop worker is not restarted under upload load (the kill -> retry -> crashloop cascade). App code identical to 2026.6.11.
2026.6.11
2026.6.10
Fixes
- Route V1 ListObjects to the list handler instead of raw-forwarding (#93). Completes the V1 fix from 2026.6.9.
_dispatch_bucket was raw-forwarding any bucket GET without list-type/delete/uploads/location straight to the backend, so a V1 ListObjects (?prefix&delimiter&max-keys&encoding-type, no list-type=2) was sent verbatim to Hetzner → HTTP 400, never reaching the V1→V2 translation added in #92. A bucket GET whose query is only listing params now falls through to the list handler; genuine sub-resource GETs (acl, versioning, …) still forward.
This is the fix that actually unblocks Scylla backups and Postgres retention against Hetzner.
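The routing rule can be sketched as a classification over the query-string keys. This is a hypothetical illustration, not the actual _dispatch_bucket: the function name, the return labels, the inclusion of marker among the V1 listing params, and the treatment of an empty query as a listing are all assumptions.

```python
# V1 listing params named in the note, plus "marker" (assumed here).
LISTING_PARAMS = {"prefix", "delimiter", "max-keys", "encoding-type", "marker"}
# Keys the note says were already recognized before this fix.
DEDICATED = {"delete", "uploads", "location"}


def route_bucket_get(query_keys: set[str]) -> str:
    """Classify a bucket-level GET by its query keys (hypothetical router)."""
    if "list-type" in query_keys:
        return "list"  # V2 ListObjects
    if query_keys & DEDICATED:
        return "dedicated"  # handled by their own code paths
    if query_keys <= LISTING_PARAMS:
        return "list"  # V1 ListObjects: only listing params present
    return "forward"  # sub-resources such as acl or versioning
```

The subset test is the key step: a request is treated as a V1 listing only when every key is a listing param, so a sub-resource GET that happens to carry a prefix would still be forwarded.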
Image: ghcr.io/serversidehannes/s3proxy-python:2026.6.10
Chain: 2026.6.8 (#91 V2 token, #88 parallel HEAD) → 2026.6.9 (#92 V1→V2 in handler) → 2026.6.10 (#93 route V1 to handler).