Releases · ServerSideHannes/s3proxy-python

2026.7.3 Latest

ServerSideHannes released this 03 Jul 08:28

2026.7.3

2b2665c

Features

Server-side passthrough for same-credential encrypted COPY (#107). Scylla Manager's backup copies each SSTable to its snapshot-tagged key (CopyObject, MetadataDirective=COPY, same bucket). Every encrypted copy previously went through _copy_encrypted (GET source from upstream → decrypt → re-encrypt → PUT dest), turning a metadata-only "rename" into a full download + re-upload. Measured live, this drove ~750 MB/s up and ~730 MB/s down to Hetzner (bytes_encrypted ≈ bytes_decrypted), saturating the CPU/AES-bound fleet and triggering a PutObject 503 SlowDown storm that stalled the daily backup at ~16%.

The encryption is not key-bound (GCM AAD is None, the DEK is random and stored in the isec metadata / multipart sidecar, nonces are embedded in the ciphertext), so a byte-identical copy that keeps the same wrapped-DEK metadata decrypts under any key name. handle_copy_object now issues a native server-side CopyObject when the directive is COPY, the source was wrapped by the calling credential (re-key would be a no-op), and the ciphertext is ≤ 5 GiB; multipart objects also get their .meta sidecar server-side-copied. Cross-credential copies, REPLACE, and >5 GiB objects still take the decrypt/re-encrypt path.
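
The three passthrough conditions can be sketched as a single predicate. This is a minimal illustration, not the project's handle_copy_object: the CopyRequest type, its field names, and the idea of comparing key ids are assumptions made for the example.

```python
from dataclasses import dataclass

FIVE_GIB = 5 * 1024**3  # size ceiling for the passthrough path, as stated in the note


@dataclass(frozen=True)
class CopyRequest:
    """Hypothetical summary of a CopyObject call; field names are illustrative."""
    metadata_directive: str  # "COPY" or "REPLACE"
    caller_key_id: str       # key id of the credential making the request
    source_key_id: str       # key id that wrapped the source object's DEK
    ciphertext_size: int     # stored (encrypted) size of the source in bytes


def can_passthrough(req: CopyRequest) -> bool:
    """True when all three conditions for a native server-side copy hold."""
    return (
        req.metadata_directive == "COPY"
        and req.source_key_id == req.caller_key_id  # re-key would be a no-op
        and req.ciphertext_size <= FIVE_GIB
    )
```

Anything that fails the predicate (cross-credential, REPLACE, or larger than 5 GiB) would fall back to the decrypt/re-encrypt path.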

Verified against real Hetzner: CopyObject COPY on a 1.68 GB encrypted SSTable preserves isec/isec-kid metadata byte-for-byte.

Chart and image published as 2026.7.3.

2026.7.2

ServerSideHannes released this 02 Jul 10:08

2026.7.2

7010a1f

Fixes

frontproxy: add option httpclose to stop HAProxy's built-in 400 Bad request on aborted multipart uploads (#106). When a backend pod is killed mid-body (KEDA churn + option redispatch), the response is sent before the request body is drained; on a kept-alive client connection the leftover body bytes were parsed as the next request → 400. Closing per response prevents reuse of a poisoned connection.
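
The failure mode can be reproduced with a toy model of one client connection. This is only a sketch of the mechanism described above, not HAProxy's parser: the request-line regex, the payload, and the number of consumed body bytes are all made up for illustration.

```python
import re

REQUEST_LINE = re.compile(rb"^[A-Z]+ \S+ HTTP/1\.[01]$")


def next_status(buffer: bytes) -> int:
    """Parse whatever sits at the front of the connection buffer as a request line."""
    first_line = buffer.split(b"\r\n", 1)[0]
    return 200 if REQUEST_LINE.match(first_line) else 400


def simulate(keep_alive: bool) -> int:
    """Return the status the follow-up request would get after an aborted upload."""
    body = b"\x00binary-part-data" * 4
    upload = b"PUT /bucket/key?partNumber=1 HTTP/1.1\r\nContent-Length: %d\r\n\r\n" % len(body) + body
    follow_up = b"GET /bucket/key HTTP/1.1\r\n\r\n"

    # The backend dies mid-body: the response goes out after only the headers
    # and the first 10 body bytes were consumed, leaving the rest unread.
    consumed = upload.index(b"\r\n\r\n") + 4 + 10
    leftover = upload[consumed:]

    if keep_alive:
        # Connection is reused, so the unread body bytes precede the next request.
        return next_status(leftover + follow_up)
    # Connection closed per response: the next request starts on a clean connection.
    return next_status(follow_up)
```

In this model simulate(True) yields 400 and simulate(False) yields 200, which is the trade that closing per response makes: one extra connection setup in exchange for never reusing a poisoned connection.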

Chart and image published as 2026.7.2.

2026.7.1

ServerSideHannes released this 01 Jul 17:19

2026.7.1

bc0bd36

Fixes

COPY of large multipart-encrypted objects failed with InvalidTag (#104). _iter_multipart_plaintext decrypted each whole client part as a single AES-GCM seal, but a client part expands into multiple internal parts, each a sequence of independent frames. Any source whose parts held more than one frame (internal parts >8MB, e.g. ScyllaDB backups) failed to copy. The reader now walks internal parts → frames and decrypts one frame at a time, matching the GET path. Also bounds copy source-read peak memory to O(frame).
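
A minimal sketch of the "internal parts → frames" walk, assuming each internal part is a readable binary stream and every frame occupies a fixed number of stored bytes. The function name, the frame_size parameter, and the decrypt_frame callable are hypothetical stand-ins; the real frame layout is not described here.

```python
from typing import BinaryIO, Callable, Iterable, Iterator


def iter_plaintext(
    internal_parts: Iterable[BinaryIO],
    frame_size: int,
    decrypt_frame: Callable[[bytes], bytes],
) -> Iterator[bytes]:
    """Yield plaintext one frame at a time, walking internal parts then frames."""
    for part in internal_parts:
        while True:
            # Only one frame is held at a time, so peak memory stays O(frame).
            # Assumes read() returns a full frame unless the part is ending.
            frame = part.read(frame_size)
            if not frame:
                break
            # Each frame is an independent seal and is decrypted on its own.
            yield decrypt_frame(frame)
```

Decrypting a whole part in one call would treat several independent seals as one, which is the kind of mismatch that surfaces as an authentication-tag failure.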

2026.6.16

ServerSideHannes released this 01 Jul 05:54

2026.6.16

953bcac

feat: memory debug mode (RSS vs tracked heap + top allocations) (#100)

Diagnostic to pin the s3proxy OOM root cause. Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when unset. Every interval logs real RSS vs Python-tracked heap vs untracked gap vs governor active bytes, then the top live allocations by call site.

One dump settles which world the OOM is in:

large untracked gap -> C-level transport buffers (uvicorn/httptools), fix at HTTP/LB layer
small gap -> Python, top list names the exact line
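
The RSS-versus-tracked-heap comparison can be approximated with the standard library. This is a sketch under assumptions, not the project's implementation: rss_bytes and memory_report are hypothetical names, RSS is read from /proc (Linux only), and the governor's active bytes are left out.

```python
import os
import tracemalloc


def rss_bytes() -> int:
    """Current resident set size read from /proc (Linux only); 0 if unavailable."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE")
    except (OSError, ValueError, IndexError):
        return 0


def memory_report(top_n: int = 5) -> dict:
    """Compare process RSS with the Python-tracked heap and list top call sites."""
    if not tracemalloc.is_tracing():
        # Only allocations made after start() are tracked.
        tracemalloc.start()
    tracked, _peak = tracemalloc.get_traced_memory()
    rss = rss_bytes()
    top = tracemalloc.take_snapshot().statistics("lineno")[:top_n]
    return {
        "rss": rss,
        "tracked": tracked,
        "untracked_gap": max(rss - tracked, 0),
        "top": [(str(stat.traceback[0]), stat.size) for stat in top],
    }
```

A large untracked_gap points away from Python allocations, while a small one means the top list is where to look.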

Usage: set extraConfig { S3PROXY_MEMORY_DEBUG: "1" } and raise pod memory to ~1-2Gi so the pod survives long enough to dump; read the MEMORY_DEBUG / MEMORY_DEBUG_TOP log lines under real backup load; then revert.

No behavior change unless enabled.

2026.6.15

ServerSideHannes released this 30 Jun 18:14

2026.6.15

98235b5

fix(chart): cap per-pod backend concurrency at the frontproxy (maxconn) (#99)

Stops the upload-side concurrent-backup OOM (dominant cause on 2026.6.14). uvicorn buffers each in-flight request body off the socket before the app's memory limiter runs, so a backup flood piles up bodies in the HTTP server's C-level buffers (governor reads ~64MB while RSS hits 512Mi+ -> OOMKilled). That memory is ungovernable from the app layer.

Fix: haproxy now caps in-flight requests PER pod (maxconn, default 40) and queues the excess (timeout queue) instead of overrunning a pod. Chart values: frontproxy.maxConnPerPod, frontproxy.timeouts.queue.

Verified locally at prod config (512Mi/64MB, 2026.6.14 app): direct 128x16MB PUT flood OOM-killed the pod (exit 137); via haproxy maxconn 40 -> 256/256 ok, peak 335MiB, no OOM. haproxy queues rather than rejects, so clients see success.
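
The cap-and-queue behavior is analogous to a semaphore placed in front of the pod. The sketch below models it with asyncio; it is not HAProxy and has no equivalent of timeout queue, and run_with_cap with its sleep-based fake request is invented for the illustration.

```python
import asyncio


async def run_with_cap(n_requests: int, max_conn: int) -> tuple[int, int]:
    """Run n_requests fake uploads with at most max_conn in flight; return (ok, peak)."""
    slots = asyncio.Semaphore(max_conn)
    in_flight = 0
    peak = 0
    ok = 0

    async def handle() -> None:
        nonlocal in_flight, peak, ok
        async with slots:  # excess requests wait here instead of being rejected
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.001)  # stand-in for buffering and handling a body
            in_flight -= 1
            ok += 1

    await asyncio.gather(*(handle() for _ in range(n_requests)))
    return ok, peak
```

Calling asyncio.run(run_with_cap(256, 40)) completes all 256 fake requests while never exceeding 40 in flight, which mirrors why clients see success rather than errors.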

Completes the OOM fix set: 2026.6.13 (#97 copy), 2026.6.14 (#98 streaming-GET), 2026.6.15 (#99 upload concurrency cap).

2026.6.14

ServerSideHannes released this 30 Jun 17:33

2026.6.14

33e4d6c

fix: hold GET memory reservation for the whole streaming-response lifetime (#98)

The dominant concurrent-backup OOM. Streaming GET responses released their memory reservation before the body was sent, so concurrent downloads ran ungoverned: each held an 8MB decrypted frame in the send buffer (N×8MB → OOMKill, exit 137) while the limiter still read roughly its budget.

Fix: hold the reservation for the whole stream lifetime (admission control); drop the now-redundant per-frame acquires.
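
Holding the reservation for the stream's lifetime can be sketched with an async generator whose release sits in a finally block. Budget is a hypothetical byte-budget limiter written for this example, not the project's governor, and stream_body is an illustrative name.

```python
import asyncio
from typing import AsyncIterator, Iterable


class Budget:
    """Hypothetical byte-budget limiter used only for this illustration."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.active = 0
        self._cond = asyncio.Condition()

    async def acquire(self, n: int) -> None:
        async with self._cond:
            await self._cond.wait_for(lambda: self.active + n <= self.limit)
            self.active += n

    async def release(self, n: int) -> None:
        async with self._cond:
            self.active -= n
            self._cond.notify_all()


async def stream_body(
    budget: Budget, frames: Iterable[bytes], frame_size: int
) -> AsyncIterator[bytes]:
    """Reserve once, yield every frame, release only when the stream ends."""
    await budget.acquire(frame_size)
    try:
        for frame in frames:
            yield frame
    finally:
        # Runs on normal completion and when the consumer closes the stream early.
        await budget.release(frame_size)
```

Because admission happens before the first byte and release happens after the last, the number of concurrent streams is bounded by the budget rather than by whatever the send buffers happen to hold.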

Verified at prod config (512Mi/64MB): the 90-concurrent multipart GET flood that OOM-killed the pod (0/180) now completes 180/180 at ~325MiB; realistic upload+GET mix 106/106 at ~305MiB. Profiler: tracked memory 812MB→112MB, live frames 90→11.

Stacks on 2026.6.13 (#97, copy crash + copy-OOM).

2026.6.13

ServerSideHannes released this 30 Jun 16:24

2026.6.13

fab3774

fix: govern copy memory + fix passthrough-copy ClientResponse.read crash (#97)

Gate server-side copies (CopyObject / UploadPartCopy) through the memory limiter so a Scylla dedup flood can't OOM the pod (was: ungoverned concurrent decrypt+re-encrypt → exit 137).
Fix _iter_copy_source: body.content.read(n) instead of body.read(n) (aiohttp ClientResponse.read() takes no size arg → every passthrough copy 500'd with TypeError).
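
The difference between the two calls can be shown with a small chunked reader. In aiohttp, ClientResponse.read() takes no size and returns the whole body, while response.content is a StreamReader whose read(n) accepts one. The sketch below is not the project's _iter_copy_source; iter_body and its chunk_size default are illustrative, and it accepts any object exposing an aiohttp-style content.read(n).

```python
import asyncio
from typing import Any, AsyncIterator


async def iter_body(response: Any, chunk_size: int = 64 * 1024) -> AsyncIterator[bytes]:
    """Yield the body in bounded chunks via response.content.read(n)."""
    while True:
        # Sized reads go through the stream, never through response.read().
        chunk = await response.content.read(chunk_size)
        if not chunk:
            break
        yield chunk
```

Reading in bounded chunks also keeps the copy path's memory proportional to chunk_size instead of the object size.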

Verified locally: 64-concurrent copy flood at a 256MiB cap OOM-killed the pod before (0/64 ok), now peaks ~195MiB with 64/64 ok.

2026.6.12

ServerSideHannes released this 30 Jun 12:13

2026.6.12

381ac2c

Make the s3proxy container's startup/liveness/readiness probes configurable via .Values (defaults unchanged). Lets a deployment raise the liveness timeout so a busy single-event-loop worker is not restarted under upload load (the kill -> retry -> crashloop cascade). App code identical to 2026.6.11.

2026.6.11

ServerSideHannes released this 30 Jun 11:02

2026.6.11

b2343b0

Fix: list responses emit LastModified as RFC3339 Z (millisecond) instead of +00:00. rclone 1.51.0 (scylla-manager-agent) rejected +00:00 with 'cannot parse "+00:00" as "Z"', failing every Scylla backup list. Completes the V1-list fix chain (#91 to #94).
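
For illustration, the two formats differ only in how the UTC offset is written. A minimal sketch, assuming a timezone-aware datetime as input; to_rfc3339_z is a hypothetical helper name, not the project's function.

```python
from datetime import datetime, timezone


def to_rfc3339_z(dt: datetime) -> str:
    """Format an aware datetime as UTC with millisecond precision and a Z suffix."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{utc.microsecond // 1000:03d}Z"
```

Python's default isoformat() on a UTC-aware datetime ends in +00:00, which is the form the older rclone parser rejected.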

2026.6.10

ServerSideHannes released this 30 Jun 10:14

2026.6.10

0cddafc

Fixes

Route V1 ListObjects to the list handler instead of raw-forwarding (#93). Completes the V1 fix from 2026.6.9. _dispatch_bucket was raw-forwarding any bucket GET without list-type/delete/uploads/location straight to the backend, so a V1 ListObjects (?prefix&delimiter&max-keys&encoding-type, no list-type=2) was sent verbatim to Hetzner → HTTP 400, never reaching the V1→V2 translation added in #92. A bucket GET whose query is only listing params now falls through to the list handler; genuine sub-resource GETs (acl, versioning, …) still forward.
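
The routing rule amounts to checking whether every query parameter is a listing parameter. The sketch below is an assumption-laden illustration, not the project's _dispatch_bucket: is_list_request is a hypothetical name and the LISTING_PARAMS set is a guess at the relevant S3 listing parameters.

```python
from urllib.parse import parse_qs

# Assumed parameter set for illustration; the project's real list may differ.
LISTING_PARAMS = {
    "prefix", "delimiter", "max-keys", "encoding-type", "marker",
    "list-type", "continuation-token", "start-after", "fetch-owner",
}


def is_list_request(query_string: str) -> bool:
    """True when a bucket GET carries only listing parameters (V1 or V2)."""
    # keep_blank_values keeps bare flags such as "acl" or "uploads" visible.
    params = parse_qs(query_string, keep_blank_values=True)
    return all(name in LISTING_PARAMS for name in params)
```

A bare bucket GET with an empty query also counts as a V1 list here, while any sub-resource flag such as acl or versioning makes the request fall outside the list path.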

This is the fix that actually unblocks Scylla backups and Postgres retention against Hetzner.

Image: ghcr.io/serversidehannes/s3proxy-python:2026.6.10

Chain: 2026.6.8 (#91 V2 token, #88 parallel HEAD) → 2026.6.9 (#92 V1→V2 in handler) → 2026.6.10 (#93 route V1 to handler).
