feat: memory debug mode (RSS vs tracked heap + top allocations) #100

Merged: ServerSideHannes merged 2 commits into main from profile/tracemalloc-heap-dump-2 on Jul 1, 2026

ServerSideHannes (Owner) commented Jul 1, 2026

What

A gated memory debug mode to find what actually holds resident memory under backup load — the s3proxy OOM that concurrency caps haven't resolved and that local repros can't reproduce (a pod dies at effective-6/10 while 8×52MB uploads cost only ~149MiB locally).

The defining trait of this OOM is the gap between real RSS and the Python-tracked heap: prod hit ~957MB RSS with only ~87MB tracked — the memory lives in C-level buffers (uvicorn/httptools sockets, allocator retention), which no top-allocations list can explain. So the mode logs, every interval:

MEMORY_DEBUG rss_mb=.. tracked_mb=.. untracked_mb=.. governed_active_mb=..
MEMORY_DEBUG_TOP rank=1 size_mb=.. count=.. loc=file:line
... (top 20 live Python allocations)
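A minimal sketch of how a dump in this shape could be assembled with the standard library. The function names `parse_rss_mb` and `format_dump` are hypothetical and are not the PR's `_rss_mb` or `_dump_tracemalloc`; reading `VmRSS` from `/proc/self/status` is an assumption about which `/proc` file is used, and `governed_active_mb` is passed in because the governor's API is not shown here.

```python
import tracemalloc

MB = 1024 * 1024  # assumption: "mb" in the log fields means MiB


def parse_rss_mb(status_text: str) -> float:
    """Extract VmRSS (reported in kB) from /proc/<pid>/status text, in MiB."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) / 1024
    return 0.0  # field missing, e.g. on a non-Linux platform


def format_dump(rss_mb: float, governed_active_mb: float, top_n: int = 20) -> list[str]:
    """Build the MEMORY_DEBUG summary line plus the top live allocations."""
    # get_traced_memory() returns (0, 0) when tracemalloc is not tracing.
    tracked_bytes, _peak = tracemalloc.get_traced_memory()
    tracked_mb = tracked_bytes / MB
    lines = [
        f"MEMORY_DEBUG rss_mb={rss_mb:.1f} tracked_mb={tracked_mb:.1f} "
        f"untracked_mb={rss_mb - tracked_mb:.1f} "
        f"governed_active_mb={governed_active_mb:.1f}"
    ]
    if tracemalloc.is_tracing():
        # Group live allocations by file and line, largest first.
        stats = tracemalloc.take_snapshot().statistics("lineno")[:top_n]
        for rank, stat in enumerate(stats, start=1):
            frame = stat.traceback[0]
            lines.append(
                f"MEMORY_DEBUG_TOP rank={rank} size_mb={stat.size / MB:.2f} "
                f"count={stat.count} loc={frame.filename}:{frame.lineno}"
            )
    return lines
```

On Linux the caller would pass `parse_rss_mb(open("/proc/self/status").read())` as `rss_mb` and log each returned line.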

One dump tells us which world we're in:

  • large untracked gap → C-level transport buffers, not a Python call site (→ fix at the HTTP/LB layer).
  • small gap → it's Python, and the top list names the exact line (→ fix that code path).
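The two cases above can be turned into a quick triage of a captured log line. This is a sketch only: `parse_memory_debug` and `classify` are hypothetical helpers, and the 0.5 cut-off for a "large" gap is an assumed threshold, not a value from this PR.

```python
import re

_FIELD = re.compile(r"(\w+)=([-\d.]+)")


def parse_memory_debug(line: str) -> dict[str, float]:
    """Parse the key=value fields of a MEMORY_DEBUG summary line."""
    # The trailing space excludes MEMORY_DEBUG_TOP lines.
    if not line.startswith("MEMORY_DEBUG "):
        raise ValueError("not a MEMORY_DEBUG summary line")
    return {key: float(value) for key, value in _FIELD.findall(line)}


def classify(line: str, untracked_fraction: float = 0.5) -> str:
    """Return 'c-level' when the untracked share of RSS is large, else 'python'."""
    fields = parse_memory_debug(line)
    rss = fields["rss_mb"]
    if rss <= 0:
        return "unknown"
    share = fields["untracked_mb"] / rss
    return "c-level" if share >= untracked_fraction else "python"
```

With the prod figures quoted above (about 957 MB RSS, 87 MB tracked, so roughly 870 MB untracked), the untracked share is around 0.91 and the line classifies as `c-level`.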

Details

  • s3proxy/app.py: _rss_mb() (from /proc), _dump_tracemalloc (RSS+tracked+untracked+governed), _periodic_tracemalloc, _maybe_start_tracemalloc.
  • Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when unset. Dumps every S3PROXY_MEMORY_DEBUG_INTERVAL secs (default 15) and on SIGUSR1.
  • chart: extraConfig passthrough; tests/unit/test_tracemalloc_profiling.py.
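The gating described in the list above could look roughly like the following. It is a sketch under assumptions: the helper names are hypothetical, the rule for which flag values count as "off" is a guess (the PR only says the mode is inactive when the variable is unset), and the real code also schedules a periodic dump, which is omitted here.

```python
import os
import signal
import tracemalloc


def memory_debug_enabled(env=os.environ) -> bool:
    """True when S3PROXY_MEMORY_DEBUG or its alias S3PROXY_TRACEMALLOC is set."""
    raw = env.get("S3PROXY_MEMORY_DEBUG") or env.get("S3PROXY_TRACEMALLOC") or ""
    # Assumption: empty, "0", "false" and "no" all mean disabled.
    return raw.strip().lower() not in ("", "0", "false", "no")


def dump_interval_secs(env=os.environ, default: float = 15.0) -> float:
    """Read S3PROXY_MEMORY_DEBUG_INTERVAL, falling back to the 15 s default."""
    try:
        value = float(env.get("S3PROXY_MEMORY_DEBUG_INTERVAL", ""))
    except ValueError:
        return default
    return value if value > 0 else default


def maybe_start(dump, env=os.environ) -> bool:
    """Start tracemalloc and hook SIGUSR1 only when the flag is set."""
    if not memory_debug_enabled(env):
        return False  # nothing is started, so no tracing overhead
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    if hasattr(signal, "SIGUSR1"):  # not available on Windows
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGUSR1, lambda signum, frame: dump())
    return True
```

Taking `env` as a parameter keeps the gating testable without touching the process environment.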

Usage

  1. Set extraConfig: { S3PROXY_MEMORY_DEBUG: "1" } and temporarily raise pod memory to ~2Gi so the pod survives long enough to dump.
  2. Read MEMORY_DEBUG / MEMORY_DEBUG_TOP from logs under real backup load.
  3. Revert flag + memory, apply the real fix.

No behavior change unless the flag is set. Lint + unit tests pass (Python 3.14).

Diagnostic to find what actually holds the resident memory under backup load
(the OOM), instead of inferring. Enabled only when S3PROXY_TRACEMALLOC is set
(zero overhead otherwise): starts tracemalloc at startup and logs the top live
Python allocations (size + call site) every S3PROXY_TRACEMALLOC_INTERVAL secs
and on SIGUSR1. Chart gains an extraConfig passthrough so one replica can set
the flag via values; revert after capture.

Turn the tracemalloc diagnostic into a proper memory debug mode. The OOM's
defining trait is the gap between real RSS (what the kernel kills on) and the
Python-tracked heap: prod hit ~957MB RSS with only ~87MB tracked, i.e. the
memory lived in C-level buffers (uvicorn/httptools sockets, allocator), which no
top-allocations list can explain. So every interval the mode now logs RSS,
tracked, untracked (rss-tracked) and the governor's active bytes side by side,
then the top live Python allocations. One dump tells us which world we're in:
- large untracked gap  -> C-level (transport buffers), not a Python call site
- small gap            -> Python, and the top list names the exact line

Gated by S3PROXY_MEMORY_DEBUG (alias S3PROXY_TRACEMALLOC), zero overhead when
unset; dumps every S3PROXY_MEMORY_DEBUG_INTERVAL secs and on SIGUSR1.
ServerSideHannes changed the title from "feat: gated tracemalloc heap-dump for one-pod prod profiling" to "feat: memory debug mode (RSS vs tracked heap + top allocations)" on Jul 1, 2026
ServerSideHannes merged commit 953bcac into main on Jul 1, 2026
4 checks passed
ServerSideHannes deleted the profile/tracemalloc-heap-dump-2 branch on July 1, 2026 05:54