Memory leak / unbounded RSS growth in torchvision.io.image.decode_jpeg() on malformed JPEG (CPU) → potential DoS #9383

@MPSFuzz

🐛 Describe the bug

Summary

Repeated calls to torchvision.io.image.decode_jpeg() on a malformed JPEG cause near-linear RSS growth until OOM. Normal JPEGs do not show this behavior. This looks like an error-path memory leak in the CPU JPEG decode path.

I have checked past issues #3613 and #4378; those reports are about GPU/nvJPEG memory leaks. This report is CPU-only: the leak occurs on the error path when decoding malformed JPEGs, and RSS grows linearly even after gc.collect() and malloc_trim().

This issue mirrors a report I previously filed through the repo’s GitHub Security Advisory (private), including PoC and malformed JPEG samples. Since there has been no maintainer response for over 90 days, I’m posting a public issue to ensure the problem is visible and can be tracked.

For responsible disclosure, I will not publish the malformed JPEG samples here. I can provide them privately to maintainers, or they can review the samples already attached in the Security Advisory thread.

Reproduction
Command:
python poc.py case1.jpg --repeat 50 --mode RGB --quiet
Modes tested: UNCHANGED / RGB / GRAY (all leak to varying degrees)

PoC script:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os, sys, argparse, contextlib, gc, ctypes

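# Set before importing torch: single-threaded math libraries and no visible GPUs,
# so the measurement covers the CPU decode path only.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "")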

import torch, torchvision
from torchvision.io import ImageReadMode
from torchvision.io.image import decode_jpeg

@contextlib.contextmanager
def swallow_stderr(enable=True):
    # Temporarily redirect fd 2 to /dev/null so native decoder messages do not flood the output.
    if not enable:
        yield
        return
    sys.stderr.flush()
    fd = sys.stderr.fileno()
    old = os.dup(fd)
    try:
        with open(os.devnull, "wb") as null:
            os.dup2(null.fileno(), fd)
        yield
    finally:
        os.dup2(old, fd)
        os.close(old)

def rss_hwm_kb():
    rss = hwm = None
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                rss = int(line.split()[1])
            elif line.startswith("VmHWM:"):
                hwm = int(line.split()[1])
    return rss, hwm

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("unit", help="the case path")
    ap.add_argument("--repeat", type=int, default=50)
    ap.add_argument("--mode", choices=["UNCHANGED","RGB","GRAY"], default="RGB")
    ap.add_argument("--quiet", action="store_true")
    args = ap.parse_args()

    print("torch:", torch.__version__)
    print("torchvision:", torchvision.__version__)
    print("cuda_available:", torch.cuda.is_available())

    with open(args.unit, "rb") as f:
        data = f.read()

    mode = {
        "UNCHANGED": ImageReadMode.UNCHANGED,
        "RGB":       ImageReadMode.RGB,
        "GRAY":      ImageReadMode.GRAY,
    }[args.mode]

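    # Build the input tensor once, outside the loop, so every iteration decodes the same buffer
    u8 = torch.frombuffer(bytearray(data), dtype=torch.uint8).contiguous()

    # reduce noise: run torch single-threaded
    torch.set_num_threads(1)
    # glibc handle, used below for malloc_trim (Linux only)
    libc = ctypes.CDLL("libc.so.6")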
    print(f"[repro] unit={args.unit} bytes={len(data)} repeat={args.repeat} mode={args.mode}")

    for i in range(1, args.repeat + 1):
        try:
            with swallow_stderr(args.quiet):
                _ = decode_jpeg(u8, mode=mode)
        except Exception:
            # A malformed JPEG raises here; this error path is where the suspected leak occurs
            pass

        # Release everything that can be released (Python garbage, free heap pages)
        # so that any remaining RSS growth is not just allocator caching
        gc.collect()
        try:
            libc.malloc_trim(0)
        except Exception:
            pass

        rss, hwm = rss_hwm_kb()
        print(f"[{i}/{args.repeat}] VmRSS={rss/1024:.1f} MB  VmHWM={hwm/1024:.1f} MB", flush=True)

if __name__ == "__main__":
    main()

Observed results

Normal JPEG: RSS stabilizes around ~269 MB after repeated calls.

Malformed JPEG: RSS grows linearly by about 94 MB per call, reaching ~5 GB after 50 iterations (see logs below).

Normal case:
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu
cuda_available: False
...
[45/50] VmRSS=269.0 MB VmHWM=270.9 MB
[46/50] VmRSS=269.0 MB VmHWM=270.9 MB
[47/50] VmRSS=269.0 MB VmHWM=270.9 MB
[48/50] VmRSS=269.0 MB VmHWM=270.9 MB
[49/50] VmRSS=269.0 MB VmHWM=270.9 MB
[50/50] VmRSS=269.0 MB VmHWM=270.9 MB

Malformed case:
torch: 2.9.0+cpu
torchvision: 0.24.0+cpu
cuda_available: False
[1/50] VmRSS=363.8 MB VmHWM=366.2 MB
[2/50] VmRSS=457.4 MB VmHWM=457.4 MB
[3/50] VmRSS=551.1 MB VmHWM=551.1 MB
[4/50] VmRSS=644.7 MB VmHWM=644.7 MB
[5/50] VmRSS=738.3 MB VmHWM=738.3 MB
[6/50] VmRSS=831.9 MB VmHWM=831.9 MB
[7/50] VmRSS=925.6 MB VmHWM=925.6 MB
...
[45/50] VmRSS=4483.3 MB VmHWM=4483.3 MB
[46/50] VmRSS=4576.9 MB VmHWM=4576.9 MB
[47/50] VmRSS=4670.6 MB VmHWM=4670.6 MB
[48/50] VmRSS=4764.2 MB VmHWM=4764.2 MB
[49/50] VmRSS=4857.8 MB VmHWM=4857.8 MB
[50/50] VmRSS=4951.4 MB VmHWM=4951.4 MB

The growth is also visible in htop: memory usage reaches about 5 GB for case 1 and over 100 GB for case 2.
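
To put a number on the growth, the per-call slope can be estimated from the PoC output. The sketch below is not part of the original PoC: the helper name rss_growth_per_call and the regular expression are illustrative assumptions, and the sample string copies only a few lines from the malformed-case log above.

```python
import re

# Matches PoC output lines such as "[7/50] VmRSS=925.6 MB VmHWM=925.6 MB".
LINE_RE = re.compile(r"\[(\d+)/\d+\]\s+VmRSS=([\d.]+) MB")

def rss_growth_per_call(log_text):
    """Least-squares slope of VmRSS (MB) against the iteration number."""
    points = [(int(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(log_text)]
    if len(points) < 2:
        raise ValueError("need at least two log lines")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

# A few lines copied from the malformed-case log in this report.
sample = """
[1/50] VmRSS=363.8 MB VmHWM=366.2 MB
[2/50] VmRSS=457.4 MB VmHWM=457.4 MB
[3/50] VmRSS=551.1 MB VmHWM=551.1 MB
[49/50] VmRSS=4857.8 MB VmHWM=4857.8 MB
[50/50] VmRSS=4951.4 MB VmHWM=4951.4 MB
"""
print(f"{rss_growth_per_call(sample):.1f} MB per call")
```

On this excerpt the slope is roughly 94 MB per call, while a flat log such as the normal case gives a slope of zero.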

Sample files
I can provide the malformed samples to maintainers privately.

Impact
If a service decodes untrusted user-provided JPEGs, an attacker could repeatedly submit crafted malformed images to exhaust memory and trigger DoS.
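
As a rough illustration of the impact, the observed growth rate translates directly into a request budget for an attacker. This is back-of-the-envelope arithmetic, not a measurement: the 16 GiB of free memory describes a hypothetical host, the function name is made up for this example, and the 93.6 MB figure is the step between consecutive lines of the malformed-case log.

```python
import math

def calls_until_exhaustion(free_mb, leak_mb_per_call):
    """Number of leaking decode calls needed to consume free_mb of memory."""
    if leak_mb_per_call <= 0:
        raise ValueError("leak_mb_per_call must be positive")
    return math.ceil(free_mb / leak_mb_per_call)

# Assumption: a hypothetical host with 16 GiB free; 93.6 MB/call is taken from the log.
print(calls_until_exhaustion(16 * 1024, 93.6))
```

Under these assumptions, fewer than two hundred malformed uploads to a long-lived worker process would be enough to exhaust its memory.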

Versions

torch: 2.9.0+cpu

torchvision: 0.24.0+cpu (0.25.0 is also affected)
