UDF Memory Leak #224

@paultiq

What happens?

Successive calls to Python UDFs leak memory. The memory is not released even when every call runs on a fresh connection.

Running the reproducer below eventually exhausts memory, and the process is killed (OOM).

To Reproduce

import duckdb
import gc
import resource

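def get_rss_mb():
    # ru_maxrss is the peak (not current) resident set size; Linux reports it in KiB.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

def udf_process_text(x):
    # Returns a string of roughly 16-18 KiB per input row.
    return f'processed_data_{x}' * 1024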

def mre_udf_leak():
    con = duckdb.connect()
    initial = get_rss_mb()
    print(f"RSS: {initial:.0f} MB")

    i = 0
    while True:  # runs until the process is killed
        i += 1
        # A fresh connection per iteration: the growth persists across connections.
        with duckdb.connect() as con:
            con.create_function("udf_process_text", udf_process_text, ["BIGINT"], "VARCHAR")

            rows = 1000
            con.execute(f"SELECT udf_process_text(range) FROM range({rows})")

            rss = get_rss_mb()

            if i % 10 == 0:
                print(f"{i:>5} {rss:>10.0f} MB")
                gc.collect()  # Only here to show that forcing a collection makes no difference.

if __name__ == "__main__":
    mre_udf_leak()
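
A possible follow-up check, not part of the original report: the sketch below uses the standard-library `tracemalloc` module to measure how much Python-allocated memory grows across repeated calls to a function. The helper name `python_heap_growth_mb` is hypothetical. `tracemalloc` only sees allocations made through Python's allocators, so if this number stays flat while RSS keeps climbing, that would suggest (but not prove) that the growth is in native allocations.

```python
import tracemalloc


def python_heap_growth_mb(fn, iterations: int = 10) -> float:
    """Return the growth in Python-traced memory (MiB) across repeated calls to fn.

    Hypothetical helper for illustration; it only observes memory allocated
    through Python's allocators, not memory allocated by native code directly.
    """
    tracemalloc.start()
    try:
        fn()  # warm-up call so one-time allocations are not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return (after - before) / (1024 * 1024)
```

To apply it to the reproducer, the body of one loop iteration (connect, `create_function`, `execute`) could be wrapped in a zero-argument function and passed as `fn`.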

Output:

RSS: 52 MB
10 341 MB
20 520 MB
...
4140 62503 MB
4150 62503 MB
Killed
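
Note that `get_rss_mb` in the reproducer reads `ru_maxrss`, which is the peak resident set size, so the printed value can never decrease even if memory were released. As a minimal sketch (Linux only, helper names are illustrative and not part of the original report), the current RSS can be read from `/proc/self/statm` and printed next to the peak:

```python
import os
import resource


def peak_rss_mb() -> float:
    """Peak resident set size in MiB (Linux reports ru_maxrss in KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def current_rss_mb() -> float:
    """Current resident set size in MiB, read from /proc/self/statm (Linux only)."""
    with open("/proc/self/statm") as f:
        # The second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


if __name__ == "__main__":
    print(f"current: {current_rss_mb():.0f} MB, peak: {peak_rss_mb():.0f} MB")
```

If the current value tracks the peak throughout the run, the memory is genuinely retained rather than a transient spike.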

OS:

Linux/Ubuntu

DuckDB Package Version:

1.4.3

Python Version:

3.13.9

Full Name:

Paul T

Affiliation:

Iqmo

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
