What happens?
Repeated calls to Python UDFs leak memory, and the leak persists even when each call runs on a fresh connection.
Running the reproducer below grows memory until the process is killed by the OOM killer.
To Reproduce
import duckdb
import gc
import resource


def get_rss_mb():
    # ru_maxrss is the peak RSS, reported in KiB on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def udf_process_text(x):
    return f'processed_data_{x}' * 1024


def mre_udf_leak():
    con = duckdb.connect()
    initial = get_rss_mb()
    print(f"RSS: {initial:.0f} MB")
    i = 0
    while i := i + 1:  # loops until the process is killed
        with duckdb.connect() as con:
            con.create_function("udf_process_text", udf_process_text, ["BIGINT"], "VARCHAR")
            rows = 1000
            con.execute(f"SELECT udf_process_text(range) FROM range({rows})")
        rss = get_rss_mb()
        if i % 10 == 0:
            print(f"{i:>5} {rss:>10.0f} MB")
        gc.collect()  # here just to demonstrate that it doesn't matter


if __name__ == "__main__":
    mre_udf_leak()

Output:
RSS: 52 MB
10 341 MB
20 520 MB
...
4140 62503 MB
4150 62503 MB
Killed
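A note on the measurement: the reproducer reads `ru_maxrss`, which is the peak resident set size, so the reported number can only go up. To tell steady growth apart from a one-time spike, it may help to log the current RSS as well. The sketch below is one way to do that, assuming Linux and `/proc/self/statm`; the helper names `current_rss_mb` and `peak_rss_mb` are made up for illustration and are not part of DuckDB or the original script.

```python
import os
import resource
import sys


def peak_rss_mb() -> float:
    """Peak RSS in MiB (ru_maxrss is KiB on Linux, bytes on macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def current_rss_mb() -> float:
    """Current RSS in MiB on Linux; falls back to the peak elsewhere."""
    if sys.platform.startswith("linux"):
        # Second field of /proc/self/statm is the resident page count.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)
    return peak_rss_mb()


if __name__ == "__main__":
    print(f"current: {current_rss_mb():.0f} MB, peak: {peak_rss_mb():.0f} MB")
```

Swapping `current_rss_mb()` in for `get_rss_mb()` inside the loop should show whether memory keeps climbing after each connection is closed, rather than only the high-water mark.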
OS:
Linux/Ubuntu
DuckDB Package Version:
1.4.3
Python Version:
3.13.9
Full Name:
Paul T
Affiliation:
Iqmo
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Yes, I have
Did you include all relevant configuration to reproduce the issue?
Yes, I have