Description
Describe the bug
I'm running the following query:
select span_name from records order by bit_length(attributes) desc limit 20
It runs out of memory with a 20GB memory limit (RuntimeConfig::new().with_memory_limit(20 * 1024 * 1024 * 1024, 0.8)), but passes with 30GB allowed.
The error message is:
Failed to allocate additional 25887088 bytes for ExternalSorterMerge[1] with 585120448 bytes already allocated - maximum available is 23605759
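For context on the limits above: DataFusion's documentation describes with_memory_limit(max_memory, memory_fraction) as capping execution memory at max_memory * memory_fraction bytes. Assuming that holds for v38.0.0, the plain-Rust sketch below (no DataFusion dependency; effective_pool_bytes is a hypothetical helper, and a 64-bit usize is assumed) shows that the 20GB setting leaves a 16 GiB pool and the 30GB setting a 24 GiB pool:

```rust
/// Bytes per GiB (assumes a 64-bit usize).
const GIB: usize = 1024 * 1024 * 1024;

/// Hypothetical helper: effective pool size, assuming the pool is sized as
/// `max_memory * memory_fraction`, truncated to whole bytes.
fn effective_pool_bytes(max_memory: usize, memory_fraction: f64) -> usize {
    (max_memory as f64 * memory_fraction) as usize
}
```

For example, effective_pool_bytes(20 * GIB, 0.8) equals 16 * GIB.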
The point is that, in theory, this query only needs to hold the span_name values of the 20 records with the longest attributes in memory.
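To illustrate the point: an ORDER BY ... LIMIT 20 can be answered with a bounded top-K buffer that never holds more than K candidate rows. The sketch below is plain Rust over an in-memory iterator of (span_name, attributes) pairs, not DataFusion's actual operator; top_k_span_names is a hypothetical helper, and it computes bit_length as the UTF-8 byte length times 8, which is what bit_length means for strings.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: return the `span_name`s of the `k` rows with the
/// largest `attributes` bit length, longest first. The heap never holds
/// more than `k` entries, so memory is bounded by `k`, not by the row count.
fn top_k_span_names<'a, I>(rows: I, k: usize) -> Vec<String>
where
    I: IntoIterator<Item = (&'a str, &'a str)>, // (span_name, attributes)
{
    // Min-heap on the sort key (via `Reverse`): the root is the smallest key
    // currently kept, i.e. the first candidate to evict.
    let mut heap: BinaryHeap<Reverse<(usize, String)>> = BinaryHeap::with_capacity(k);
    for (span_name, attributes) in rows {
        // For strings, bit_length is the UTF-8 byte length times 8.
        let key = attributes.len() * 8;
        if heap.len() < k {
            heap.push(Reverse((key, span_name.to_string())));
        } else if let Some(smallest) = heap.peek().map(|r| (r.0).0) {
            // Only allocate and push when the row beats the current minimum.
            if key > smallest {
                heap.pop();
                heap.push(Reverse((key, span_name.to_string())));
            }
        }
    }
    // Ascending order of `Reverse` is descending order of the key.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, name))| name)
        .collect()
}
```

For example, with rows [("a", "x"), ("b", "xxxx"), ("c", "xx"), ("d", "xxx")] and k = 2, it returns ["b", "d"]; only two span_name strings are ever retained.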
But even if it chose to hold all span_name values in memory, it shouldn't need this much memory:
- there are "only" 12_980_628 rows
- across all rows, sum(bit_length(span_name)) = 1_038_805_400, i.e. ~1 Gbit; since bit_length counts bits, that is only ~130MB of span_name data
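Since bit_length returns a length in bits, the total above converts to bytes by dividing by 8. A small sketch of that arithmetic (both helper names are hypothetical) puts the whole span_name column at about 130MB, roughly 10 bytes per row:

```rust
/// Hypothetical helper: convert a `bit_length` total (bits) into bytes.
fn bits_to_bytes(total_bits: u64) -> u64 {
    total_bits / 8
}

/// Hypothetical helper: average bytes per row, rounded down.
fn avg_bytes_per_row(total_bits: u64, rows: u64) -> u64 {
    bits_to_bytes(total_bits) / rows
}
```

For example, bits_to_bytes(1_038_805_400) gives 129_850_675 bytes, and avg_bytes_per_row(1_038_805_400, 12_980_628) gives 10.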
To Reproduce
The dataset and code aren't public, but it shouldn't be too hard to reproduce with a table containing two text columns.
Expected behavior
Ideally, a query like this would have a far more modest memory footprint.
Additional context
Using DataFusion v38.0.0; the same error occurs with and without mimalloc.
For comparison, DuckDB runs this query fine with a 1GB memory limit, but fails with 500MB.