Excessive memory consumption on sorting #10511

Closed
@samuelcolvin

Describe the bug

I'm running the following query:

select span_name from records order by bit_length(attributes) desc limit 20

It runs out of memory with a 20GB memory limit (RuntimeConfig::new().with_memory_limit(20 * 1024 * 1024 * 1024, 0.8)), but passes with 30GB allowed.

Error message is:

Failed to allocate additional 25887088 bytes for ExternalSorterMerge[1] with 585120448 bytes already allocated - maximum available is 23605759

The point is that in theory this query only needs to hold the span_names of the 20 records with the longest attributes in memory.
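
To make that claim concrete, here is a minimal std-only sketch of a bounded top-k scan. It is not DataFusion's implementation; the function name top_k_by_attr_len and the (span_name, attributes) row tuples are hypothetical, and the bit length is assumed to be 8 times the UTF-8 byte length. It only shows that at most k span_name values need to be retained while scanning.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the span_name of the `k` rows with the longest `attributes`,
/// longest first, holding at most `k` owned strings at any time.
/// Each input row is a hypothetical (span_name, attributes) pair.
fn top_k_by_attr_len<'a>(
    rows: impl IntoIterator<Item = (&'a str, &'a str)>,
    k: usize,
) -> Vec<String> {
    if k == 0 {
        return Vec::new();
    }
    // `Reverse` turns the max-heap into a min-heap, so the retained row
    // with the shortest attributes sits on top and is evicted first.
    let mut heap: BinaryHeap<Reverse<(usize, String)>> = BinaryHeap::with_capacity(k);
    for (span_name, attributes) in rows {
        // Assumption: bit length = 8 * number of UTF-8 bytes.
        let bits = attributes.len() * 8;
        if heap.len() < k {
            heap.push(Reverse((bits, span_name.to_owned())));
            continue;
        }
        // Copy the current minimum out so the heap is not borrowed below.
        let min_bits = heap.peek().map(|Reverse((b, _))| *b);
        if let Some(min_bits) = min_bits {
            if bits > min_bits {
                heap.pop();
                heap.push(Reverse((bits, span_name.to_owned())));
            }
        }
    }
    // Ascending order of `Reverse<_>` is descending order of bit length.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, name))| name)
        .collect()
}
```

Called with k = 20 over a stream of rows, this keeps 20 span_name strings plus their lengths, regardless of how many rows are scanned.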

But even if it chose to hold every span_name value in memory, it shouldn't need this much memory:

  • there are "only" 12_980_628 rows
  • with sum(bit_length(span_name)) = 1_038_805_400 bits, aka ~130MB, for all rows
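
As a rough check on that total, assuming bit_length counts bits (as it does in standard SQL), the sum can be converted to bytes and to an average per row. The helper names below are hypothetical and only restate the arithmetic.

```rust
/// Converts a total bit count into whole bytes (8 bits per byte).
fn bits_to_bytes(bits: u64) -> u64 {
    bits / 8
}

/// Average number of payload bytes per row.
fn avg_bytes_per_row(total_bytes: u64, rows: u64) -> f64 {
    total_bytes as f64 / rows as f64
}

/// Figures quoted in the issue: (total span_name bytes, average bytes per row).
fn span_name_footprint() -> (u64, f64) {
    let total_bits: u64 = 1_038_805_400; // sum(bit_length(span_name))
    let rows: u64 = 12_980_628; // row count
    let total_bytes = bits_to_bytes(total_bits);
    (total_bytes, avg_bytes_per_row(total_bytes, rows))
}
```

Under that assumption the raw span_name data is 129_850_675 bytes, roughly 10 bytes per row, which is far below the 20GB limit even before any top-k optimisation.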

To Reproduce

The dataset and code aren't public, but it shouldn't be too hard to reproduce with a table containing two text columns.

Expected behavior

Ideally, a query like this would have a far more modest memory footprint.

Additional context

Using datafusion v38.0.0; the same error occurs with and without mimalloc.

For comparison, duckdb runs this query fine with a 1GB memory limit, but fails with 500MB.

Labels: bug