-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The basic challenge is that DataFusion can use an unbounded amount of memory for running a plan which typically results in DataFusion being killed by some system memory protection limit (e.g. the OOM Killer on Linux). See #587 for more details
As a first step towards supporting larger datasets in DataFusion, if a plan will exceed the overall budget, it should generate a runtime error (resource exhausted) rather than exceeding the budget and risking being killed
There should be a way to keep the current behavior as well (do not error due to resource exhausted)
Describe the solution you'd like
- The user can define a limit for memory via
MemoryManagerConfig - All operators that consume significant memory (Hash, Join, Sort) will properly account for and request memory from the
MemoryManagervia methods liketry_grow - If sufficient memory can not be allocated, the plan should return ResourcesExhausted
Needed:
- Use
MemoryManagerinSortExec, and return errors if the memory budget is exceeded: Add ability to disable DiskManager #4330 - Use
MemoryManagerin Aggregate operators, and return errors if the memory budget is exceeded: Throw a runtime error if the memory allocated to GroupByHash exceeds a limit #3940 - Use
MemoryManagerin Join operators, and return errors if the memory budget is exceeded #5220 - SQL level coverage for when memory limit is exceeded #4404
Describe alternatives you've considered
We can always increase the accuracy of the memory allocation accounting (e.g. RecordBatches internal to operators, etc). However, for this initial epic I would like to get the major consumers of memory instrumented and using the MemoryManager interface. Hopefully this will also allow
Additional context
cc @yjshen @crepererum
related to issues like https://github.com/influxdata/influxdb_iox/issues/5776 (and some internal issues of our own)