
[PoC] Add API for tracking distinct buffers in MemoryPool by reference count #16359


Draft
wants to merge 18 commits into main
Conversation


@Dandandan Dandandan commented Jun 10, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

Currently, memory is tracked based on self-reporting of bytes by each individual consumer. While that often works, it over-reports the amount of memory:

  • Whenever a child operator produces data by calling slice on the RecordBatch / arrays (this over-reports a lot for very large RecordBatches, i.e. those coming out of aggregates); see the sketch below this list.
  • Whenever child arrays are re-used within the same RecordBatch (for example, the same array appearing twice under different names).
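For illustration (a minimal standalone sketch, not code from this PR): a slice shares the underlying buffers of the original array, so each consumer's self-reported size still includes the full buffer, and summing self-reported sizes counts the same bytes more than once.

use std::sync::Arc;
use arrow::array::{Array, ArrayRef, Int64Array};

fn main() {
    // A large array (~8 MB of i64 values).
    let big: ArrayRef = Arc::new(Int64Array::from_iter_values(0..1_000_000));
    // A 10-row slice shares the same underlying buffer...
    let small = big.slice(0, 10);
    // ...so its self-reported size is still roughly the full 8 MB; summing
    // self-reported sizes across consumers charges the same buffer twice.
    println!("full:  {} bytes", big.get_array_memory_size());
    println!("slice: {} bytes", small.get_array_memory_size());
}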

This PR proposes an API extension to MemoryPool and MemoryReservation that tracks Arc<dyn Array> instances based on their memory address.
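A minimal sketch of the idea (the names below are illustrative, not the PR's actual API): key a reference count on each buffer's data-pointer address, so a buffer shared by several arrays or slices is charged to the pool only once.

use std::collections::HashMap;
use std::sync::Mutex;
use arrow::array::{Array, ArrayRef};

#[derive(Debug, Default)]
struct BufferTracker {
    // buffer address -> (reference count, buffer capacity in bytes)
    references: Mutex<HashMap<usize, (usize, usize)>>,
}

impl BufferTracker {
    /// Returns the number of newly charged bytes for `array`, counting each
    /// distinct buffer only once across all tracked arrays. For brevity this
    /// only walks top-level buffers, not nested child data.
    fn track(&self, array: &ArrayRef) -> usize {
        let mut refs = self.references.lock().unwrap();
        let mut newly_charged = 0;
        let data = array.to_data();
        for buffer in data.buffers() {
            let addr = buffer.as_ptr() as usize;
            let entry = refs.entry(addr).or_insert((0, buffer.capacity()));
            if entry.0 == 0 {
                newly_charged += entry.1;
            }
            entry.0 += 1;
        }
        newly_charged
    }
}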

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the execution (Related to the execution crate) and physical-plan (Changes to the physical-plan crate) labels Jun 10, 2025
@Dandandan Dandandan changed the title Add API for tracking distinct arrays Add API for tracking distinct arrays in MemoryPool Jun 10, 2025
@Dandandan Dandandan changed the title Add API for tracking distinct arrays in MemoryPool Add API for tracking distinct arrays in MemoryPool by reference count Jun 10, 2025
@Dandandan Dandandan changed the title Add API for tracking distinct arrays in MemoryPool by reference count [PoC] Add API for tracking distinct arrays in MemoryPool by reference count Jun 10, 2025
@Dandandan Dandandan changed the title [PoC] Add API for tracking distinct arrays in MemoryPool by reference count [PoC] Add API for tracking distinct buffers in MemoryPool by reference count Jun 10, 2025

@2010YOUY01 2010YOUY01 left a comment


Thank you, this solves the memory overcounting issue across batches. I have some suggestions/questions.

This new pool implementation might cause some issues for the MemoryReservation API:
If we first count the memory usage of several arrays via the grow_with_arrays() interface, and then use the following functions to release it (like reservation.resize(0)), the memory-used value will be subtracted, but the inner hash table won't be cleared:

pub fn free(&mut self) -> usize {
    let size = self.size;
    if size != 0 {
        self.shrink(size)
    }
    size
}

pub fn resize(&mut self, capacity: usize) {
    match capacity.cmp(&self.size) {
        Ordering::Greater => self.grow(capacity - self.size),
        Ordering::Less => self.shrink(self.size - capacity),
        _ => {}
    }
}

This causes a potentially inconsistent state; maybe we can use different semantics for the MemoryReservation API:

  • Memory for arrays can only be managed with grow_with_arrays()/shrink_with_arrays()
  • Other interfaces like shrink()/grow() are used for other memory usage

And the implementation would look like:

#[derive(Debug)]
pub struct GreedyMemoryPoolWithTracking {
    pool_size: usize,
    used_others: AtomicUsize, // managed with non-array APIs
    
    used_array: AtomicUsize, // ref-counted with buffer addr stored in the hash table
    references: Mutex<HashMap<usize, usize>>,
}
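For example, the release path could look roughly like this (a sketch under the semantics above; the struct and method names are illustrative, not the PR's actual API): shrink_with_arrays() decrements each buffer's reference count and only subtracts a buffer's bytes once its count reaches zero, keeping the hash table consistent with used_array.

use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use arrow::array::{Array, ArrayRef};

struct ArrayTracking {
    used_array: AtomicUsize,                  // total bytes charged for arrays
    references: Mutex<HashMap<usize, usize>>, // buffer addr -> ref count
}

impl ArrayTracking {
    // Decrement each buffer's ref count; release its bytes only when the
    // count drops to zero, so the hash table stays in sync with `used_array`.
    fn shrink_with_arrays(&self, arrays: &[ArrayRef]) {
        let mut refs = self.references.lock().unwrap();
        let mut released = 0;
        for array in arrays {
            let data = array.to_data();
            for buffer in data.buffers() {
                let addr = buffer.as_ptr() as usize;
                if let Entry::Occupied(mut entry) = refs.entry(addr) {
                    *entry.get_mut() -= 1;
                    if *entry.get() == 0 {
                        entry.remove();
                        released += buffer.capacity();
                    }
                }
            }
        }
        self.used_array.fetch_sub(released, Ordering::Relaxed);
    }
}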

fn grow_with_arrays(
    &self,
    reservation: &MemoryReservation,
    arrays: &[Arc<dyn Array>],
Contributor


Can we make the API take RecordBatch instead of arrays? Inside DataFusion it's more common to pass batches around, and we can use a utility function to do batch -> [Array] for array usages.

Contributor Author


That's possible, let me have a look at whether we can always use RecordBatch. My concern was that we might not always have a RecordBatch, but might have an Array; in that case, conversion to a RecordBatch would be strange.
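If the API does end up taking RecordBatch, the batch -> arrays direction is trivial, since a batch already exposes its columns as &[ArrayRef] (a hypothetical helper, for illustration only):

use arrow::array::ArrayRef;
use arrow::record_batch::RecordBatch;

// Hypothetical helper: a RecordBatch already exposes its columns as `&[ArrayRef]`,
// so an array-based API can be reused for batches.
fn batch_arrays(batch: &RecordBatch) -> &[ArrayRef] {
    batch.columns()
}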

@Dandandan
Contributor Author


Hi

Thank you, this solves the memory overcounting issue across batches. I have some suggestions/questions.

This new pool implementation might cause some issues for the MemoryReservation API: If we first count the memory usage of several arrays via the grow_with_arrays() interface, and then use the following functions to release it (like reservation.resize(0)), the memory-used value will be subtracted, but the inner hash table won't be cleared:

pub fn free(&mut self) -> usize {
    let size = self.size;
    if size != 0 {
        self.shrink(size)
    }
    size
}

pub fn resize(&mut self, capacity: usize) {
    match capacity.cmp(&self.size) {
        Ordering::Greater => self.grow(capacity - self.size),
        Ordering::Less => self.shrink(self.size - capacity),
        _ => {}
    }
}

This causes a potentially inconsistent state; maybe we can use different semantics for the MemoryReservation API:

  • Memory for arrays can only be managed with grow_with_arrays()/shrink_with_arrays()
  • Other interfaces like shrink()/grow() are used for other memory usage

And the implementation would look like:

#[derive(Debug)]
pub struct GreedyMemoryPoolWithTracking {
    pool_size: usize,
    used_others: AtomicUsize, // managed with non-array APIs
    
    used_array: AtomicUsize, // ref-counted with buffer addr stored in the hash table
    references: Mutex<HashMap<usize, usize>>,
}

Thanks for the feedback!

Yes, tracking the size separately makes sense, I'll change that!
