Open
Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
Today, when DataFusion spills files to disk, it uses the Arrow IPC format.
Here is the code:
datafusion/datafusion/physical-plan/src/spill.rs
Lines 60 to 88 in 988a535
```rust
pub(crate) fn spill_record_batches(
    batches: &[RecordBatch],
    path: PathBuf,
    schema: SchemaRef,
) -> Result<(usize, usize)> {
    let mut writer = IPCStreamWriter::new(path.as_ref(), schema.as_ref())?;
    for batch in batches {
        writer.write(batch)?;
    }
    writer.finish()?;
    debug!(
        "Spilled {} batches of total {} rows to disk, memory released {}",
        writer.num_batches,
        writer.num_rows,
        human_readable_size(writer.num_bytes),
    );
    Ok((writer.num_rows, writer.num_bytes))
}

fn read_spill(sender: Sender<Result<RecordBatch>>, path: &Path) -> Result<()> {
    let file = BufReader::new(File::open(path)?);
    let reader = StreamReader::try_new(file, None)?;
    for batch in reader {
        sender
            .blocking_send(batch.map_err(Into::into))
            .map_err(|e| exec_datafusion_err!("{e}"))?;
    }
    Ok(())
}
```
The IPC reader currently reads the spill files with buffered file IO, copying their contents into memory.
It is possible to use mmap instead, which maps the file contents into memory without copying them (zero copy). Here is an example of how to do so:
https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs
- My testing on "Add `with_skip_validation` flag to IPC `StreamReader`, `FileReader` and `FileDecoder`" (arrow-rs#7120) suggested mmap is 3x faster than file IO
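To make the difference between the two read paths concrete, here is a minimal, dependency-free sketch. It is not the DataFusion or arrow-rs implementation: `MappedFile` and `read_with_file_io` are hypothetical names, and the raw POSIX `mmap`/`munmap` declarations are assumptions that only hold on 64-bit Linux or macOS (real code would more likely use a crate such as `memmap2` and hand the mapped bytes to the IPC decoder, as the linked arrow-rs example does). It also requires a toolchain that accepts `unsafe extern` blocks.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::ops::Deref;
use std::os::raw::{c_int, c_void};
use std::os::unix::io::AsRawFd;
use std::path::Path;

// Assumption: 64-bit Linux/macOS, where `off_t` is 64 bits wide.
// Declared by hand only to keep this sketch free of external crates.
unsafe extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

const PROT_READ: c_int = 1; // same value on Linux and macOS
const MAP_PRIVATE: c_int = 2; // same value on Linux and macOS

/// Read-only mapping of a whole file (hypothetical helper type).
pub struct MappedFile {
    ptr: *mut c_void,
    len: usize,
}

impl MappedFile {
    /// Maps `path` into memory; no bytes are copied into a heap buffer.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len() as usize;
        if len == 0 {
            // mmap rejects zero-length mappings.
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "empty file"));
        }
        let ptr = unsafe {
            mmap(std::ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0)
        };
        if ptr as isize == -1 {
            // MAP_FAILED
            return Err(io::Error::last_os_error());
        }
        // The mapping stays valid after `file` is closed at the end of this scope.
        Ok(Self { ptr, len })
    }
}

impl Deref for MappedFile {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        // Caveat: truncating the file while it is mapped can fault the process.
        unsafe { std::slice::from_raw_parts(self.ptr as *const u8, self.len) }
    }
}

impl Drop for MappedFile {
    fn drop(&mut self) {
        unsafe {
            munmap(self.ptr, self.len);
        }
    }
}

/// Baseline resembling the current read path: copy the file into a `Vec`.
pub fn read_with_file_io(path: &Path) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    File::open(path)?.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Both paths expose the same bytes as a `&[u8]`. The mapped version lets the OS page data in on demand, whereas the baseline always pays for a full copy up front. Whether that translates into faster spill reads in DataFusion is exactly what a benchmark would need to show.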
Describe the solution you'd like
I would like to see whether using mmap to read the spill files back in is faster than the current file IO path.
Describe alternatives you've considered
- Use mmap to read spill files
- Add / use a benchmark showing the performance benefit of doing this
Additional context
comphead and zhuqi-lucas