Improve Spill Performance: `mmap` the spill files #15321

- Part of https://github.com/apache/datafusion/issues/15271

### Is your feature request related to a problem or challenge?

Today, when DataFusion spills data to disk, it writes the spill files in the Arrow IPC format.

Here is the code:
https://github.com/apache/datafusion/blob/988a53540b67cb36f3f259b47a68fe11736fccbb/datafusion/physical-plan/src/spill.rs#L60-L88

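The IPC reader currently reads the spill files back with ordinary file IO (a `BufReader` over a `File`, passed to `StreamReader`), which copies their contents into memory.

It is possible to use `mmap` to map the contents of the files into memory without copying them (zero copy). Here is an example of how to do so: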

https://github.com/apache/arrow-rs/blob/main/arrow/examples/zero_copy_ipc.rs

- My testing on https://github.com/apache/arrow-rs/pull/7120 suggested mmap is 3x faster than file IO
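
To make the idea concrete, here is a minimal, dependency-free sketch of mapping a file read-only and viewing it as a byte slice without copying it into a heap buffer. It is not the code from the linked example: `MappedFile` is a hypothetical name, the POSIX `mmap` / `munmap` symbols are declared by hand, and the constants and the 64-bit `offset` type assume a 64-bit Linux or macOS target. Real code would more likely use a crate such as `memmap2` and hand the mapped bytes to the Arrow IPC decoder.

```rust
use std::ffi::{c_int, c_void};
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::path::Path;
use std::ptr;
use std::slice;

// POSIX symbols from the platform C library, which std already links on Unix.
// Assumption: 64-bit target, so `off_t` is `i64`.
unsafe extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

const PROT_READ: c_int = 1; // assumption: value on Linux and macOS
const MAP_PRIVATE: c_int = 2; // assumption: value on Linux and macOS

/// Hypothetical read-only mapping of an entire file.
pub struct MappedFile {
    ptr: *mut c_void,
    len: usize,
}

impl MappedFile {
    /// Maps the whole file at `path` into the address space.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let len = file.metadata()?.len() as usize;
        if len == 0 {
            // mmap rejects zero-length mappings.
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "cannot map an empty file",
            ));
        }
        // SAFETY: the fd is valid for the duration of the call; the mapping
        // remains valid after `file` is closed.
        let ptr = unsafe {
            mmap(ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0)
        };
        if ptr as isize == -1 {
            // MAP_FAILED
            return Err(io::Error::last_os_error());
        }
        Ok(Self { ptr, len })
    }

    /// Borrows the mapped pages directly; no bytes are copied here.
    pub fn as_bytes(&self) -> &[u8] {
        // SAFETY: `ptr` points to `len` readable bytes until `munmap` in `Drop`.
        unsafe { slice::from_raw_parts(self.ptr as *const u8, self.len) }
    }
}

impl Drop for MappedFile {
    fn drop(&mut self) {
        // SAFETY: `ptr` and `len` describe the mapping created in `open`.
        unsafe {
            munmap(self.ptr, self.len);
        }
    }
}
```

Calling `MappedFile::open(path)?.as_bytes()` yields the file contents as a slice backed by the page cache. One caveat of this approach is that the file must not be truncated while it is mapped, which should hold for spill files that are written once and then only read.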

### Describe the solution you'd like

I would like to see whether using `mmap` to read the spill files back in is faster.

### Describe alternatives you've considered

1. Use `mmap` to read spill files
2. Add / use a benchmark showing the performance benefit of doing this
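
As a rough illustration of what the benchmark item could measure, the sketch below times two ways of reading a file back using only the standard library. All names (`read_buffered`, `read_whole`, `time_reads`) are hypothetical, the readers only count bytes rather than decode Arrow IPC, and a real benchmark would more likely use a proper harness and add an `mmap`-based reader as another closure.

```rust
use std::fs::{self, File};
use std::io::{self, BufReader, Read};
use std::path::Path;
use std::time::{Duration, Instant};

/// Stand-in for the current file IO path: read through a `BufReader`
/// in fixed-size chunks and return the number of bytes seen.
fn read_buffered(path: &Path) -> io::Result<usize> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut chunk = [0u8; 8192];
    let mut total = 0;
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        total += n;
    }
    Ok(total)
}

/// Alternative strategy: read the whole file into one buffer.
fn read_whole(path: &Path) -> io::Result<usize> {
    Ok(fs::read(path)?.len())
}

/// Runs `read` `iterations` times and returns the byte count of the last
/// run together with the mean wall-clock time per run.
fn time_reads<F>(iterations: u32, mut read: F) -> io::Result<(usize, Duration)>
where
    F: FnMut() -> io::Result<usize>,
{
    let start = Instant::now();
    let mut bytes = 0;
    for _ in 0..iterations {
        bytes = read()?;
    }
    // `max(1)` avoids dividing by zero when `iterations` is 0.
    Ok((bytes, start.elapsed() / iterations.max(1)))
}
```

For example, `time_reads(100, || read_buffered(&path))` and `time_reads(100, || read_whole(&path))` can be compared on the same spill file. Because repeated reads of the same file are served from the OS page cache, the numbers reflect copy overhead more than disk speed.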

### Additional context

- https://github.com/apache/datafusion/issues/15320



	pub(crate) fn spill_record_batches(
	    batches: &[RecordBatch],
	    path: PathBuf,
	    schema: SchemaRef,
	) -> Result<(usize, usize)> {
	    let mut writer = IPCStreamWriter::new(path.as_ref(), schema.as_ref())?;
	    for batch in batches {
	        writer.write(batch)?;
	    }
	    writer.finish()?;
	    debug!(
	        "Spilled {} batches of total {} rows to disk, memory released {}",
	        writer.num_batches,
	        writer.num_rows,
	        human_readable_size(writer.num_bytes),
	    );
	    Ok((writer.num_rows, writer.num_bytes))
	}

	fn read_spill(sender: Sender<Result<RecordBatch>>, path: &Path) -> Result<()> {
	    let file = BufReader::new(File::open(path)?);
	    let reader = StreamReader::try_new(file, None)?;
	    for batch in reader {
	        sender
	            .blocking_send(batch.map_err(Into::into))
	            .map_err(|e| exec_datafusion_err!("{e}"))?;
	    }
	    Ok(())
	}
