Skip to content

Conversation

@jonathanc-n
Copy link
Contributor

@jonathanc-n jonathanc-n commented Sep 5, 2025

Which issue does this PR close?

This was brought up by @alamb in the SMJ spilling pull request.

Rationale for this change

Using enum to represent batch spill state is simpler than two enums

Wanted to get this out of the way before switching HJ spilling to SMJ during runtime.

What changes are included in this PR?

Use enum instead of double option for expressing the state of the buffered batch

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Sep 5, 2025
@jonathanc-n
Copy link
Contributor Author

cc @alamb @comphead

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is a nice improvement to the existing code.

/// None if the batch spilled to disk th
pub batch: Option<RecordBatch>,
/// Represents in memory or spilled record batch
pub batch: BufferedBatchState,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this BufferedBatchState enum should include all intermediate data into the states, for example join_arrays. When the main batch is spilled, the join key array should either be also spilled or throw away, otherwise it will be leaking memory.

Perhaps we can leave a TODO in the comment now? This will be a big change.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty neat thanks @jonathanc-n

@comphead comphead merged commit 50e073c into apache:main Sep 5, 2025
28 checks passed
@jonathanc-n jonathanc-n deleted the change-spilling-smj-to-enum branch September 5, 2025 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants