-
Notifications
You must be signed in to change notification settings - Fork 1.8k
refactor: Use BufferedBatchState enum for SMJ spilling
#17429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is a nice improvement to the existing code.
| /// None if the batch spilled to disk th | ||
| pub batch: Option<RecordBatch>, | ||
| /// Represents in memory or spilled record batch | ||
| pub batch: BufferedBatchState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this BufferedBatchState enum should include all intermediate data into the states, for example join_arrays. When the main batch is spilled, the join key array should either be also spilled or throw away, otherwise it will be leaking memory.
Perhaps we can leave a TODO in the comment now? This will be a big change.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
comphead
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty neat thanks @jonathanc-n
Which issue does this PR close?
This was brought up by @alamb in the SMJ spilling pull request.
Rationale for this change
Using enum to represent batch spill state is simpler than two enums
Wanted to get this out of the way before switching HJ spilling to SMJ during runtime.
What changes are included in this PR?
Use enum instead of double option for expressing the state of the buffered batch