Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom spilling handler #12287
base: branch-23.04
Are you sure you want to change the base?
Custom spilling handler #12287
Changes from 5 commits
9baf521
8df1ada
4a13c5e
4ef7d60
f109b71
5024d5a
6359bed
ac0801b
8a44dc9
c58d854
dfb0983
3bf949d
156b3f2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a bit odd that the spill handler would be removed when the buffer is spilled. Is that always what we want?
Also, the usage makes it clear that these spill handlers are only appropriate for device to host spilling. Should we key the
spill_handlers
dict on both the buffer and the source/target of the spill?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see why it's removed, I guess it's being added every time the cached property is accessed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this line disconcerting, but I'm having trouble articulating exactly why so I'm just going to try to explain my thought process and see if we can resolve my confusion.
At the point where we call
buf.spill
(and now the spill handler) we already know that the buffer is spillable. A buffer is spillable according to our existing logic if it hasn't been handed out to an external consumer and if there are no spill locks around it. If we get to to this point and the number of references is greater than 3, doesn't that imply that we should just spill? It seems like at that point you're in the exact same scenario that you would be for any other SpillableBuffer where a future use could try to unspill it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, this is also what is done. By returning
None
, we are telling the spill manager to spill the buffer as usual.We could move this responsibility to the handler always. It would then be up to the handler to spill the buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on the atomic behavior of Python built-ins has bit me in the past where the line of Python code turns out to not actually be a single atomic operation. Would probably be best to wrap this in a lock to be absolutely sure it's thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A local lock here would not help since changes to the class instance could come from anywhere. The fact that
dict
access is thread safe is relied on throughout Python including infunctools.cached_property
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use a different name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe
cached_property_no_spill
or something.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under what circumstances could this be None? Wouldn't the call to
ret = super().__get__(instance, owner)
trigger the creation of aSpillableBuffer
(assuming this decorator is only applied to functions that return something that wraps aSpillableBuffer
) which in turn would always lead to the creation of a globalSpillManager
if one isn't already initialized.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to handle the case where spilling is disabled. Notice, this decorator is also used when spilling is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the
mask
buffer of theColumn
, or - in the case of nested columns - its children?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
I have simplified the code a bit. Instead of supporting any type, we now check that the instance is a
RangeIndex
. I haven't found any other use ofcached_property
where the cache can be the sole owner of a buffer, so I think we should limit the scope toRangeIndex
for now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems related to the question I asked above. I'm OK with not overengineering to support anything else, although it would be good to ensure that if we do need to support other types later it only requires adding new code and not restructuring the existing code (hence my comment above).