Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom spilling handler  #12287

Open
wants to merge 13 commits into
base: branch-23.04
Choose a base branch
from

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Dec 2, 2022

This PR enables objects to register a spilling handler for a specific spillable buffer. For now, this is used by RangeIndex to delete rather than spill its cached data. 

Motivation 

Normally, a RangeIndex, which consists of a start, stop, step, and a dtype, doesn’t use any device memory and spilling it should be a no-op. However, if the RangeIndex has been materialized, the spill manager might decide to spill the underlying buffer. This can degrade performance and increase memory usage significantly.  

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@madsbk madsbk added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 2, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label Dec 2, 2022
@madsbk madsbk marked this pull request as ready for review December 2, 2022 16:39
@madsbk madsbk requested a review from a team as a code owner December 2, 2022 16:39
@codecov
Copy link

codecov bot commented Dec 2, 2022

Codecov Report

Base: 85.69% // Head: 85.69% // No change to project coverage 👍

Coverage data is based on head (9a5a814) compared to base (9a5a814).
Patch has no changes to coverable lines.

❗ Current head 9a5a814 differs from pull request most recent head dfb0983. Consider uploading reports for the commit dfb0983 to get more accurate results

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.02   #12287   +/-   ##
=============================================
  Coverage         85.69%   85.69%           
=============================================
  Files               155      155           
  Lines             24798    24798           
=============================================
  Hits              21251    21251           
  Misses             3547     3547           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@madsbk
Copy link
Member Author

madsbk commented Dec 2, 2022 via email

@shwina
Copy link
Contributor

shwina commented Dec 2, 2022

It might be possible to do in a decorator that could replace cached_property. Would that be sufficient?

Yes, I think it would be a significant improvement

@madsbk
Copy link
Member Author

madsbk commented Dec 6, 2022

It might be possible to do in a decorator that could replace cached_property. Would that be sufficient?

Yes, I think it would be a significant improvement

I have created a decorator instead, do you think this is acceptable @shwina?

@shwina
Copy link
Contributor

shwina commented Dec 8, 2022

Thanks! I do think this is an improvement. Some questions/thoughts:

  • cached_property_delete_column_when_spilled needs the property to return a Column, but what about cached properties that possibly return buffers or arrays? Would each of those need a custom cached_property specialization that does things slightly differently with the return value?

  • Can we aim to make the changes to RangeIndex (and potentially other classes) even more minimal? For example:

    • Instead of using cached_property_delete_when_spilled, can we universally use a custom cached property in cuDF that knows about spilling? That is, everywhere that we currently use functools.cached_property, can we use cudf.utils.cached_property? Or are there situations we don't want to delete cached device objects when spilled?
    • Instead of using a decorator, can we monkeypatch methods like RangeIndex._values when spilling is enabled? This allows RangeIndex to be truly untouched.

    Admittedly I don't think either of those are fantastic suggestions; curious what you think?

@madsbk
Copy link
Member Author

madsbk commented Dec 8, 2022

cached_property_delete_column_when_spilled needs the property to return a Column, but what about cached properties that possibly return buffers or arrays? Would each of those need a custom cached_property specialization that does things slightly differently with the return value?

No, I think we can generalize the decorator to support any type. Do you know, if we are caching columns, series, arrays, or dataframes anywhere else? I am still seeing a difference between the memory usage of JIT-unspill and cudf-spilling, which could be because of caching. Note, JIT-unspill circumvent this issue because it uses .serialize() for spilling, which doesn't serialize cached data already.

Instead of using cached_property_delete_when_spilled, can we universally use a custom cached property in cuDF that knows about spilling? That is, everywhere that we currently use functools.cached_property, can we use cudf.utils.cached_property? Or are there situations we don't want to delete cached device objects when spilled?

No, this should be doable.

There might be some cases where it is a problem, I will investigate.

@madsbk
Copy link
Member Author

madsbk commented Dec 12, 2022

@shwina, I have generalized and renamed the decorator. Now, it can be used everywhere with no ill effect. 

if manager is None:
return ret

buf = ret.base_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the mask buffer of the Column, or - in the case of nested columns - its children?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.  

I have simplified the code a bit. Instead of supporting any type, we now check that the instance is a RangeIndex. I haven't found any other use of cached_property where the cache can be the sole owner of a buffer, so I think we should limit the scope to RangeIndex for now. 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems related to the question I asked above. I'm OK with not overengineering to support anything else, although it would be good to ensure that if we do need to support other types later it only requires adding new code and not restructuring the existing code (hence my comment above).

python/cudf/cudf/core/buffer/spill_manager.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/buffer/spill_manager.py Show resolved Hide resolved
s = func(*args, **kwargs)
if s is not None:
spilled += s
self._spill_handlers.pop(buf, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit odd that the spill handler would be removed when the buffer is spilled. Is that always what we want?

Also, the usage makes it clear that these spill handlers are only appropriate for device to host spilling. Should we key the spill_handlers dict on both the buffer and the source/target of the spill?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see why it's removed, I guess it's being added every time the cached property is accessed.

python/cudf/cudf/core/buffer/utils.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/buffer/utils.py Outdated Show resolved Hide resolved
if manager is None:
return ret

buf = ret.base_data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems related to the question I asked above. I'm OK with not overengineering to support anything else, although it would be good to ensure that if we do need to support other types later it only requires adding new code and not restructuring the existing code (hence my comment above).

return nbytes


class cached_property(functools.cached_property):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use a different name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe cached_property_no_spill or something.

cache_hit
or not isinstance(instance, cudf.RangeIndex)
or not isinstance(ret, cudf.core.column.NumericalColumn)
or ret.nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the nullable check needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just to avoid having to handle a mask

return ret

manager = get_global_manager()
if manager is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances could this be None? Wouldn't the call to ret = super().__get__(instance, owner) trigger the creation of a SpillableBuffer (assuming this decorator is only applied to functions that return something that wraps a SpillableBuffer) which in turn would always lead to the creation of a global SpillManager if one isn't already initialized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to handle the case where spilling is disabled. Notice, this decorator is also used when spilling is disabled.  

# If `cached` is known outside of the cache, we cannot free any
# memory by clearing the cache. We know of three references:
# `instance.__dict__`, `cached`, and `sys.getrefcount`.
if sys.getrefcount(cached) > 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this line disconcerting, but I'm having trouble articulating exactly why so I'm just going to try to explain my thought process and see if we can resolve my confusion.

At the point where we call buf.spill (and now the spill handler) we already know that the buffer is spillable. A buffer is spillable according to our existing logic if it hasn't been handed out to an external consumer and if there are no spill locks around it. If we get to to this point and the number of references is greater than 3, doesn't that imply that we should just spill? It seems like at that point you're in the exact same scenario that you would be for any other SpillableBuffer where a future use could try to unspill it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get to to this point and the number of references is greater than 3, doesn't that imply that we should just spill? It seems like at that point you're in the exact same scenario that you would be for any other SpillableBuffer where a future use could try to unspill it.

Correct, this is also what is done. By returning None, we are telling the spill manager to spill the buffer as usual.
We could move this responsibility to the handler always. It would then be up to the handler to spill the buffer?

madsbk and others added 3 commits December 20, 2022 08:04
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
@shwina shwina changed the base branch from branch-23.02 to branch-23.04 January 26, 2023 16:44
@shwina
Copy link
Contributor

shwina commented Jan 26, 2023

Retargeted to 23.04

@vyasr
Copy link
Contributor

vyasr commented Jan 19, 2024

@madsbk is this still something that we want? If so, at this point should it still wait until after you've done the final merging of COW/spilling to avoid conflicting code?

@madsbk
Copy link
Member Author

madsbk commented Jan 22, 2024

@madsbk is this still something that we want? If so, at this point should it still wait until after you've done the final merging of COW/spilling to avoid conflicting code?

Yes and yes :)
The materialization of RangeIndex is something we want to avoid but let's wait until COW and spilling has been unified.

@vyasr
Copy link
Contributor

vyasr commented May 21, 2024

Note, the above unification is done now so this work is unblocked if desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants