-
Notifications
You must be signed in to change notification settings - Fork 925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory_usage
when calculating nested list column
#16193
Fix memory_usage
when calculating nested list column
#16193
Conversation
@@ -926,3 +927,29 @@ def test_list_iterate_error(): | |||
def test_list_struct_list_memory_usage(): | |||
df = cudf.DataFrame({"a": [[{"b": [1]}]]}) | |||
assert df.memory_usage().sum() == 16 | |||
|
|||
|
|||
def test_empty_nested_list_uninitialized_offsets_memory_usage(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to re-write this test using a public API rather than accessing column constructors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried finding a public API (constructor) that would create an empty offset column, but I think the cudf Python layer is good at consistently ensuring the offset column has at least one entry.
There are operations that go through libcudf that will return an empty offset column (e.g. .iloc[0:0]
), but I wanted to ensure we test something that has the least chance to change if a public API implementation changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
/merge |
Description
The offset column of a nested empty list column may be empty as discussed in #16164.
ListColumn.memory_usage
assumed that this column was non-emptyUnblocks rapidsai/cuspatial#1400
Checklist