-
Notifications
You must be signed in to change notification settings - Fork 3.9k
ARROW-12948: [C++][Python] Add slice_replace kernel #10494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Needs rebasing onto #10496. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks good on the principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be called binary_replace_slice since it works on non-Ascii input as well (it just slices in byte units, not codeunits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"binary string"?
cpp/src/arrow/compute/api_scalar.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I'm not sure the default values will be picked up. Is it just for documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opts.start can be negative, so you should perhaps use before_slice instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I added a test case for this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit pedantic. Just output - output_start?
python/pyarrow/tests/test_compute.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really useful to re-write the same tests in Python. What you could do is generate slices as in test_slice_compatibility to check Pandas compatibility.
docs/source/cpp/compute.rst
Outdated
| +--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+ | ||
| | utf8_lower | Unary | String-like | String-like | \(8) | | | ||
| +--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+ | ||
| | utf8_replace_slice | Unary | String-like | String-like | \(2) | :struct:`ReplaceSliceOptions` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the note number should be (4)?
This adds a slice_replace kernel mimicking Pandas's str.slice_replace. There are both ascii and UTF8 variants, indexing respectively with bytes and codepoints. The ascii variant also works on binary arrays.