Skip to content

Conversation

@lidavidm
Copy link
Member

@lidavidm lidavidm commented Jun 9, 2021

This adds a slice_replace kernel mimicking Pandas's str.slice_replace. There are both ascii and UTF8 variants, indexing respectively with bytes and codepoints. The ascii variant also works on binary arrays.

@github-actions
Copy link

github-actions bot commented Jun 9, 2021

@lidavidm
Copy link
Member Author

lidavidm commented Jun 9, 2021

Needs rebasing onto #10496.

@lidavidm lidavidm marked this pull request as ready for review June 9, 2021 20:03
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks good on the principle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be called binary_replace_slice since it works on non-Ascii input as well (it just slices in byte units, not codeunits).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"binary string"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I'm not sure the default values will be picked up. Is it just for documentation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts.start can be negative, so you should perhaps use before_slice instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I added a test case for this too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit pedantic. Just output - output_start?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really useful to re-write the same tests in Python. What you could do is generate slices as in test_slice_compatibility to check Pandas compatibility.

+--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+
| utf8_lower | Unary | String-like | String-like | \(8) | |
+--------------------------+------------+-------------------------+------------------------+---------+---------------------------------------+
| utf8_replace_slice | Unary | String-like | String-like | \(2) | :struct:`ReplaceSliceOptions` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the note number should be (4)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants