Support `cudf-polars` `str.zfill` #19081

brandon-b-miller · 2025-06-04T01:57:35Z

Closes #19035
Closes #16480

I believe this needs pola-rs/polars#22985 to pass the one remaining failing test and the column overload described here #19035 (comment)

python/cudf_polars/cudf_polars/dsl/expressions/string.py

vyasr · 2025-06-12T01:45:09Z

I assume that this PR will be updated to leverage #19090 once that is merged?

brandon-b-miller · 2025-06-24T22:02:46Z

python/cudf_polars/cudf_polars/dsl/expressions/string.py

+                col, col_width = [
+                    child.evaluate(df, context=context) for child in self.children
+                ]
+                all_gt_0 = plc.binaryop.binary_operation(


should we do this introspection?

cc @mroeschke , do we usually introspect data to error at this point? If we don't, then we risk maybe producing the wrong result in some edge cases, like a negative fill value in the zfill column overload. The scalar case is easy enough to introspect and throw, but this is a whole scan.

Is this a case where we wouldn't be able to match the result (not necessarily an error) that Polars produces?

If so, yes, I think there's some precedent for introspecting the data to raise an error. We would want to do this in _validate_input though, so it can raise during translation. Plus we can do this introspection on the CPU by doing a similar operation on Polars objects.

The case this intends to match is the one where one of the fill elements in the column overload is negative; Polars throws at runtime in this case. I suppose we want to match that behavior, but it's a performance hit on every call to the column overload :(
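To make the cost discussed above concrete, here is a minimal pure-Python sketch of the column-overload check. It is not the pylibcudf code from the diff: the function name `check_widths_non_negative`, the use of `ValueError`, and the choice to skip nulls are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def check_widths_non_negative(widths: Sequence[Optional[int]]) -> None:
    """Raise if any non-null width is negative.

    Hypothetical helper: every element has to be inspected, which is the
    "whole scan" cost raised in the thread. Nulls are skipped here, which
    is an assumption rather than documented Polars behavior.
    """
    bad = [w for w in widths if w is not None and w < 0]
    if bad:
        raise ValueError(f"negative zfill width(s): {bad}")


# A column of widths with no negative entries passes silently.
check_widths_non_negative([1, 5, None, 0])
```

The scalar overload only needs a single comparison, whereas this check is linear in the number of rows, which is why the thread treats the two cases differently.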

brandon-b-miller · 2025-07-11T18:45:31Z

Sorry @mroeschke , just getting back to this now. I've fixed up the PR which should now be passing tests, and responded to the open question above.

python/cudf_polars/cudf_polars/dsl/expressions/string.py

mroeschke · 2025-07-11T23:05:18Z

python/cudf_polars/cudf_polars/dsl/expressions/string.py

+                if width.value is not None and width.value < 0:
+                    dtypestr = dtype_str_repr(width.dtype.polars)
+                    raise InvalidOperationError(
+                        f"conversion from `{dtypestr}` to `u64` "
+                        f"failed in column 'literal' for 1 out of "
+                        f"1 values: [{width.value}]"
+                    ) from None


Could we put this in _validate_input? Raising there will enable fallback during translation.
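The idea of validating at translation time can be sketched with a toy model. Everything below is hypothetical: `ZFillNode`, `UnsupportedOnGPU`, and `translate` are illustrative names and are not the cudf-polars classes or its actual fallback mechanism, which the thread does not show.

```python
from typing import Optional


class UnsupportedOnGPU(Exception):
    """Hypothetical signal that a node cannot be handled by the GPU path."""


class ZFillNode:
    """Toy expression node that validates its scalar width when constructed."""

    def __init__(self, width: Optional[int]) -> None:
        self.width = width
        self._validate_input()

    def _validate_input(self) -> None:
        # Runs while the plan is being built, before any data is touched.
        if self.width is not None and self.width < 0:
            raise UnsupportedOnGPU(f"negative zfill width: {self.width}")


def translate(width: Optional[int]) -> str:
    """Return which engine would run the node in this toy model."""
    try:
        ZFillNode(width)
    except UnsupportedOnGPU:
        return "cpu"  # fall back instead of failing mid-execution
    return "gpu"
```

In this model a bad scalar is rejected before evaluation starts, so the caller can still choose a fallback path; raising inside evaluation would be too late for that.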

python/cudf_polars/cudf_polars/dsl/expressions/string.py

mroeschke · 2025-07-11T23:09:42Z

Looks like py-polars/tests/unit/operations/namespaces/string/test_pad.py::test_str_zfill_unicode_not_respected may need to be added to the expected failures list in cudf/python/cudf_polars/cudf_polars/testing/plugin.py. Apparently Polars doesn't add zeros for unicode characters.

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

brandon-b-miller · 2025-07-15T13:06:14Z

Polars doesn't add zeros for unicode characters.

Should this be a polars issue? Who is right?

mroeschke · 2025-07-15T20:25:45Z

Should this be a polars issue? Who is right?

Given that Polars has a dedicated test checking that zeros aren't padded on unicode, I suppose they are "right" and ideally libcudf wouldn't pad unicode characters

brandon-b-miller · 2025-07-15T22:00:24Z

Should this be a polars issue? Who is right?

Given that Polars has a dedicated test checking that zeros aren't padded on unicode, I suppose they are "right" and ideally libcudf wouldn't pad unicode characters

It may be this piece of the docs:

This method is intended for padding numeric strings. If your data contains non-ASCII characters, use [pad_start()](https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.str.pad_start.html#polars.Series.str.pad_start) instead.

Ah. I wonder if we need to do something beyond xfailing here? It seems like in the wild users would be mostly zfilling numeric data but it's still dangerous to have a behavior divergence with the GPU backend. There aren't a lot of great options though. Off the top of my head we could:

- Extend the libcudf zfill API to ignore unicode optionally
- Scan the data unnecessarily in 99% of cases to make us safe from the one edge case
- Document the issue somewhere

None of the above seem particularly amazing. I'll think on what to do in lieu of any obvious ideas I missed.
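One way such a divergence can arise is when two implementations measure "width" differently. The sketch below assumes, purely for illustration, that one side counts UTF-8 bytes and the other counts code points; the thread only establishes that the results differ for non-ASCII input, not the exact rule on either side. Sign handling is omitted and both function names are hypothetical.

```python
def zfill_by_chars(s: str, width: int) -> str:
    """Left-pad with zeros until the string has `width` code points."""
    return "0" * max(0, width - len(s)) + s


def zfill_by_bytes(s: str, width: int) -> str:
    """Left-pad with zeros until the UTF-8 encoding has `width` bytes."""
    return "0" * max(0, width - len(s.encode("utf-8"))) + s


def agree(s: str, width: int) -> bool:
    """True when both length conventions produce the same padded string."""
    return zfill_by_chars(s, width) == zfill_by_bytes(s, width)


# ASCII input: one byte per character, so the two conventions agree.
print(agree("42", 5))
# "\u00e9" is one code point but two UTF-8 bytes, so they disagree.
print(agree("\u00e9", 3))
```

Under this assumption the two conventions can only differ when the input contains multi-byte characters, which is consistent with the Polars docs recommending `pad_start()` for non-ASCII data.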

mroeschke · 2025-07-15T22:13:23Z

I wonder if we need to do something beyond xfailing here?

Ah OK so I suppose the "proper" solution could use pylibcudf.strings.convert.convert_fixed_point.is_fixed_point together with copy_if_else to use the zfilled values if the values are numeric, else the old values.

brandon-b-miller · 2025-07-15T22:15:15Z

Ah OK so I suppose the "proper" solution could use pylibcudf.strings.convert.convert_fixed_point.is_fixed_point together with copy_if_else to use the zfilled values if the values are numeric, else the old values.

Thanks- this sounds like a winner 😄
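The mask-and-select idea above can be modeled in plain Python. This is a sketch only: the regular expression stands in for `is_fixed_point`, the list comprehension stands in for `copy_if_else`, and Python's own `str.zfill` stands in for the GPU zfill. None of the names below are pylibcudf calls.

```python
import re
from typing import List, Optional

# Stand-in for a fixed-point check: optional sign, ASCII digits, optional fraction.
_FIXED_POINT = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")


def looks_numeric(value: Optional[str]) -> bool:
    """True for non-null strings that look like fixed-point numbers."""
    return value is not None and _FIXED_POINT.fullmatch(value) is not None


def zfill_numeric_only(values: List[Optional[str]], width: int) -> List[Optional[str]]:
    """Pad every row, then keep the padded value only where the mask is true."""
    padded = [None if v is None else v.zfill(width) for v in values]
    mask = [looks_numeric(v) for v in values]
    # Row-wise select: padded where the mask holds, original otherwise.
    return [p if m else v for p, m, v in zip(padded, mask, values)]
```

Note that with this selection a non-numeric string such as "abc" is returned unpadded, since it fails the numeric mask.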

brandon-b-miller · 2025-07-18T01:52:58Z

It looks like polars wants to pad strings like abc as well as numeric, so we can't use is_fixed_point. But I think I might see a way forward with all_characters_of_type, will try it out.

brandon-b-miller · 2025-07-21T15:14:56Z

after playing around with this a bit I can't think of a way out of this without throwing for the cases where the APIs diverge. AFAICT they'll only definitely agree for a certain subset of characters in the source data.

@davidwendt whats the fastest way of checking if a column of strings contains only alphanumeric characters, spaces, or empty strings?

Matt711 · 2025-07-22T12:12:10Z

python/cudf_polars/tests/expressions/test_stringfunction.py

+        ["abc", "def"],
+    ],
+)
+def test_string_zfill(fill, input_strings):


Can you run these tests with polars 1.29? You might need to version guard these tests, i.e. with if POLARS_VERSION_LT_130
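For readers unfamiliar with this kind of guard, here is a rough sketch of how a boolean flag like POLARS_VERSION_LT_130 could be derived from a version string. It is not the cudf-polars definition of that flag; `parse_version`, `version_lt`, and the hard-coded "1.29.0" are illustrative assumptions.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric release fields, e.g. "1.29.0" -> (1, 29, 0)."""
    parts = []
    for field in version.split("."):
        digits = ""
        for ch in field:
            if ch not in "0123456789":
                break  # stop at suffixes such as "rc1"
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def version_lt(version: str, target: Tuple[int, ...]) -> bool:
    """True when `version` sorts strictly before `target`."""
    return parse_version(version) < target


# "1.29.0" stands in for the installed Polars version in this sketch.
POLARS_VERSION_LT_130 = version_lt("1.29.0", (1, 30))
```

A test can then be skipped or marked as an expected failure only when the flag is true, so the same test file works across both sides of the version boundary.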

Matt711 · 2025-07-22T12:17:26Z

whats the fastest way of checking if a column of strings contains only alphanumeric characters, spaces, or empty strings?

Maybe cudf::strings::contains_re and then cudf::reduce
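The "match each row, then reduce" shape of this suggestion can be shown on the CPU with the standard library. This is only a model of the idea: Python's `re` and `all` stand in for the libcudf regex and reduction calls, and treating nulls as acceptable is an assumption.

```python
import re
from typing import Iterable, Optional

# Zero or more ASCII letters, digits, or spaces; the empty string also matches.
_ALLOWED = re.compile(r"[A-Za-z0-9 ]*")


def all_rows_allowed(values: Iterable[Optional[str]]) -> bool:
    """Per-row full match, reduced to a single boolean with all()."""
    return all(v is None or _ALLOWED.fullmatch(v) is not None for v in values)
```

Because the pattern uses `*` with a full match, empty strings are accepted without needing a separate check.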

davidwendt · 2025-07-23T17:34:09Z

... whats the fastest way of checking if a column of strings contains only alphanumeric characters, spaces, or empty strings?

I believe this API would cover everything except empty strings
https://docs.rapids.ai/api/cudf/stable/pylibcudf/api_docs/strings/char_types/#pylibcudf.strings.char_types.all_characters_of_type

brandon-b-miller · 2025-08-05T16:40:02Z

Added some validation and conditional xfailing based on the polars version here. We now guard for non-ascii characters in the simplest way I can think of. It's not ideal but I think we should move forward and reassess if perf concerns ever surface.
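As a rough illustration of what a non-ASCII guard can look like, here is a pure-Python sketch. The thread does not show how the merged code performs its check, so `has_non_ascii`, `guarded_zfill`, and the choice of `NotImplementedError` are assumptions, and Python's `str.zfill` stands in for the GPU implementation.

```python
from typing import List, Optional


def has_non_ascii(values: List[Optional[str]]) -> bool:
    """True if any non-null row contains a character outside ASCII."""
    return any(v is not None and not v.isascii() for v in values)


def guarded_zfill(values: List[Optional[str]], width: int) -> List[Optional[str]]:
    """Refuse inputs where implementations may disagree, otherwise pad."""
    if has_non_ascii(values):
        raise NotImplementedError("zfill on non-ASCII input is not supported in this sketch")
    return [None if v is None else v.zfill(width) for v in values]
```

The guard still costs one pass over the data, which matches the trade-off accepted in the comment above.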

mroeschke · 2025-08-05T21:16:43Z

python/cudf_polars/tests/expressions/test_stringfunction.py

+        5
+        if not POLARS_VERSION_LT_130
+        else pytest.param(5, marks=pytest.mark.xfail(reason="fixed in Polars 1.30")),


Suggested change

-        5
-        if not POLARS_VERSION_LT_130
-        else pytest.param(5, marks=pytest.mark.xfail(reason="fixed in Polars 1.30")),
+        pytest.param(5, marks=pytest.mark.xfail(POLARS_VERSION_LT_130, reason="fixed in Polars 1.30")),

(For a follow up PR if interested)

mroeschke · 2025-08-05T21:17:33Z

/merge

brandon-b-miller added 2 commits May 30, 2025 13:19

impl, tests, some failing

3c0ce28

error cases, edge case, still one edge case to go

ae87f78

brandon-b-miller requested a review from a team as a code owner June 4, 2025 01:57

brandon-b-miller requested review from TomAugspurger and mroeschke June 4, 2025 01:57

github-actions bot assigned brandon-b-miller Jun 4, 2025

github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Jun 4, 2025

github-project-automation bot added this to cuDF Python Jun 4, 2025

brandon-b-miller commented Jun 4, 2025

GPUtester moved this to In Progress in cuDF Python Jun 4, 2025

mroeschke reviewed Jun 4, 2025

mroeschke reviewed Jun 4, 2025

brandon-b-miller added 3 commits June 4, 2025 12:57

Merge branch 'branch-25.08' into fea-cudf-polars-str-zfill

d120908

dont hardcode typestr

285c108

excize pyarrow

b4b7e74

brandon-b-miller added feature request New feature or request non-breaking Non-breaking change labels Jun 4, 2025

brandon-b-miller added 5 commits June 18, 2025 11:09

merge/resolve

da8fb55

Merge branch 'branch-25.08' into fea-cudf-polars-str-zfill

7e0051c

tests, impl

ed9f3e1

scan

66c9ec9

xfail

6069e72

brandon-b-miller commented Jun 24, 2025

brandon-b-miller added 2 commits July 11, 2025 11:40

merge/fixup

c6f0465

Merge branch 'branch-25.08' into fea-cudf-polars-str-zfill

6c25f14

mroeschke reviewed Jul 11, 2025

mroeschke reviewed Jul 11, 2025

brandon-b-miller and others added 3 commits July 14, 2025 14:27

Merge branch 'branch-25.08' into fea-cudf-polars-str-zfill

aea2c62

Update python/cudf_polars/cudf_polars/dsl/expressions/string.py

ab8f5d9

Co-authored-by: Matthew Roeschke <10647082+mroeschke@users.noreply.github.com>

add expected failure

a555822

move scalar validation to query planning phase

7d9fba6

Merge branch 'branch-25.08' into fea-cudf-polars-str-zfill

2bdbcc9

github-actions bot assigned mroeschke Jul 15, 2025

mroeschke approved these changes Jul 15, 2025

Merge branch 'branch-25.08' into fea-cudf-polars-str-zfill

0fdb116

Matt711 reviewed Jul 22, 2025

brandon-b-miller changed the base branch from branch-25.08 to branch-25.10 July 24, 2025 13:32

brandon-b-miller added 3 commits August 5, 2025 07:25

merge/resolve

25479ef

add validation, refactor

af96c67

conditional xfail

3cbc55e

mroeschke approved these changes Aug 5, 2025

rapids-bot bot merged commit afc0d5d into rapidsai:branch-25.10 Aug 5, 2025
90 checks passed

github-project-automation bot moved this from In Progress to Done in cuDF Python Aug 5, 2025

5 participants
