Closed
Opened on Feb 4, 2022
cudf/python/cudf/cudf/tests/test_string.py
Line 1782 in ae36f2e
I noticed the indexing is different here in our tests. We should be asserting that the entire results are equal, not just the one value that matches the regex. While looking into that, I found that pandas returns a Series containing lists of strings, while cuDF returns a DataFrame with multiple columns if multiple matches are found. We should adopt pandas' convention.
Tested on branch-22.04, commit 4e8cb4f.
>>> test_data = ["Lion", "Monkey", "Rabbit", "Don\nkey"]
>>> ps = pd.Series(test_data)
>>> gs = cudf.Series(test_data)
>>> ps.str.findall("Monkey")
0          []
1    [Monkey]
2          []
3          []
dtype: object
>>> gs.str.findall("Monkey")
        0
0    <NA>
1  Monkey
2    <NA>
3    <NA>
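A whole-result comparison could look roughly like the sketch below. It uses pandas only so that it runs without a GPU. The `wide` DataFrame is a hand-written stand-in for the cuDF output shown above, and `wide_to_lists` is a hypothetical helper written for this illustration, not an existing cuDF or pandas API.

```python
import pandas as pd

test_data = ["Lion", "Monkey", "Rabbit", "Don\nkey"]
ps = pd.Series(test_data)

# pandas convention: one list of matches per row (empty list when nothing matches).
expected = ps.str.findall("Monkey")

# Hand-written stand-in for the wide layout shown above: one column per match,
# missing values where a row has fewer matches. This is an assumption for
# illustration, not output captured from cuDF.
wide = pd.DataFrame({0: [None, "Monkey", None, None]})


def wide_to_lists(df):
    """Hypothetical helper: collapse a one-column-per-match frame into per-row lists."""
    rows = [
        [value for value in row if pd.notna(value)]
        for row in df.itertuples(index=False, name=None)
    ]
    return pd.Series(rows, index=df.index)


actual = wide_to_lists(wide)

# Compare the entire result, not only the row that matched the regex.
pd.testing.assert_series_equal(expected, actual)
```

Comparing the full Series catches mismatches in the non-matching rows (for example `<NA>` versus `[]`), which a check on the single matching value would miss.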
Originally posted by @bdice in #10208 (comment)