
Add count_matches/2, re_count_matches/2, re_scan/2 and re_named_captures/2 to Series #895

Merged: @philss merged 3 commits into main from ps-add-re-string-funs-part-ii on Apr 16, 2024.


@philss (Contributor) commented on Apr 15, 2024:

This change is a step toward supporting more regex operations.

The only caveat is that `re_named_captures/2` cannot be supported inside a query. To build its output we need the names of the regex's capture groups, but the pattern follows the backend's regex rules, and at the `LazySeries` level we have no access to the backend.

Closes #353

philss added 3 commits April 15, 2024 14:16
These functions are useful for extracting matches from text with regexes.

Due to an implementation detail, we cannot support `re_named_captures/2`
inside queries: it is not possible to calculate the output dtype.
Instead, you must write:

```elixir
Explorer.DataFrame.put(df, :new_column, Explorer.Series.re_named_captures(column, ~S/(a|b)/))
```
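A minimal sketch of how the four functions might be used together, assuming Explorer `~> 0.9` and a hypothetical DataFrame with one string column named `text`. `count_matches/2` and `re_count_matches/2` are called inside `DF.mutate/2`, while `re_named_captures/2` is built eagerly and attached with `DF.put/3`, following the workaround described in the docstring.

```elixir
# Sketch only: assumes Explorer ~> 0.9, fetched via Mix.install (run as a .exs script).
Mix.install([{:explorer, "~> 0.9"}])

require Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical sample data: a single string column named `text`.
df = DF.new(text: ["a1 b2", "a3", "b4 b5 b6"])

# count_matches/2 (literal substring) and re_count_matches/2 (regex)
# can be used inside a query such as DF.mutate/2.
counted =
  DF.mutate(df,
    a_count: count_matches(text, "a"),
    digit_count: re_count_matches(text, "[0-9]")
  )

# re_scan/2 returns, for each row, the list of all regex matches.
scanned = Series.re_scan(df["text"], ~S/[a-z]\d/)

# re_named_captures/2 is not available in queries, so build the Series
# eagerly and attach it to the DataFrame with DF.put/3.
captures = Series.re_named_captures(df["text"], ~S/(?<letter>[a-z])(?<digit>\d)/)
with_captures = DF.put(df, :captures, captures)

IO.inspect(Series.to_list(counted["a_count"]), label: "a_count")
IO.inspect(Series.to_list(counted["digit_count"]), label: "digit_count")
IO.inspect(Series.to_list(scanned), label: "re_scan")
IO.inspect(DF.names(with_captures), label: "columns")
```

Building the struct column outside the query keeps the dtype computation on the eager backend, which is the limitation discussed in this PR.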
Member commented:

We could probably make this work by storing the original backend as a field of the `LazySeries` and then asking it for the capture-group names. But we can do that in a separate PR if desired; no worries for now, IMO.

@philss (Contributor, author) replied:

I don't know yet how to capture that information in the context of the lazy frame, but I will research it a bit. For now, I'm going to merge this. Thanks!

@philss merged commit a96052d into main on Apr 16, 2024 (4 checks passed).
@philss deleted the ps-add-re-string-funs-part-ii branch on April 16, 2024, 12:27.
Development: successfully merging this pull request may close the linked issue "Regex support?".
Participants: 2.