-
Notifications
You must be signed in to change notification settings - Fork 176
Support Regex for replace eval function #4456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ahkcs , thanks for the change. I just left a comment of re-using existing operator of regex replacement and you can take a look. If there is any questions, feel free to ask me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating the new UDF we should re-use the existing supported sql opperator. You can reference to my previous implementation at sed
in rex
that I was doing the same thing:
For more details you can also reference to: #4109 (comment)
REX_SED - Complete native Calcite optimization
REGEXP_REPLACE_PG_3 for basic substitution
REGEXP_REPLACE_PG_4 for flagged substitution (g, i)
REGEXP_REPLACE_5 for nth occurrence
TRANSLATE3 for transliteration (y/from/to/)
As you can see there are various of options that available for regex replace and I think you can choose the one you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specifically, you may need to use this:
BuiltinFunctionName.INTERNAL_REGEXP_REPLACE_3,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @RyanL1997 , thanks for the suggestion, I have updated the implementation to remove the custom UDF and leverage the existing sql operator
@ahkcs , also I was wondering for the usage of the regex pattern in this function, will it support captured name group? If that is the case, there are some known limitation of java regex library. see details at #4434 (comment), and lets see if we need to call this out in the related docs. |
@RyanL1997 , thanks for calling it out, named groups like (?<domain_name>.+) are used by the rex command, which does have the Java regex limitation you mentioned (no underscores allowed in group names). The replace() function avoids this limitation entirely by using numbered references. For example:
So the Java named group limitations in #4434 don't apply to replace() function. The field naming is controlled by the eval command, bypassing Java identifier restrictions. |
Make sense, since in the case of this PR, the extraction should be done by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ahkcs , thanks for the update of the implementation and I just left some comments.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
Outdated
Show resolved
Hide resolved
...-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLStringBuiltinFunctionIT.java
Outdated
Show resolved
Hide resolved
RexLiteral literal = (RexLiteral) replacementArg; | ||
String replacement = literal.getValueAs(String.class); | ||
if (replacement != null) { | ||
// Convert sed backreferences (\1, \2) to Java style ($1, $2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because the requirement requires java style? What is the expected behavior over here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we use java regex internally but we want to support PCRE style syntax, so we add this conversion logic to translate PCRE syntax to java regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking: I remember I do have the same logic for sed
, maybe we can extract them into a certain util.
core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
14b1c1a
to
7024259
Compare
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
The backport to
To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4456-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a8f08ad3e101e05aefecca4765d484d1992a12b3
# Push it to GitHub
git push --set-upstream origin backport/backport-4456-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev Then, create a pull request where the |
(cherry picked from commit a8f08ad)
(cherry picked from commit a8f08ad) Signed-off-by: Kai Huang <ahkcs@amazon.com>
Description
Enhances the
replace()
function to support regular expressions, enabling advanced text manipulation in PPL queries.Examples
Before (literal only):
After (full regex):