[Enhancement] Error handling for illegal character usage in java regex named capture group #4434
…x pattern Co-authored-by: Simeon Widdis <sawiddis@amazon.com> Signed-off-by: Jialiang Liang <jiallian@amazon.com>
core/src/main/java/org/opensearch/sql/expression/parse/RegexCommonUtils.java
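Hi @penghuo, good callout. I took a look at the Java regex named group character rules.
According to Java's Pattern class documentation, named capture groups follow these rules:
Valid characters: ASCII letters (a-z, A-Z) and digits (0-9), where the first character must be a letter.
Invalid characters (will cause PatternSyntaxException): anything else, including underscore (_) and hyphen (-).
Java documentation: the official rule is in the "Group name" section of the java.util.regex.Pattern Javadoc.
The exact Java rule is: a letter followed by zero or more letters or digits.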
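As a quick way to check the rule locally, the following is a minimal sketch rather than code from this PR. The class name `NamedGroupCheck` and the helper `compiles` are hypothetical; only `Pattern.compile` and `PatternSyntaxException` come from the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, used only to illustrate the JDK behavior.
final class NamedGroupCheck {
    /** Returns true if java.util.regex accepts the pattern, false if it throws. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(?<username>\\w+)"));  // true: letters only
        System.out.println(compiles("(?<user1>\\w+)"));     // true: letter, then digit
        System.out.println(compiles("(?<user_name>\\w+)")); // false: underscore
        System.out.println(compiles("(?<user-name>\\w+)")); // false: hyphen
        System.out.println(compiles("(?<1user>\\w+)"));     // false: leading digit
    }
}
```

The rejected cases fail inside the JDK itself, which is why the limitation has to be handled, not fixed, on the plugin side.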
Given the above, since the limitation comes from the upstream library we are using, I unified the error message so that it covers all illegal-character cases in the extraction commands. cc @penghuo
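One way such a unified check could look is sketched below. This is an illustration under assumptions, not the code merged in `RegexCommonUtils.java`: the class name `GroupNameValidator`, the method `validate`, and the error message wording are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical validator; names and message text are assumptions for illustration.
final class GroupNameValidator {
    // Finds "(?<name>" while skipping lookbehinds "(?<=" and "(?<!".
    // Simplification: escaped parentheses and character classes are not handled.
    private static final Pattern NAMED_GROUP = Pattern.compile("\\(\\?<([^=!>][^>]*)>");
    // The JDK rule: a letter followed by letters or digits.
    private static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    /** Throws IllegalArgumentException on the first group name the JDK would reject. */
    static void validate(String regex) {
        Matcher m = NAMED_GROUP.matcher(regex);
        while (m.find()) {
            String name = m.group(1);
            if (!VALID_NAME.matcher(name).matches()) {
                throw new IllegalArgumentException(
                    "Invalid capture group name '" + name + "'. Java regex group names must "
                        + "start with a letter and contain only letters and digits.");
            }
        }
    }
}
```

Validating before `Pattern.compile` lets every extraction command report the same message instead of surfacing the raw `PatternSyntaxException` text.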
Technically, we should be able to support the invalid characters by rewriting the regex and mapping the extracted values back to the original names.
This could cause name overlap, but we can work around that by adding a suffix or similar to produce a unique name.
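The rewrite idea can be sketched roughly as follows. This is an assumption-laden illustration, not the commenter's example: the class `GroupNameRewriter` is hypothetical, and it uses positional names (`g0`, `g1`, ...) instead of a suffix scheme so that uniqueness is trivial. Backreferences such as `\k<name>` are not rewritten here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: rewrite group names to JDK-safe ones, then map results back.
final class GroupNameRewriter {
    // Finds "(?<name>" while skipping lookbehinds; escaped parentheses are not handled.
    private static final Pattern NAMED_GROUP = Pattern.compile("\\(\\?<([^=!>][^>]*)>");

    final String rewrittenRegex;
    final Map<String, String> safeToOriginal = new LinkedHashMap<>();

    GroupNameRewriter(String regex) {
        Matcher m = NAMED_GROUP.matcher(regex);
        StringBuffer sb = new StringBuffer();
        int index = 0;
        while (m.find()) {
            String safe = "g" + index++; // positional name, always valid and unique
            safeToOriginal.put(safe, m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement("(?<" + safe + ">"));
        }
        m.appendTail(sb);
        rewrittenRegex = sb.toString();
    }

    /** Returns the first match's captures keyed by the original names; empty if no match. */
    Map<String, String> extract(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(rewrittenRegex).matcher(input);
        if (m.find()) {
            for (Map.Entry<String, String> e : safeToOriginal.entrySet()) {
                result.put(e.getValue(), m.group(e.getKey()));
            }
        }
        return result;
    }
}
```

For example, `new GroupNameRewriter("(?<user_name>\\w+)@(?<host-name>[\\w.]+)").extract("alice@example.com")` would map `user_name` to `alice` and `host-name` to `example.com`.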
Hi @ykmr1224, thanks for the input. That was an interesting approach, and I actually did a PoC of your suggestion locally. However, I found some limitations with this approach:
Given the above, I think we can keep the current limitation-handling approach for now. If feature requests come in from users, we can consider actually supporting these characters as a new feature. If that day comes, I think the correct approach is to find a PCRE-style library for regex patterns.
@RyanL1997 The computation cost of the regex rewrite should be negligible, since it happens only once during query analysis. The rename after extraction should be just an alias or projection (it should not affect performance, in my understanding). It can be a separate task from this PR, btw.
Transferring some internal communication over here: the above issue is not blocking this change. This change is ready to be merged, and I have created issue #4549 to explore the feasibility of the above suggestion.
…x named capture group (#4434) (#4555) (cherry picked from commit 0b7e86c) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Signed-off-by: Lantao Jin <ltjin@amazon.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Simeon Widdis <sawiddis@amazon.com> Co-authored-by: Lantao Jin <ltjin@amazon.com>
Description
[Enhancement] Error handling for underscore usage in rex regex
Related Issue
_/- as parsed field name #3944
parse/rex/replace...)
Behavior after enhancement
Check List
--signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.