
perf: Optimize regexp match and not match for .*foo.* cases#20610

Draft
petern48 wants to merge 14 commits into apache:main from petern48:regexp_simplify_optim


@petern48 (Contributor)

Which issue does this PR close?

Rationale for this change

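Improve query performance by simplifying regular-expression matches of the form .*foo.* in the logical plan into cheaper expressions that do not need a regex engine.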

What changes are included in this PR?

Added expression simplification rules that perform the following rewrites:

  • s ~ '.*foo.*' -> contains(s, 'foo')
  • s !~ '.*foo.*' -> not(contains(s, 'foo'))
  • s ~ '.*.*' -> is_not_null(s)
  • s !~ '.*.*' -> false
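A minimal Rust sketch of how a rule might recognize the patterns above. The `Rewrite` enum and the `classify` function are hypothetical and are not DataFusion's `Expr` API. The metacharacter set is a conservative assumption. Any pattern whose middle part is not a plain literal is left alone. The contains rewrite assumes ~ performs an unanchored search, as in PostgreSQL. Under that assumption, .*foo.* matches exactly when foo occurs somewhere in s.

```rust
/// Hypothetical rewrite targets (illustrative names, not DataFusion's API).
#[derive(Debug, PartialEq)]
enum Rewrite {
    Contains(String),    // s ~ '.*foo.*'  -> contains(s, 'foo')
    NotContains(String), // s !~ '.*foo.*' -> not(contains(s, 'foo'))
    IsNotNull,           // s ~ '.*.*'     -> is_not_null(s)
    False,               // s !~ '.*.*'    -> false
}

/// Conservative set of regex metacharacters (assumption): if the middle
/// contains any of these, it is not treated as a plain literal.
const META: &[char] = &[
    '.', '*', '+', '?', '(', ')', '[', ']', '{', '}', '|', '^', '$', '\\',
];

/// Returns a rewrite for patterns shaped like `.*<literal>.*`, `.*.*` or `.*`,
/// and `None` for anything else (the regex is then kept as-is).
fn classify(pattern: &str, negated: bool) -> Option<Rewrite> {
    // Require a leading ".*".
    let rest = pattern.strip_prefix(".*")?;
    // A lone ".*" matches every non-null string, just like ".*.*".
    let middle = if rest.is_empty() {
        ""
    } else {
        rest.strip_suffix(".*")?
    };
    if middle.contains(META) {
        return None;
    }
    Some(match (middle.is_empty(), negated) {
        (true, false) => Rewrite::IsNotNull,
        (true, true) => Rewrite::False,
        (false, false) => Rewrite::Contains(middle.to_string()),
        (false, true) => Rewrite::NotContains(middle.to_string()),
    })
}

fn main() {
    let cases = [
        (".*foo.*", false),
        (".*foo.*", true),
        (".*.*", false),
        (".*.*", true),
        (".*a.b.*", false), // middle is not a literal, so no rewrite
    ];
    for (pattern, negated) in cases {
        println!("{} (negated: {}) -> {:?}", pattern, negated, classify(pattern, negated));
    }
}
```

Returning `None` for non-literal middles keeps the rule safe. Only patterns whose meaning is unambiguous get rewritten.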

Additionally, I found that the existing optimization for s !~ '.*' incorrectly rewrote the condition to s = '', which returns true for rows where s is the empty string (''). I confirmed that this differs from the unoptimized query, which returns no rows even when some values are the empty string or NULL.
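The bug can be reproduced with a small model of SQL three-valued logic, where `None` stands for NULL. The functions are illustrative only and are not DataFusion code. `not_match_any` models the unoptimized semantics of s !~ '.*' and `old_rewrite` models the previous s = '' rewrite. A WHERE clause keeps a row only when its predicate is exactly true.

```rust
/// Unoptimized `s !~ '.*'`: '.*' matches every string (including ''),
/// so the negated match is false for non-null s and NULL for NULL s.
fn not_match_any(s: Option<&str>) -> Option<bool> {
    s.map(|_| false)
}

/// The pre-existing (buggy) rewrite `s = ''`.
fn old_rewrite(s: Option<&str>) -> Option<bool> {
    s.map(|v| v.is_empty())
}

/// Keep only rows whose predicate evaluates to exactly TRUE, like WHERE.
fn filter<'a>(
    rows: &[Option<&'a str>],
    pred: fn(Option<&str>) -> Option<bool>,
) -> Vec<Option<&'a str>> {
    rows.iter().copied().filter(|&r| pred(r) == Some(true)).collect()
}

fn main() {
    let rows = [Some("abc"), Some(""), None];
    // Unoptimized semantics: no row passes.
    println!("{:?}", filter(&rows, not_match_any));
    // Old rewrite: the empty-string row wrongly passes.
    println!("{:?}", filter(&rows, old_rewrite));
}
```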

The condition never returns true because .* matches every string, including the empty string, so the negated match is false for every non-null s. NULL values are not returned either, because NULL !~ '.*' evaluates to NULL, not true. Therefore the condition can be replaced with false.
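The argument can be checked exhaustively over the three kinds of input: a non-empty string, the empty string, and NULL. This is a sketch in a hypothetical three-valued model, with `None` as NULL. It shows that s !~ '.*' never evaluates to true. As a result, filtering on it keeps exactly the same rows (none) as filtering on the literal false.

```rust
/// Model of `s !~ '.*'` under SQL three-valued logic (None = NULL).
fn not_match_any(s: Option<&str>) -> Option<bool> {
    // Non-null input: '.*' always matches, so the negation is false.
    // NULL input: the result is NULL.
    s.map(|_| false)
}

/// The new rewrite: the literal `false`, independent of the input.
fn rewritten(_s: Option<&str>) -> Option<bool> {
    Some(false)
}

/// A WHERE clause keeps a row only when the predicate is exactly TRUE;
/// both FALSE and NULL drop the row.
fn kept(pred: Option<bool>) -> bool {
    pred == Some(true)
}

fn main() {
    for s in [Some("foo"), Some(""), None] {
        let original = not_match_any(s);
        // Never TRUE: either Some(false) or None.
        assert_ne!(original, Some(true));
        // So the original predicate and `false` filter identically.
        assert_eq!(kept(original), kept(rewritten(s)));
        println!("{:?} -> {:?}, kept = {}", s, original, kept(original));
    }
}
```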

Are these changes tested?

Added new tests, and updated existing tests to reflect the fix to the pre-existing s !~ '.*' optimization.

Are there any user-facing changes?

This is a slight behavior change caused by fixing a bug in the pre-existing optimization. Previously, s !~ '.*' returned rows where s was the empty string. This PR fixes the bug so that no rows are returned, matching the behavior when no optimizations are applied.

github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels on Feb 27, 2026


Successfully merging this pull request may close this issue: Expr. simplification / rewrite: regex .*foo.*
