Skip to content

use string equality instead of regexps to compare constant strings #9182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 19, 2022

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented May 16, 2022

Regexps have previously been used to encode a set of strings, but now that we have set literals we should use those instead.
And sometimes regexps seem to be used without any good reason (second example below).

Consider the below examples.

str.regexpMatch("foo|bar|baz") or // is equivalent to the below, but less efficient. 
str = ["foo", "bar", "baz"]
or
str.matches("foo") or // is equivalent to the below, but less efficient. 
str = "foo"

Rewriting from a regular expression to comparison against string constants means that there are more possible join-orders for the optimizer to choose from, so we need to run performance evaluations.


Evaluations all look good: CPP, C#, JavaScript, Java, Python, Ruby.
I had to do a bunch of retries in C++ before that turned out OK, but I think that's due to big variations in DB creation time
(DB caching in DCA 🙏).


Inspired by this comment: #5160 (reply in thread)

exists(string reg | call.getMemberName() = "matches" |
call.getNumberOfArguments() = 1 and
reg = call.getArgument(0).(String).getValue() and
not reg.regexpMatch(".*%.*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underscore is also special in match strings and it matches any character. However I also note that most cases with underscores seem to be bugs where an escaped underscore is intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I also note that most cases with underscores seem to be bugs where an escaped underscore is intended.

I found two uses of underscores, both were bugs.
I've fixed them.

@erik-krogh erik-krogh changed the title QL: add use-string-compare query use string equality instead of regexps to compare constant strings May 18, 2022
@erik-krogh erik-krogh marked this pull request as ready for review May 19, 2022 06:54
@erik-krogh erik-krogh requested review from a team as code owners May 19, 2022 06:54
@erik-krogh erik-krogh added the no-change-note-required This PR does not need a change note label May 19, 2022
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C# and Ruby 👍

Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java LGTM

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ 👍

Copy link
Contributor

@esbena esbena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS 👍🏻

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python/QL 👍

@erik-krogh erik-krogh merged commit fff70da into github:main May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants