Skip to content
This repository was archived by the owner on Aug 22, 2023. It is now read-only.

PHPORM-53 Fix and test like and regex operators #17

Merged
merged 6 commits into from
Jul 27, 2023
Merged

Conversation

GromNaN
Copy link
Owner

@GromNaN GromNaN commented Jul 19, 2023

Fix PHPORM-53

  • Fix support for % and _ in like expression and escaped \% and \_
  • Keep ilike and regexp operators as aliases for like and regex
  • Allow /, # and ~ as regex delimiters
  • Add functional tests on regexp and not regexp
  • Add support for not regex

@GromNaN GromNaN requested review from jmikola and alcaeus July 19, 2023 21:19
@GromNaN GromNaN changed the title PHPORM-53 Fix and test like and regex operators PHPORM-53 Fix and test like and regex operators Jul 19, 2023
Copy link
Collaborator

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of thoughts about this, so read through the comments at your leisure.

I'm mostly concerned about the changes in QueryTest.php and how that could affect existing applications.

throw new \InvalidArgumentException(sprintf('Regular expressions must be surrounded by slashes "/". Got "%s"', $value));
}
$flags = end($e);
$regstr = substr($value, 1, -1 - strlen($flags));
Copy link
Collaborator

@jmikola jmikola Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to confirm that this handles cases where the pattern includes escaped forward slashes. If we don't currently test that perhaps we should.

I think it does, you only assert that explode() returns at least three components and only the final component is used to extra flags (which will be an empty string if something like /acme/ is passed in).

I think there may be some edge cases where parsing might accept invalid input. Given '/acme/bar/foo', this will construct a Regex with the pattern acme/bar, which seems incorrect? I'm not sure that the Laravel library should care about that, though. Probably not.


If it helps, here are some local tests using a BSON regex object where the pattern string contains slash characters.

<?php

require __DIR__ . '/vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27070/?replicaSet=rs0');
$collection = $client->test->foo;
$collection->drop();

$collection->insertOne(['x' => 'foo']);
$collection->insertOne(['x' => 'foo/']);

foreach (['foo', 'foo/', 'foo\/', 'foo\\/', 'foo\\\/'] as $pattern) {
    $regex = new MongoDB\BSON\Regex($pattern);
    $cursor = $collection->find(['x' => $regex]);
    $cursor->setTypeMap(['root' => 'bson']);

    printf("Querying with: %s\n", json_encode($regex));

    foreach ($cursor as $document) {
        echo $document->toRelaxedExtendedJSON(), "\n";
    }

    echo "\n";
}

Output:

Querying with: {"$regex":"foo","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a62" }, "x" : "foo" }
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\/","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\\\/","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\\\/","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\\\\\/","$options":""}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN: I think the only possible action item from the above is:

I just want to confirm that this handles cases where the pattern includes escaped forward slashes. If we don't currently test that perhaps we should.

I'll defer to you. If you don't think any such test is necessary, feel free to resolve this thread after reading.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe any more regex parsing than absolutely necessary should be avoided. Looking at the example with /acme/bar/foo, plugging that into preg_match reveals that the first unescaped forward slash is treated as the end delimiter, resulting in b being reported as an unknown modified: https://3v4l.org/sk2fQ

Anything more than that would require us to specifically explode on unescaped delimiters. This means that simply matching for '[^\\\]' . $delimiter won't work, as the backslash itself could've been escaped, resulting once again in an unescaped delimiter. There's only so much polishing one can do on crappy inputs to make them work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugging that into preg_match reveals that the first unescaped forward slash is treated as the end delimiter

Would it make sense to use consistent parsing here? So we pick up the first delimiter character and then find the next unescaped occurrence? It's unfortunate there's no preg_unquote() implementation, but perhaps stripslashes() might suffice? (see: https://stackoverflow.com/a/9172908/162228).

If we rely on that, we're more likely to produce an exception by yielding an invalid flag string, instead of a pattern that won't match anything and could be more confusing to debug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick follow-up as I mentioned in Slack while thinking about this some more:

Although to be honest I'm not confident stripslashes() would be a fool-proof solution, as stripslashes('/foo\/bar/i') yields "/foo/bar/i" and it's still ambiguous what the user intended because we eliminated the escaping. So maybe this problem is closer to the parsing you had to do to deal with % and _ escaping for like.

Given that, I now understand @alcaeus' previous comment:

Anything more than that would require us to specifically explode on unescaped delimiters.

I think we're OK to leave this as-is, but I'd propose adding a comment somewhere around this code to note the edge cases we're choosing not to handle.

@@ -433,6 +434,62 @@ function (Builder $builder) {
->orWhereNotBetween('id', collect([3, 4])),
];

yield 'where like' => [
['find' => [['name' => new Regex('^acme$', '')], []]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the like pattern is not using any special operators, does it make sense to produce a regex instead of a direct value match? I realize ilike would still need a regex for case-insensitivity.

Both this and the where like escape test below could avoid using a regex object.

If you agree, feel free to defer this to another JIRA ticket, since it's probably just an optimization.

Copy link
Owner Author

@GromNaN GromNaN Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since like has become case-insensitive again, this optimization is no longer possible.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I was writing that suggestion alongside my other suggestion that we not break BC in case we went in either direction. Feel free to resolve after reading.

@GromNaN GromNaN force-pushed the PHPORM-53 branch 2 times, most recently from 654b04e to 4a491f6 Compare July 20, 2023 22:52
@GromNaN GromNaN requested a review from jmikola July 20, 2023 23:01
throw new \InvalidArgumentException(sprintf('Regular expressions must be surrounded by slashes "/". Got "%s"', $value));
if (is_string($value)) {
$delimiter = substr($value, 0, 1);
if (! in_array($delimiter, ['/', '#', '~'])) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good compromise to allow users some flexibility. Thanks 👍

I suggest storing this array in a constant property and then referencing it in the exception message below, though, which suggests that only "/" is permitted.

}
$e = explode($delimiter, $value);
if (count($e) < 3) {
throw new \InvalidArgumentException(sprintf('Regular expressions must be surrounded by delimiter "%s". Got "%s"', $delimiter, $value));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why you printed $value here. By this point, we failed to find the second delimiter and cannot say with any certainty where it should have been expected. Consider:

> $value = '/foo#bar'
= "/foo#bar"

> $delimiter =  substr($value, 0, 1);
= "/"

> $e = explode($delimiter, $value);
= [
    "",
    "foo#bar",
  ]

What do you think about the following error message?

Missing expected ending delimiter "/" in "/foo#bar"

Just a suggestion to avoid must be surrounded by "%s" since the code above was changed to be more flexible about what delimiters are supported.

Copy link
Collaborator

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @alcaeus in case he'd like to take a quick pass through this.

@@ -122,6 +120,13 @@ class Builder extends BaseBuilder
'>=' => '$gte',
];

private $replacementOperators = [
'regexp' => 'regex',
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regexp and ilike operators are provided by Laravel and documented in the readme of this project.

NOTE: you can also use the Laravel regexp operations.

Keeping compatibility with Laravel Query Builder and with previous versions of this package is cheap. I'd like to step back and simply alias the operators.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep the Laravel operators as aliases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this. Being pedantic about what operators we support is likely just going to cause an inconvenience to users, especially if there's no harm in these aliases. The biggest mistake I can see is the fact that like is case-insensitive, but that's a default we cannot easily/safely change without harming users.

regex is probably the best migration path for folks using like today, and we can always revisit this down the line and carefully consider how we might deprecate like with its case-insensitivity (if ever, but probably not).

@GromNaN GromNaN force-pushed the PHPORM-53 branch 2 times, most recently from 132a709 to 22a8bce Compare July 26, 2023 12:50
Copy link
Owner Author

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of changes to keep supporting ilike and regexp.
Also fixes on double %% and __ in like value.

@@ -122,6 +120,13 @@ class Builder extends BaseBuilder
'>=' => '$gte',
];

private $replacementOperators = [
'regexp' => 'regex',
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep the Laravel operators as aliases.

$where['operator'] = $convert[$where['operator']];
// Convert aliased operators
if (isset($this->conversion[$where['operator']])) {
$where['operator'] = $this->conversion[$where['operator']];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had 2 lists of converted operators. I merged them into a single in the $conversion property.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, does this intentionally use isset() to ensure that an operator without an alias is left as-is and yields some error down the line when another query builder method realizes it isn't supported?

I'm OK with the error being reported later, but wanted to ensure that actually happens without us having to explicitly handle it. I think the explicit handling was only for actual builder methods ("integerRaw" was it?), which I don't think applies here since we're using where().

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operators like all are supported and not aliases. The isset is here to not work on them.
Checking if the operator is supported is done before in invalidOperator called by Laravel's Illuminate\Database\Query\Builder

}

if (! isset($operator) || $operator == '=') {
$query = [$column => $value];
} elseif (array_key_exists($operator, $this->conversion)) {
$query = [$column => [$this->conversion[$operator] => $value]];
Copy link
Owner Author

@GromNaN GromNaN Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion is now done earlier in compileWheres.

Copy link
Collaborator

@alcaeus alcaeus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a small comment about escaped escape characters in front of wildcards.

@@ -122,6 +120,13 @@ class Builder extends BaseBuilder
'>=' => '$gte',
];

private $replacementOperators = [
'regexp' => 'regex',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this. Being pedantic about what operators we support is likely just going to cause an inconvenience to users, especially if there's no harm in these aliases. The biggest mistake I can see is the fact that like is case-insensitive, but that's a default we cannot easily/safely change without harming users.

regex is probably the best migration path for folks using like today, and we can always revisit this down the line and carefully consider how we might deprecate like with its case-insensitivity (if ever, but probably not).

$where['operator'] = $convert[$where['operator']];
// Convert aliased operators
if (isset($this->conversion[$where['operator']])) {
$where['operator'] = $this->conversion[$where['operator']];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, does this intentionally use isset() to ensure that an operator without an alias is left as-is and yields some error down the line when another query builder method realizes it isn't supported?

I'm OK with the error being reported later, but wanted to ensure that actually happens without us having to explicitly handle it. I think the explicit handling was only for actual builder methods ("integerRaw" was it?), which I don't think applies here since we're using where().


$operator = strtolower($operator);

if (isset($this->conversion[$operator])) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this PR didn't introduce the $conversion property, it just added to it. How is this used in the base Eloquent library?

I'm also curious because you basically recreated the entire $conversion map in this PR (assuming '=' => '=' is redundant and should be removed), so I'm unsure if it's still relevant to any parent context.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$conversion is specific to mongodb and doesn't exist in Eloquent.

Yes '=' => '=' is useless. I removed it and restored the complete operators list in $operators because the property is public. So I removed this method overload.

@GromNaN GromNaN force-pushed the PHPORM-53 branch 2 times, most recently from 56d9da8 to 47304e9 Compare July 27, 2023 16:59
@GromNaN GromNaN merged commit 2319d53 into master Jul 27, 2023
@GromNaN GromNaN deleted the PHPORM-53 branch July 27, 2023 17:03
GromNaN added a commit that referenced this pull request Aug 22, 2023
- Fix support for % and _ in like expression and escaped \% and \_
- Keep ilike and regexp operators as aliases for like and regex
- Allow /, # and ~ as regex delimiters
- Add functional tests on regexp and not regexp
- Add support for not regex
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants