Skip to content
This repository was archived by the owner on Aug 22, 2023. It is now read-only.

PHPORM-53 Fix and test like and regex operators #17

Merged
merged 6 commits into from
Jul 27, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ All notable changes to this project will be documented in this file.

## [Unreleased]

- Add classes to cast ObjectId and UUID instances [#1](https://github.com/GromNaN/laravel-mongodb-private/pull/1) by [@alcaeus](https://github.com/alcaeus).
- Add classes to cast `ObjectId` and `UUID` instances [#1](https://github.com/GromNaN/laravel-mongodb-private/pull/1) by [@alcaeus](https://github.com/alcaeus).
- Add `Query\Builder::toMql()` to simplify comprehensive query tests [#6](https://github.com/GromNaN/laravel-mongodb-private/pull/6) by [@GromNaN](https://github.com/GromNaN).
- Fix `Query\Builder::whereNot` to use MongoDB [`$not`](https://www.mongodb.com/docs/manual/reference/operator/query/not/) operator [#13](https://github.com/GromNaN/laravel-mongodb-private/pull/13) by [@GromNaN](https://github.com/GromNaN).
- Fix `Query\Builder::whereBetween` to accept `Carbon\Period` object [#10](https://github.com/GromNaN/laravel-mongodb-private/pull/10) by [@GromNaN](https://github.com/GromNaN).
Expand All @@ -15,6 +15,7 @@ All notable changes to this project will be documented in this file.
- Accept operators prefixed by `$` in `Query\Builder::orWhere` [#20](https://github.com/GromNaN/laravel-mongodb-private/pull/20) by [@GromNaN](https://github.com/GromNaN).
- Remove `Query\Builder::whereAll($column, $values)`. Use `Query\Builder::where($column, 'all', $values)` instead. [#16](https://github.com/GromNaN/laravel-mongodb-private/pull/16) by [@GromNaN](https://github.com/GromNaN).
- Fix validation of unique values when the validated value is found as part of an existing value. [#21](https://github.com/GromNaN/laravel-mongodb-private/pull/21) by [@GromNaN](https://github.com/GromNaN).
- Support `%` and `_` in `like` expression [#17](https://github.com/GromNaN/laravel-mongodb-private/pull/17) by [@GromNaN](https://github.com/GromNaN).

## [3.9.2] - 2022-09-01

Expand Down
117 changes: 64 additions & 53 deletions src/Query/Builder.php
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@
*/
class Builder extends BaseBuilder
{
private const REGEX_DELIMITERS = ['/', '#', '~'];

/**
* The database collection.
*
Expand Down Expand Up @@ -91,6 +93,7 @@ class Builder extends BaseBuilder
'all',
'size',
'regex',
'not regex',
'text',
'slice',
'elemmatch',
Expand All @@ -113,13 +116,22 @@ class Builder extends BaseBuilder
* @var array
*/
protected $conversion = [
'=' => '=',
'!=' => '$ne',
'<>' => '$ne',
'<' => '$lt',
'<=' => '$lte',
'>' => '$gt',
'>=' => '$gte',
'!=' => 'ne',
'<>' => 'ne',
'<' => 'lt',
'<=' => 'lte',
'>' => 'gt',
'>=' => 'gte',
'regexp' => 'regex',
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regexp and ilike operators are provided by Laravel and documented in the readme of this project.

NOTE: you can also use the Laravel regexp operations.

Keeping compatibility with Laravel Query Builder and with previous versions of this package is cheap. I'd like to step back and simply alias the operators.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep the Laravel operators as aliases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with this. Being pedantic about what operators we support is likely just going to cause an inconvenience to users, especially if there's no harm in these aliases. The biggest mistake I can see is the fact that like is case-insensitive, but that's a default we cannot easily/safely change without harming users.

regex is probably the best migration path for folks using like today, and we can always revisit this down the line and carefully consider how we might deprecate like with its case-insensitivity (if ever, but probably not).

'not regexp' => 'not regex',
'ilike' => 'like',
'elemmatch' => 'elemMatch',
'geointersects' => 'geoIntersects',
'geowithin' => 'geoWithin',
'nearsphere' => 'nearSphere',
'maxdistance' => 'maxDistance',
'centersphere' => 'centerSphere',
'uniquedocs' => 'uniqueDocs',
];

/**
Expand Down Expand Up @@ -932,20 +944,9 @@ protected function compileWheres(): array
if (isset($where['operator'])) {
$where['operator'] = strtolower($where['operator']);

// Operator conversions
$convert = [
'regexp' => 'regex',
'elemmatch' => 'elemMatch',
'geointersects' => 'geoIntersects',
'geowithin' => 'geoWithin',
'nearsphere' => 'nearSphere',
'maxdistance' => 'maxDistance',
'centersphere' => 'centerSphere',
'uniquedocs' => 'uniqueDocs',
];

if (array_key_exists($where['operator'], $convert)) {
$where['operator'] = $convert[$where['operator']];
// Convert aliased operators
if (isset($this->conversion[$where['operator']])) {
$where['operator'] = $this->conversion[$where['operator']];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had 2 lists of converted operators. I merged them into a single in the $conversion property.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, does this intentionally use isset() to ensure that an operator without an alias is left as-is and yields some error down the line when another query builder method realizes it isn't supported?

I'm OK with the error being reported later, but wanted to ensure that actually happens without us having to explicitly handle it. I think the explicit handling was only for actual builder methods ("integerRaw" was it?), which I don't think applies here since we're using where().

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operators like all are supported and not aliases. The isset is here to not work on them.
Checking if the operator is supported is done before in invalidOperator called by Laravel's Illuminate\Database\Query\Builder

}
}

Expand Down Expand Up @@ -1036,45 +1037,55 @@ protected function compileWhereBasic(array $where): array

// Replace like or not like with a Regex instance.
if (in_array($operator, ['like', 'not like'])) {
if ($operator === 'not like') {
$operator = 'not';
} else {
$operator = '=';
}

// Convert to regular expression.
$regex = preg_replace('#(^|[^\\\])%#', '$1.*', preg_quote($value));

// Convert like to regular expression.
if (! Str::startsWith($value, '%')) {
$regex = '^'.$regex;
}
if (! Str::endsWith($value, '%')) {
$regex .= '$';
}
$regex = preg_replace(
[
// Unescaped % are converted to .*
// Group consecutive %
'#(^|[^\\\])%+#',
// Unescaped _ are converted to .
// Use positive lookahead to replace consecutive _
'#(?<=^|[^\\\\])_#',
// Escaped \% or \_ are unescaped
'#\\\\\\\(%|_)#',
],
['$1.*', '$1.', '$1'],
// Escape any regex reserved characters, so they are matched
// All backslashes are converted to \\, which are needed in matching regexes.
preg_quote($value),
);
$value = new Regex('^'.$regex.'$', 'i');

// For inverse like operations, we can just use the $not operator with the Regex
$operator = $operator === 'like' ? '=' : 'not';
}

$value = new Regex($regex, 'i');
} // Manipulate regexp operations.
elseif (in_array($operator, ['regexp', 'not regexp', 'regex', 'not regex'])) {
// Manipulate regex operations.
elseif (in_array($operator, ['regex', 'not regex'])) {
// Automatically convert regular expression strings to Regex objects.
if (! $value instanceof Regex) {
$e = explode('/', $value);
$flag = end($e);
$regstr = substr($value, 1, -(strlen($flag) + 1));
$value = new Regex($regstr, $flag);
if (is_string($value)) {
// Detect the delimiter and validate the preg pattern
$delimiter = substr($value, 0, 1);
if (! in_array($delimiter, self::REGEX_DELIMITERS)) {
throw new \LogicException(sprintf('Missing expected starting delimiter in regular expression "%s", supported delimiters are: %s', $value, implode(' ', self::REGEX_DELIMITERS)));
}
$e = explode($delimiter, $value);
// We don't try to detect if the last delimiter is escaped. This would be an invalid regex.
if (count($e) < 3) {
throw new \LogicException(sprintf('Missing expected ending delimiter "%s" in regular expression "%s"', $delimiter, $value));
}
// Flags are after the last delimiter
$flags = end($e);
// Extract the regex string between the delimiters
$regstr = substr($value, 1, -1 - strlen($flags));
Copy link
Collaborator

@jmikola jmikola Jul 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to confirm that this handles cases where the pattern includes escaped forward slashes. If we don't currently test that perhaps we should.

I think it does, you only assert that explode() returns at least three components and only the final component is used to extra flags (which will be an empty string if something like /acme/ is passed in).

I think there may be some edge cases where parsing might accept invalid input. Given '/acme/bar/foo', this will construct a Regex with the pattern acme/bar, which seems incorrect? I'm not sure that the Laravel library should care about that, though. Probably not.


If it helps, here are some local tests using a BSON regex object where the pattern string contains slash characters.

<?php

require __DIR__ . '/vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27070/?replicaSet=rs0');
$collection = $client->test->foo;
$collection->drop();

$collection->insertOne(['x' => 'foo']);
$collection->insertOne(['x' => 'foo/']);

foreach (['foo', 'foo/', 'foo\/', 'foo\\/', 'foo\\\/'] as $pattern) {
    $regex = new MongoDB\BSON\Regex($pattern);
    $cursor = $collection->find(['x' => $regex]);
    $cursor->setTypeMap(['root' => 'bson']);

    printf("Querying with: %s\n", json_encode($regex));

    foreach ($cursor as $document) {
        echo $document->toRelaxedExtendedJSON(), "\n";
    }

    echo "\n";
}

Output:

Querying with: {"$regex":"foo","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a62" }, "x" : "foo" }
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\/","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\\\/","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\\\/","$options":""}
{ "_id" : { "$oid" : "64b97a01285f15fae9083a63" }, "x" : "foo/" }

Querying with: {"$regex":"foo\\\\\/","$options":""}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GromNaN: I think the only possible action item from the above is:

I just want to confirm that this handles cases where the pattern includes escaped forward slashes. If we don't currently test that perhaps we should.

I'll defer to you. If you don't think any such test is necessary, feel free to resolve this thread after reading.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe any more regex parsing than absolutely necessary should be avoided. Looking at the example with /acme/bar/foo, plugging that into preg_match reveals that the first unescaped forward slash is treated as the end delimiter, resulting in b being reported as an unknown modified: https://3v4l.org/sk2fQ

Anything more than that would require us to specifically explode on unescaped delimiters. This means that simply matching for '[^\\\]' . $delimiter won't work, as the backslash itself could've been escaped, resulting once again in an unescaped delimiter. There's only so much polishing one can do on crappy inputs to make them work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugging that into preg_match reveals that the first unescaped forward slash is treated as the end delimiter

Would it make sense to use consistent parsing here? So we pick up the first delimiter character and then find the next unescaped occurrence? It's unfortunate there's no preg_unquote() implementation, but perhaps stripslashes() might suffice? (see: https://stackoverflow.com/a/9172908/162228).

If we rely on that, we're more likely to produce an exception by yielding an invalid flag string, instead of a pattern that won't match anything and could be more confusing to debug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick follow-up as I mentioned in Slack while thinking about this some more:

Although to be honest I'm not confident stripslashes() would be a fool-proof solution, as stripslashes('/foo\/bar/i') yields "/foo/bar/i" and it's still ambiguous what the user intended because we eliminated the escaping. So maybe this problem is closer to the parsing you had to do to deal with % and _ escaping for like.

Given that, I now understand @alcaeus' previous comment:

Anything more than that would require us to specifically explode on unescaped delimiters.

I think we're OK to leave this as-is, but I'd propose adding a comment somewhere around this code to note the edge cases we're choosing not to handle.

$value = new Regex($regstr, $flags);
}

// For inverse regexp operations, we can just use the $not operator
// and pass it a Regex instence.
if (Str::startsWith($operator, 'not')) {
$operator = 'not';
}
// For inverse regex operations, we can just use the $not operator with the Regex
$operator = $operator === 'regex' ? '=' : 'not';
}

if (! isset($operator) || $operator == '=') {
$query = [$column => $value];
} elseif (array_key_exists($operator, $this->conversion)) {
$query = [$column => [$this->conversion[$operator] => $value]];
Copy link
Owner Author

@GromNaN GromNaN Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion is now done earlier in compileWheres.

} else {
$query = [$column => ['$'.$operator => $value]];
}
Expand Down Expand Up @@ -1133,7 +1144,7 @@ protected function compileWhereNull(array $where): array
*/
protected function compileWhereNotNull(array $where): array
{
$where['operator'] = '!=';
$where['operator'] = 'ne';
$where['value'] = null;

return $this->compileWhereBasic($where);
Expand Down
81 changes: 80 additions & 1 deletion tests/Query/BuilderTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
use Jenssegers\Mongodb\Query\Builder;
use Jenssegers\Mongodb\Query\Processor;
use Mockery as m;
use MongoDB\BSON\Regex;
use MongoDB\BSON\UTCDateTime;
use PHPUnit\Framework\TestCase;

Expand Down Expand Up @@ -578,6 +579,72 @@ function (Builder $builder) {
->orWhereNotBetween('id', collect([3, 4])),
];

yield 'where like' => [
['find' => [['name' => new Regex('^acme$', 'i')], []]],
fn (Builder $builder) => $builder->where('name', 'like', 'acme'),
];

yield 'where ilike' => [ // Alias for like
['find' => [['name' => new Regex('^acme$', 'i')], []]],
fn (Builder $builder) => $builder->where('name', 'ilike', 'acme'),
];

yield 'where like escape' => [
['find' => [['name' => new Regex('^\^ac\.me\$$', 'i')], []]],
fn (Builder $builder) => $builder->where('name', 'like', '^ac.me$'),
];

yield 'where like unescaped \% \_' => [
['find' => [['name' => new Regex('^a%cm_e$', 'i')], []]],
fn (Builder $builder) => $builder->where('name', 'like', 'a\%cm\_e'),
];

yield 'where like %' => [
['find' => [['name' => new Regex('^.*ac.*me.*$', 'i')], []]],
fn (Builder $builder) => $builder->where('name', 'like', '%ac%%me%'),
];

yield 'where like _' => [
['find' => [['name' => new Regex('^.ac..me.$', 'i')], []]],
fn (Builder $builder) => $builder->where('name', 'like', '_ac__me_'),
];

$regex = new Regex('^acme$', 'si');
yield 'where BSON\Regex' => [
['find' => [['name' => $regex], []]],
fn (Builder $builder) => $builder->where('name', 'regex', $regex),
];

yield 'where regexp' => [ // Alias for regex
['find' => [['name' => $regex], []]],
fn (Builder $builder) => $builder->where('name', 'regex', '/^acme$/si'),
];

yield 'where regex delimiter /' => [
['find' => [['name' => $regex], []]],
fn (Builder $builder) => $builder->where('name', 'regex', '/^acme$/si'),
];

yield 'where regex delimiter #' => [
['find' => [['name' => $regex], []]],
fn (Builder $builder) => $builder->where('name', 'regex', '#^acme$#si'),
];

yield 'where regex delimiter ~' => [
['find' => [['name' => $regex], []]],
fn (Builder $builder) => $builder->where('name', 'regex', '#^acme$#si'),
];

yield 'where regex with escaped characters' => [
['find' => [['name' => new Regex('a\.c\/m\+e', '')], []]],
fn (Builder $builder) => $builder->where('name', 'regex', '/a\.c\/m\+e/'),
];

yield 'where not regex' => [
['find' => [['name' => ['$not' => $regex]], []]],
fn (Builder $builder) => $builder->where('name', 'not regex', '/^acme$/si'),
];

/** @see DatabaseQueryBuilderTest::testBasicSelectDistinct */
yield 'distinct' => [
['distinct' => ['foo', [], []]],
Expand Down Expand Up @@ -647,7 +714,7 @@ public function testException($class, $message, \Closure $build): void

$this->expectException($class);
$this->expectExceptionMessage($message);
$build($builder);
$build($builder)->toMQL();
}

public static function provideExceptions(): iterable
Expand Down Expand Up @@ -694,6 +761,18 @@ public static function provideExceptions(): iterable
'Too few arguments to function Jenssegers\Mongodb\Query\Builder::where("foo"), 1 passed and at least 2 expected when the 1st is a string',
fn (Builder $builder) => $builder->where('foo'),
];

yield 'where regex not starting with /' => [
\LogicException::class,
'Missing expected starting delimiter in regular expression "^ac/me$", supported delimiters are: / # ~',
fn (Builder $builder) => $builder->where('name', 'regex', '^ac/me$'),
];

yield 'where regex not ending with /' => [
\LogicException::class,
'Missing expected ending delimiter "/" in regular expression "/foo#bar"',
fn (Builder $builder) => $builder->where('name', 'regex', '/foo#bar'),
];
}

/** @dataProvider getEloquentMethodsNotSupported */
Expand Down
21 changes: 21 additions & 0 deletions tests/QueryTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,21 @@ public function testAndWhere(): void
$this->assertCount(2, $users);
}

public function testRegexp(): void
{
User::create(['name' => 'Simple', 'company' => 'acme']);
User::create(['name' => 'With slash', 'company' => 'oth/er']);

$users = User::where('company', 'regexp', '/^acme$/')->get();
$this->assertCount(1, $users);

$users = User::where('company', 'regexp', '/^ACME$/i')->get();
$this->assertCount(1, $users);

$users = User::where('company', 'regexp', '/^oth\/er$/')->get();
$this->assertCount(1, $users);
}

public function testLike(): void
{
$users = User::where('name', 'like', '%doe')->get();
Expand All @@ -83,6 +98,12 @@ public function testLike(): void

$users = User::where('name', 'like', 't%')->get();
$this->assertCount(1, $users);

$users = User::where('name', 'like', 'j___ doe')->get();
$this->assertCount(2, $users);

$users = User::where('name', 'like', '_oh_ _o_')->get();
$this->assertCount(1, $users);
}

public function testNotLike(): void
Expand Down