Skip to content
This repository was archived by the owner on Aug 22, 2023. It is now read-only.

PHPORM-33: Add Query::toMql() and tests on query syntax #4

Closed
wants to merge 7 commits into from

Conversation

GromNaN
Copy link
Owner

@GromNaN GromNaN commented Jul 7, 2023

Fix PHPORM-33

This is the preliminary work before writing all the tests. It presents 2 ways of writing the tests:

  • The "integration" way: currently, the query features are tested by checking the query results. Each test have to initialise some collection data. The tests require a running server and can't run in parallel.
    We could use the driver events to track the query, but it too deep and may change with server version and maybe the connection string.
  • The "unit" way: assertion are written on the query. I created the toMql() method for that.

In this PR

The MQL format (my proposition)

MQL is a javascript-like format.

The collection name is not in the dumped format.

db.users.find().skip(5).take(10)
[
    'find' => [[], ['skip' => 5, 'limit' => 10]],
]

Or in split version:

[
    'find' => [],
    'skip' => [5],
    'take' => [10],
]

When chaining, add several calls to the array. This does not accept calls to the same method. But the query builder never use method chaining.

db.users.find({"name.family": "Smith"}).count()
[
    'find' => [['name.family' => 'Smith']],
    'count' => [],
]

@GromNaN GromNaN requested review from jmikola and alcaeus July 7, 2023 19:09
@GromNaN GromNaN force-pushed the PHPORM-33 branch 4 times, most recently from 055cfb1 to ec40454 Compare July 7, 2023 21:35
*/
public function getFresh($columns = [], $returnLazy = false)
public function toMql($columns = []): array
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want toMql to return a MQL string like db.users.find({}), I can move this code in an intermediary private method.

Copy link
Collaborator

@jmikola jmikola Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want toMql to return a MQL string like db.users.find({})

What is an "MQL string" in that case? db.users.find({}) returns a cursor, which the shell's REPL iterates automatically to display a batch of documents. I'm not sure what that has to do with MQL.

Also, I cannot for the life of me find an actual definition of MQL in the server documentation. "MongoDB Query Language (MQL)" only seems to be mentioned in some Atlas docs and blog posts.

I used to think it referred to the query document syntax but given some of the usage in the above I'm not so sure that's the case anymore.

I certainly wouldn't have expected this method's return value (operation name followed by its arguments) to be considered MQL. That said, I can see how the match syntax alone might be insufficient to convey the query behavior, since it omits options like skip and limit.

Copy link
Owner Author

@GromNaN GromNaN Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term MQL is a vague concept, it represent the calls to the high-level library. Ideally we could have an "export query for mongosh" feature in a profiler panel, but we don't have a 100% match in PHP and mongosh method signatures. That would work for the find, countDocuments, distinct, aggregate.

That said, I can see how the match syntax alone might be insufficient to convey the query behavior, since it omits options like skip and limit.

They can be formatted as find options or chained method calls.

Options:

[
    'find' => [[], ['skip' => 5, 'limit' => 10]],
]

Chained method calls:

[
    'find' => [],
    'skip' => [5],
    'take' => [10],
]

Copy link
Collaborator

@alcaeus alcaeus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use the driver events to track the query, but it too deep and may change with server version and maybe the connection string.

In this case, we'd track outgoing events (so CommandStartedEvent) to verify the command being sent to the server. While it is possible that it changes based on the connection string, such instances are limited (e.g. adding $readPreference when sending a query to mongos). With that in mind, it would indeed be feasible to use the event system to inspect queries, but that would still require a live database server to run the tests. IMO checking query generation without a MongoDB server has some advantages, but we shouldn't limit our testing to it.

composer.json Outdated
@@ -23,7 +23,8 @@
"illuminate/container": "^10.0",
"illuminate/database": "^10.0",
"illuminate/events": "^10.0",
"mongodb/mongodb": "^1.15"
"mongodb/mongodb": "^1.15",
"ext-mongodb": "*"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, but I would move this to the top of the list.

@@ -279,21 +278,12 @@ public function getFresh($columns = [], $returnLazy = false)
$aggregations = blank($this->aggregate['columns']) ? [] : $this->aggregate['columns'];

if (in_array('*', $aggregations) && $function == 'count') {
// When ORM is paginating, count doesnt need a aggregation, just a cursor operation
// When ORM is paginating, count doesn't need an aggregation, just a cursor operation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction here: it isn't a cursor operation.

For context, the legacy driver used to have a Cursor::count method, but that would send off a separate count command which can yield different results. That is the reason why this was omitted from the Cursor class in the new driver.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me, is there something to change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @alcaeus was commenting on the "just a cursor operation" phrase in this comment. The link to https://docs.mongodb.com/manual/reference/method/cursor.count/ a few lines below is related I'm sure.

Since Collection::count() is deprecated, I wonder if this should instead be using countDocuments() or estimatedDocumentCount() depending on whether a query filter is used or not, respectively.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change to use countDocuments. The name estimatedDocumentCount make me feel that we shouldn't use it silently.

Copy link
Owner Author

@GromNaN GromNaN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imported a lot of tests from Eloquent, and fixed som bugs spotted by this tests.

Should be added as part of a new feature:

  • whereDate, whereDay, whereMonth, whereYear, whereTime
  • Operations accepting "raw" fields.
  • whereFulltext using $text indexes
  • union and unionAll using aggregations
  • subselect using aggregations
  • inRandomOrder
  • having in conjonction with groups using aggregations
  • whereExists
  • Jointures using aggregations

To be tested

  • Aggregations
  • Like

$direction = -1;
break;
default:
throw new \InvalidArgumentException('Order direction must be "asc" or "desc".');
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bugfix to throw an exception when using an incorrect value for $direction. Covered by tests imported from Illuminate\Tests\Database\DatabaseQueryBuilderTest::testOrderByInvalidDirectionParam()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we have CS setup for this project yet, but should there be a use statement for InvalidArgumentException to avoid the global reference here?

On a separate note, I think we can omit the period after an exception message if there's only one sentence/phrase. That's what is done in PHPC/PHPLIB.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is php-cs-fixer, the top-level classes doesn't require to be in use statements. We can decide to change the CS. https://jira.mongodb.org/browse/PHPORM-52

$item = new UTCDateTime($item->format('Uv'));
}
});
if (is_array($where['values'])) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bugfix to not convert Carbon\CarbonPeriod properties $startDate and $endDate into MongoDB\BSON\UTCDateTime. CarbonCarbonPeriod::getStartDate() calls $this->startDate->avoidMutation() which does not exist on the MongoDB class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, the originally array_walk_recursive() call was just wrapped in an is_array($where['values']) condition. How does that fix the bug?

It's not clear to me how the original code would have converted anything other than a DateTimeInterface into UTCDateTime.

I see that Carbon/Carbon extends DateTime (and thus implements the interface), but it's not clear to me where Carbon/CarbonPeriod comes into play or why its internal logic would be relevant during this query compilation.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that fixed the bug as described in the comment. I'll digg why the CarbonPeriod properties were updated.

@@ -997,6 +1028,10 @@ protected function compileWheres(): array
$method = "compileWhere{$where['type']}";
$result = $this->{$method}($where);

if ($not) {
$result = ['$not' => $result];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bugfix to support the $not operations! whereNot and orWhereNot.

@@ -1101,7 +1136,7 @@ protected function compileWhereIn(array $where): array
{
extract($where);

return [$column => ['$in' => array_values($values)]];
return [$column => ['$in' => Arr::flatten($values)]];
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bugfix to support nested arrays. Test imported from Illuminate\Tests\Database\DatabaseQueryBuilderTest::testBasicWhereIns

if ($values instanceof CarbonPeriod) {
$values = [new UTCDateTime($values->start->format('Uv')), new UTCDateTime($values->end->format('Uv'))];
} else {
$values = Arr::flatten($values);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugfix to accept nested arrays in whereBetween. Test imported from Illuminate\Tests\Database\DatabaseQueryBuilderTest::testWhereBetweens

}

/** @internal This method is not supported by MongoDB. */
public function whereColumn($first, $operator = null, $second = null, $boolean = 'and')
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this methods from the base Eloquent builder silently do nothing. By overloading I can add the @internal annotation so that they are not documented and appears strikethrough in good IDEs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted that these are tested in testEloquentMethodsNotSupported(). Good job.

Copy link
Collaborator

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing some feedback from an initial, incomplete pass through the PR. Some of this query builder syntax is concerning, to say the least.

*/
public function getFresh($columns = [], $returnLazy = false)
public function toMql($columns = []): array
Copy link
Collaborator

@jmikola jmikola Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want toMql to return a MQL string like db.users.find({})

What is an "MQL string" in that case? db.users.find({}) returns a cursor, which the shell's REPL iterates automatically to display a batch of documents. I'm not sure what that has to do with MQL.

Also, I cannot for the life of me find an actual definition of MQL in the server documentation. "MongoDB Query Language (MQL)" only seems to be mentioned in some Atlas docs and blog posts.

I used to think it referred to the query document syntax but given some of the usage in the above I'm not so sure that's the case anymore.

I certainly wouldn't have expected this method's return value (operation name followed by its arguments) to be considered MQL. That said, I can see how the match syntax alone might be insufficient to convey the query behavior, since it omits options like skip and limit.

@@ -279,21 +278,12 @@ public function getFresh($columns = [], $returnLazy = false)
$aggregations = blank($this->aggregate['columns']) ? [] : $this->aggregate['columns'];

if (in_array('*', $aggregations) && $function == 'count') {
// When ORM is paginating, count doesnt need a aggregation, just a cursor operation
// When ORM is paginating, count doesn't need an aggregation, just a cursor operation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @alcaeus was commenting on the "just a cursor operation" phrase in this comment. The link to https://docs.mongodb.com/manual/reference/method/cursor.count/ a few lines below is related I'm sure.

Since Collection::count() is deprecated, I wonder if this should instead be using countDocuments() or estimatedDocumentCount() depending on whether a query filter is used or not, respectively.


// Return results
return new Collection($results);
return ['aggregate' => [$pipeline, $options]];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be outside the scope of the PR, but I'm reading through the existing code above and it looks like this constructs a pipeline in a fixed order based on query builder options. I could see that being relevant for $this->groups (assuming it's mimicking a group operation from the deprecated group command) but it seems odd for $this->aggregate since there's no flexibility to construct a pipeline.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$this->aggregate can only contain function and columns. I'll try to add tests on this feature.

$builder = $this->getBuilder();
$builder->where('_id', '=', 1)->orWhereNull(['_id', 'expires_at']);
$this->assertSame(['find' => [
['$or' => [['_id' => 1], ['_id' => null], ['expires_at' => null]]],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with my comment below, this query seems unintuitive. I would have expected the orWhereNull() conditions to be grouped within an $and operator.

Comment on lines +1250 to +1256
// can accept some nested arrays as values.
$builder = $this->getBuilder();
$builder->whereIn('id', [
['issue' => 45582],
['id' => 2],
[3],
]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong. Nested arrays are actually matched by $in. Consider:

Enterprise rs0 [direct: primary] test> db.coll.insertOne({x:1})
{
  acknowledged: true,
  insertedId: ObjectId("64acb00e70dc0350a6d117ac")
}

Enterprise rs0 [direct: primary] test> db.coll.insertOne({x:[1,2,3]})
{
  acknowledged: true,
  insertedId: ObjectId("64acb01370dc0350a6d117ad")
}

Enterprise rs0 [direct: primary] test> db.coll.find({x:{$in:[1,2,3,4]}})
[
  { _id: ObjectId("64acb00e70dc0350a6d117ac"), x: 1 },
  { _id: ObjectId("64acb01370dc0350a6d117ad"), x: [ 1, 2, 3 ] }
]

Enterprise rs0 [direct: primary] test> db.coll.find({x:{$in:[[1,2,3]]}})
[ { _id: ObjectId("64acb01370dc0350a6d117ad"), x: [ 1, 2, 3 ] } ]

Enterprise rs0 [direct: primary] test> db.coll.find({x:{$in:[[1,2,3,4]]}})

In the first query, both documents match because x either is or has a value within the array [1,2,3,4].

In the second query, the [1,2,3] array within the $in array is treated as an exact match for only a single document.

The third query matches nothing, since neither document has the value [1,2,3,4] for x.


Even if array keys were to be discarded here, it seems wrong that:

[
    ['issue' => 45582],
    ['id' => 2],
    [3],
]

...would compile to the same query as the previous test:

['issue' => 45582, 'id' => 2, 3]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I won't put Arr::flatten() and leave the nested arrays.

}

/** @internal This method is not supported by MongoDB. */
public function whereColumn($first, $operator = null, $second = null, $boolean = 'and')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted that these are tested in testEloquentMethodsNotSupported(). Good job.

@@ -1147,6 +1173,12 @@ protected function compileWhereBetween(array $where): array
{
extract($where);

if ($values instanceof CarbonPeriod) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants