Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] Add beforeQuery to base query builder #37431

Merged
merged 3 commits into from
May 21, 2021
Merged

[8.x] Add beforeQuery to base query builder #37431

merged 3 commits into from
May 21, 2021

Conversation

cbl
Copy link
Contributor

@cbl cbl commented May 20, 2021

Background

Currently there is no way to change subselect-builder whereby the changes are applied to the parent query builder as well. This is because the subselect query is rendered when applied.

The Solution

Preserve query modifications that are applied when the builder gets rendered.

This pr adds a beforeQuery method to the base query builder. The preserved closures will be called before the query gets rendered. This allows keeping sub query builders alive and modifying the sub query after applying it to the parent builder:

// 1. Add the subquery
$builder->beforeQuery(function($query) use($subQuery){
    $query->joinSub($subQuery, ...);
});

// 2. Add constraints to the subquery
$subQuery->where('foo', 'bar');

// 3. Render the subquery, the constraints from 2. are applied
$builder->get();

The preserved closures will be executed in the following builder methods:

  • toSql
  • exists
  • insert
  • insertOrIgnore
  • insertGetId
  • insertUsing
  • update
  • upsert
  • delete
  • truncate

Usage

If wanted the preserved modifiers can be applied using applyBeforeQueryCallbacks

$builder->applyBeforeQueryCallbacks();

The preserved modifiers may be passed to other builders

$otherBuilder->beforeQueryCallbacks = $builder->beforeQueryCallbacks;

Where This Is Usefull

This is particularly useful for the new one-of-many feature, where performance can be improved for tables with millions of entries by limiting the groups inside the sub-queries during eager loading.

@driesvints
Copy link
Member

Oof, this seems like a very complex feature to maintain. I can already see this being mis-used or new methods forgetting to be added. This seems better off as a feature in user land?

@cbl
Copy link
Contributor Author

cbl commented May 20, 2021

I could really need this to get eager loading for one-of-many relationships with more than million entries to < 2ms. 😅

@taylorotwell
Copy link
Member

Hey @cbl - can you provide more detail on what changes this will allow you to make to One Of Many relationships to improve performance?

@cbl
Copy link
Contributor Author

cbl commented May 20, 2021

@taylorotwell When eager loading, the related models are loaded using WHERE foreign_id in (1,2,3,...), currently this constraint is applied to the parent query in one of many relationships:

SELECT *
FROM `logins`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM logins
    GROUP BY logins.user_id
) AS latest_login 
ON latest_login.id = logins.id
WHERE user_id in (1,2,3,4,5) # <---

This means that the subselect query gets MAX(id) rows for every group in the table, not only the required ones.

This could be improved by adding the constraint to the subquery:

SELECT *
FROM `logins`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM logins
    WHERE user_id in (1,2,3,4,5) # <---
    GROUP BY logins.user_id
) AS latest_login 
ON latest_login.id = logins.id
See the `EXPLAIN ANALYZE` results for both queries for more information...

Results for 4,600 users and 100,000 logins.

Filter rows on parent query:

-> Nested loop inner join  (cost=50420.32 rows=500500) (actual time=20.806..21.036 rows=5 loops=1)
    -> Index range scan on logins using logins_user_id_index, with index condition: (logins.user_id in (1,2,3,4,5))  (cost=57.51 rows=125) (actual time=0.048..0.190 rows=125 loops=1)
    -> Index lookup on latest_login using <auto_key0> (id=logins.id)  (actual time=0.001..0.001 rows=0 loops=125)
        -> Materialize  (cost=1650.35..1650.35 rows=4004) (actual time=0.167..0.167 rows=0 loops=125)
            -> Index range scan on logins using index_for_group_by(logins_user_id_index)  (cost=1249.95 rows=4004) (actual time=0.012..17.431 rows=4000 loops=1)

Filter rows on subquery:

-> Nested loop inner join  (cost=60.31 rows=125) (actual time=0.121..0.130 rows=5 loops=1)
    -> Filter: (latest_login.id is not null)  (cost=0.13..16.56 rows=125) (actual time=0.113..0.115 rows=5 loops=1)
        -> Table scan on latest_login  (cost=2.50..2.50 rows=0) (actual time=0.000..0.001 rows=5 loops=1)
            -> Materialize  (cost=2.50..2.50 rows=0) (actual time=0.112..0.114 rows=5 loops=1)
                -> Group aggregate: max(logins.id)  (actual time=0.041..0.100 rows=5 loops=1)
                    -> Filter: (logins.user_id in (1,2,3,4,5))  (cost=25.32 rows=125) (actual time=0.021..0.082 rows=125 loops=1)
                        -> Index range scan on logins using logins_user_id_index  (cost=25.32 rows=125) (actual time=0.019..0.063 rows=125 loops=1)
    -> Single-row index lookup on logins using PRIMARY (id=latest_login.id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=5)

To achieve this we need to be able to add the constraints to the subquery after ->ofMany is called. So here preserve could be used as follows:

protected function addOneOfManyJoinSubQuery(Builder $parent, Builder $subQuery, $on)   
{
    $parent->preserve(function($parent) use ($subQuery, $on) {
        $parent->joinSub(...);   
    });
}

The subquery can be bound to a class property and edited afterwards to add the necessary constraints.

From a considerable number of rows, a difference can become noticeable that is relevant for the loading time of a page.

I figured the solution in this pr would be a simple way to achieve this and also requires little abstraction.

@taylorotwell
Copy link
Member

Can you show the entire change that will need to be made to the CanBeOneOfMany trait? Is it literally just those few lines you showed?

@cbl
Copy link
Contributor Author

cbl commented May 20, 2021

@taylorotwell This is the full diff of the CanBeOneOfMany trait for the performance improvement: cbl:one-of-many_performance. The commit also contains all changes to apply constraints to the subquery where it can improve the performance.

@taylorotwell
Copy link
Member

@cbl Do you have a big problem with me renaming this method to $query->beforeQuery(fn)?

@cbl
Copy link
Contributor Author

cbl commented May 20, 2021

@taylorotwell No, this probably gives users a quicker idea of what the function does.

@cbl cbl changed the title [8.x] Add preserve to base query builder [8.x] Add beforeQuery to base query builder May 20, 2021
@cbl
Copy link
Contributor Author

cbl commented May 20, 2021

@taylorotwell looks good, ready to be merged from my side 👍

@taylorotwell taylorotwell merged commit 8dcc5aa into laravel:8.x May 21, 2021
sunaoka added a commit to sunaoka/laravel-postgres-extension that referenced this pull request Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants