Skip to content

[5.0] Fix aggregate function to deal with group by sql #8113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from
Closed

[5.0] Fix aggregate function to deal with group by sql #8113

wants to merge 11 commits into from

Conversation

GrahamCampbell
Copy link
Member

Replaces #7372 and #8112.

// cc @fcjailybo, @JoostK, @codelight89

@codelight89
Copy link
Contributor

It works with count. But what about other aggregate functions (min, max, sum, avg) ?

@JoostK
Copy link
Contributor

JoostK commented Mar 23, 2015

@codelight89 That's a good point and something that needs to be addressed, I'd say. An issue is that in general we shouldn't collapse the database result per group into a single value, as that is valid in the paginator context but not necessarily in other cases.

Thus a more general solution would return array_pluck($results, 'aggregate'); if $this->groupBy was set, in which case we can compute the correct count during pagination.

@codelight89
Copy link
Contributor

Thus a more general solution would return array_pluck($results, 'aggregate'); if $this->groupBy was set, in which case we can compute the correct count during pagination.

@JoostK, I agree, this would be a more general solution.

@GrahamCampbell
Copy link
Member Author

I'm open to pulls to this branch on my fork.

@codelight89
Copy link
Contributor

May I do this?

@GrahamCampbell
Copy link
Member Author

@codelight89 Of course. :)

@JoostK
Copy link
Contributor

JoostK commented Mar 23, 2015

If you are up to the task, fine with me 👍 My array_pluck proposal was perhaps a bit too short, as the array cast and array_change_key_case is then skipped (array cast is for supporting multiple PDO fetch styles and some database systems have proven to return different capitalizations, if I remember correctly.)

@codelight89
Copy link
Contributor

@GrahamCampbell, I sent a pull request.

@JoostK, agregate function must return array only when we have more than one result, correct? On one row it returns scalar value

@@ -1562,7 +1564,7 @@ public function exists()
* Retrieve the "count" result of the query.
*
* @param string $columns
* @return int
* @return mixed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoostK Should this be array|int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it. Tell me if I'm wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other aggregate functions - mixed
Should be changed too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should. I noticed when creating the PR, as I never look at docblocks I didn't thnk about updating it earlier. The same is true for any of the other aggregate methods (and aggregate itself).

Now that I think about it, perhaps it's better to collapse grouped results to a single value in all cases, because the behaviour now depends on having set groupBy or not which is kind of unpleasant. Moreover, the actual group is not accessible so all you currently have is an array of numbers for which you do not know to which group they belong, so perhaps just aggregate them manually in PHP. I will send another PR to accomplish this so you can better see what I mean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe you're right.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codelight89 We should avoid mixed if we can, because it's pretty useless documentation.

@taylorotwell
Copy link
Member

Do y'all think we could add a few integration tests that actually hit the database to give this a whirl? There is an integration test class already setup we should be able to use.

@codelight89
Copy link
Contributor

@JoostK, are you sure that avg will return expected result in current realization? now it calculates avg of avg's not avg of the column.

@JoostK
Copy link
Contributor

JoostK commented Mar 25, 2015

@codelight89 What do you propose for it? Perhaps we should just keep it simple and only support count with groupings.

@codelight89
Copy link
Contributor

@JoostK, maybe we should execute aggregate functions (except count) without grouping (backup group field before executing and restore it after).

Somethig like this

This function may be modified to:

    protected function backupFields($fields)
    {
        foreach ($fields as $field)
        {
            $this->backups[$field] = $this->{$field};
            $this->{$field} = null;
        }
    }

@jarektkaczyk
Copy link
Contributor

What is the purpose of this at all? If one wants single value result, then there is literally no reason for groupBy. Otherwise you don't use Query\Builder's aggregate functions, that pluck single value, but rather get or lists to get multiple, groupped results.

Am I missing anything here?

@codelight89
Copy link
Contributor

@jarektkaczyk, first at all this is for grouped paginated request that doesn't work correct because count aggregate function doesn't work with grouped request. But I think it will be good if other aggregate functions will work correct in all cases. Am I right?

@jarektkaczyk
Copy link
Contributor

@codelight89 Fair enough with pagination. However, if we talk about other cases, I'm not sure if correct is right word. Thing is, what to expect in this case:

$query->groupBy(..)->count();

should it count all the rows (why group by then)? should it count in groups and return multiple results (why not lists/get then)?

$query->groupBy(...)->max();

the same here: max of all rows (why group by then)? groupped max (why not lists/get then)?

and so on. Show me a use-case please, if I'm wrong.

@codelight89
Copy link
Contributor

@jarektkaczyk, I understand you, it's a bad case to use it. Anyway do you think that the first row is the best option? Do you think better to leave it as it is?

@JoostK
Copy link
Contributor

JoostK commented Mar 29, 2015

@jarektkaczyk If you go through the separate commits you will see how I changed it to become like this, although I do agree with you that some of this doesn't really make sense. To answer your questions:

$query->groupBy(..)->count();

should it count all the rows (why group by then)? should it count in groups and return multiple results (why not lists/get then)?

What we're trying to solve here is the problem with pagination for grouped queries. The current solution solves this issue by counting the number of returned groups, which equals the total number of results for the query. So, count in groups is what happens in Query::aggregate and then these results are aggregated once more in Query::count. So far, so good.

$query->groupBy(...)->max();

the same here: max of all rows (why group by then)? groupped max (why not lists/get then)?

I agree with you here, the current approach makes little sense. The reasoning for it is that I didn't like returning various kinds of results (array/scalar) depending on if a group by statement was present. The problem with returning an array is that the array's values do in fact give you the results per group, but do not contain any information about the group the value belongs to, therefore being rather useless.

In retrospect, only changing Query::count and Query::aggregate as in this PR is enough to make pagination work correctly. The other aggregates don't really make sense for grouped queries, as there return value as is currently (only first group) is useless and any change to it would be just as useless (so why bother)

@jarektkaczyk
Copy link
Contributor

@JoostK Yeah, I understand the pagination issue, nothing against this one. I just don't really see a reason for changing the behaviour in any case, other than pagination (even groupped count, but not for pagination).

@JoostK
Copy link
Contributor

JoostK commented Mar 29, 2015

@jarektkaczyk How would you go about changing it only for pagination?

@jarektkaczyk
Copy link
Contributor

@JoostK I suppose this would be enough (a quick tinkering, no time for a PR now with tests, but you'll get the idea anyway):

    /**
     * Get the count of the total records for the paginator.
     *
     * @return int
     */
    public function getCountForPagination($columns = ['*'])
    {
        $this->backupFieldsForCount();

        $this->aggregate = ['function' => 'count', 'columns' => $columns];

        $results = $this->get();

        $this->aggregate = null;

        $this->restoreFieldsForCount();

        if (isset($results[0]))
        {
            return (isset($this->groups))
                ? count($results)
                : (int) array_change_key_case((array) $results[0])['aggregate'];
        }
    }

    /**
     * Backup some fields for the pagination count.
     *
     * @return void
     */
    protected function backupFieldsForCount()
    {
        foreach (['orders', 'limit', 'offset', 'columns'] as $field)
        {
            $this->backups[$field] = $this->{$field};

            $this->{$field} = null;
        }
    }

    /**
     * Restore some fields after the pagination count.
     *
     * @return void
     */
    protected function restoreFieldsForCount()
    {
        foreach (['orders', 'limit', 'offset', 'columns'] as $field)
        {
            $this->{$field} = $this->backups[$field];
        }

        $this->backups = [];
    }

@codelight89
Copy link
Contributor

    public function getCountForPagination()
    {
        $this->backupFieldsForCount();

        $this->aggregate = ['function' => 'count', 'columns' => ['*']];

        $results = $this->get();

        $this->aggregate = null;

        $this->restoreFieldsForCount();

        if (isset($results[0]))
        {
            return (isset($this->groups))
                    ? count($results)
                    : (int) array_change_key_case((array) $results[0])['aggregate'];
        }
    }

Small fix.

@jarektkaczyk
Copy link
Contributor

@codelight89 yep, good catch

... or in fact, not really :) It was there on purpose: for consistency and flexibility. Being it public method, you should be able to pass an argument, that defaults to ['*']. It doesn't change anything in the current implementations.

@JoostK
Copy link
Contributor

JoostK commented Apr 2, 2015

I would prefer if the isset($this->groups) is before the isset($results[0]), given that we are only interested in the first result for non-grouped queries.

Overall, I think I like your specific pagination counting better than this PR, as it leaves the aggregate method in itself unchanged. Perhaps we could refactor a bit to avoid the code duplication.

@jarektkaczyk
Copy link
Contributor

@JoostK It doesn't matter whether it's groupped or not if this isn't true if (isset($results[0])), so I wouldn't change the order. This is simple check that the result was in fact fetched from the db, then you're returning what should be returned.

fcjailybo and others added 2 commits April 7, 2015 13:58
Signed-off-by: Graham Campbell <graham@mineuk.com>
@codelight89
Copy link
Contributor

@GrahamCampbell later you can squash all commits if you want. But now I think better to leave them for better understanding of discussion here.
My commit reverts previous changes

@GrahamCampbell
Copy link
Member Author

later you can squash all commits

I've already squashed them as far as possible. We need to retain all authorship.

@GrahamCampbell
Copy link
Member Author

@codelight89 You need to restore the tests you deleted in the last commit please.

{
if (isset($this->groups)) return count($results);

return (int) array_change_key_case((array) $results[0])['aggregate'];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this valid syntax for php 5.4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's valid

Function array dereferencing has been added, e.g. foo()[0].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good. I couldn't remember if it was added in 5.4, or 5.5. :) I only use 5.5/5.6 on prod.

@codelight89
Copy link
Contributor

You need to restore the tests you deleted in the last commit please.

This tests were written for already deleted code

@GrahamCampbell
Copy link
Member Author

We still need tests for this, even if they're not the same tests.

@GrahamCampbell
Copy link
Member Author

Without tests, there's no evidence your changes actually work.

@codelight89
Copy link
Contributor

@GrahamCampbell I think you must rename this PR. Because now it's about fixing pagination for grouped request

@GrahamCampbell
Copy link
Member Author

I'll recreate it...

@GrahamCampbell
Copy link
Member Author

Ok then, so the docblock changes are now here: #8309.

@GrahamCampbell
Copy link
Member Author

And the rest is here: #8310.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants