Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient counting on Criteria #882

Merged
merged 19 commits into from
May 16, 2014
Merged

Conversation

bakura10
Copy link
Member

Hi,

This is an attempt to solve this very annoying issue: http://www.doctrine-project.org/jira/browse/DDC-2217

I'm not sure about my solution as I'm not really into Doctrine internals.

Use case

I have a library that is based exclusively on Criteria API. A Paginator adapter uses an empty Criteria to count how many elements are in the collection. However, EntityRepository's matching method always initialize the whole collection into an ArrayCollection. This is unusable in most cases and make the API unusable for my use case.

Solution

I've created a new LazyCollectionCriteria that implements an efficient count. I think the cleanest solution would be to add a method to the Selectable interface: with a count method. But I think this is not possible now, unfortunately.

If the solution seems nice to you, I'll write the tests.

ping @Ocramius, @macnibblet and @Thinkscape (this should make ZfrRest usable :D)

@doctrinebot
Copy link

Hello,

thank you for creating this pull request. I have automatically opened an issue
on our Jira Bug Tracker for you. See the issue link:

http://www.doctrine-project.org/jira/browse/DDC-2865

We use Jira to track the state of pull requests and the versions they got
included in.

public function count()
{
if (null !== $this->count) {
return $this->count;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is initialized, it needs to return the count for the inner collection, as it could have been modified

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@stof
Copy link
Member

stof commented Dec 19, 2013

for the LazyCollection, I suggest adding an abstract LazyCollection class in Doctrine Common which would expect a protected method doInitialize returning a Collection and would implement all methods by initializing the collection (initialize would check the initialized flag and call doInitialize when needed) and proxying the call.
Then, each lazy collection can extend this one to implement the initialization logic and overwrite methods for which it can improve the retrieval.
This would avoid lots of duplication between the different Doctrine projects

@bakura10
Copy link
Member Author

@stof , I can do that on Doctrine Collection. (I really think it would be better to have a count method on the Selectable interface... is BC needed on this interface?).

It would be nice if we could do : $selectable->count($criteria).

The count method could be moved to the EntityRepository, instead of needing this collection.

@stof
Copy link
Member

stof commented Dec 20, 2013

BC is needed on the interface.
And I don't think it makes sense to have a countMatching() in the interface. The returned collection in matching needs to be countable anyway, so making it a lazy collection is the clean way

@bakura10
Copy link
Member Author

Understood.

About the LazyCollection on Doctrine common, your main point is reducing the amount of duplicated code, however on PersistentCollection for instance, a lot of methods have custom behavior (https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/PersistentCollection.php#L484) so actually I'm not sure you really can remove a lot of code from it.

Just tell me, and I'll do that today and write the tests.

@stof
Copy link
Member

stof commented Dec 20, 2013

Well, it would remove the code for all non-lazy methods. And each O*M project in Doctrine has its own set of persistent collections, which would benefit from it

@bakura10
Copy link
Member Author

I realized there is a EntityPersister interface that seems to be new in 2.5. May I add the getCountSQL and count methods to the interface or is it already too late?

@beberlei
Copy link
Member

beberlei commented Jan 2, 2014

This breaks the tests for the new caching support. Please fix it.

@bakura10
Copy link
Member Author

bakura10 commented Jan 2, 2014

Just updated to master, let's see if Travis passes. @beberlei , what about this: doctrine/collections#19

This implementation on ORM will be much lighter if the lazy collection on Doctrine collections is merged :)

@bakura10
Copy link
Member Author

bakura10 commented Jan 3, 2014

Tests no longer should break too.

@beberlei
Copy link
Member

beberlei commented Jan 3, 2014

@bakura10 they still fail for the same reason. Please run the tests locally, you should see the error.

PHP Fatal error:  Class Mock_EntityPersister_03131371 contains 2 abstract methods and must therefore be declared abstract or implement the remaining methods (Doctrine\ORM\Persisters\EntityPersister::getCountSQL, Doctrine\ORM\Persisters\EntityPersister::count) 

/**
* @var bool
*/
protected $initialized = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You dont need this one, if ($this->collection === null) is an equally working check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just using the same pattern as in PersistentCollection. Anyway, I think the LazyCollection on Doctrine common should really be needed, it would remove all the duplicated code :).

@bakura10
Copy link
Member Author

bakura10 commented Jan 5, 2014

Hi,

There were some issues related to the inheritance. I've fixed that and now all tests pass (at least locally).

@danizord
Copy link

👍

@marcospassos
Copy link

How about the limit/offset/orderBy cases? It doesn't make sense for counting queries.

@@ -877,6 +876,10 @@ public function matching(Criteria $criteria)

$persister = $this->em->getUnitOfWork()->getEntityPersister($this->association['targetEntity']);

return new ArrayCollection($persister->loadCriteria($criteria));
if ($this->association['fetch'] === ClassMetadataInfo::FETCH_EXTRA_LAZY) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure about this @beberlei after given it some thoughts. Doesn't make it more sense to do a check against FETCH_LAZY instead of FETCH_EXTRA_LAZY?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EXTRA_LAZY seems to be more correct here. I don't see why we'd need any distinction though - can't it always be a LazyCriteriaCollection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. To me there is no reason to check this, and always wrap around a lazy collection. The performance penalty is pretty low I suppose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving it a thought, I've changed that to return a lazy collection IF the association fetch is not EAGER. This makes much more sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if we look at it, a PersistentColleciton is already filtered by the owner. When fetch mode is LAZY, we expect it to load it all anyway if we do a $article->getComments() for instance. The only fetch mode where we expect it to be sliced is actually EXTRA_LAZY.

The thing is that if we do it for LAZY, it will usually trigger two queries (the count and the actual requests to fetch the data), which I suppose is not what we want.

@bakura10
Copy link
Member Author

bakura10 commented Feb 2, 2014

@marcospassos , it makes sense to add "orderBy" to the count query, indeed. I'll add this later, however I'm not sure to understand limit/offset. This one does not make sense isn't it?

@marcospassos
Copy link

@bakura10 why ORDER BY makes sense? Neither of them makes sense. My point is that just ignoring these criteria could bring some unexpected results, conceptually speaking. For example, counting records using LIMIT and OFFSET using MySQL is not supported and will result in NULL, but may sounds strange a criteria with a limit up to 10 records return a counting result greater than the limit itself.

@bakura10
Copy link
Member Author

bakura10 commented Feb 3, 2014

Current implementation does not take into account neither limit, group by or offset, so that's why I didn't understand your feedback ^^.

Sorry for the ORDER BY, yes it does not make sense. I read "group by" instead of "order by" :o.

@marcospassos
Copy link

I think it should be clear, at least in the documentation 👍

@bakura10
Copy link
Member Author

bakura10 commented Feb 5, 2014

Hi,

The last commit reuse the new abstract lazy collection from Doctrine Collections 1.2. This PR should be okey now :).

@guilhermeblanco guilhermeblanco merged commit 92a2b01 into doctrine:master May 16, 2014
@guilhermeblanco
Copy link
Member

@bakura10 I merged a slightly modified version of this PR.
I added cached count results support for LazyCollection, preventing multiple calls to count() to hammer DB. Once collection is initialized, then the proper count gets returned.

@bakura10
Copy link
Member Author

Awesome :D. Thanks !

However about caching the count, I'm not sure, as the criteria could be changed between two consecutive count calls (as it's passed by reference). It can be problematic of people use the lazycollectionCriteria themselves.

Envoyé de mon iPhone

Le 16 mai 2014 à 06:24, Guilherme Blanco notifications@github.com a écrit :

@bakura10 I merged a slightly modified version of this PR.
I added cached count results support for LazyCollection, preventing multiple calls to count() to hammer DB. Once collection is initialized, then the proper count gets returned.


Reply to this email directly or view it on GitHub.

@bakura10
Copy link
Member Author

Hi @guilhermeblanco ,

First, thank you for having merged this. I've tried it in my code, it really works well and push the Criteria to the next level. However, I think we should have a discussion about this: https://github.com/doctrine/doctrine2/pull/882/files#diff-108586f774fc8acb02163ed709e05e86R886

Let's imagine the following code:

class User
{
    /**
     * @ORM\OneToMany(targetEntity="Message", mappedBy="conversation")
     */
    protected $messages;
}

and

class Message
{
    /**
     * @ORM\ManyToOne(targetEntity="User", inversedBy="messages")
     */
    protected $user;
}

With current configuration, count($user->getMessages()->matching(new Criteria())) will actually trigger a load of the whole collection (no wrap around a LazyCriteriaCollection). For the optimized COUNT to occur, we need to set the fetch mode to EXTRA_LAZY.

I fully understand why @beberlei wanted to do this. Semantically, LAZY means "retrieve the whole collection when we first access it", so if we assume this, it does not make any sense to wrap around the LazyCriteriaCollection.

However, from a usage point of view, I realized I actually often "forget" to set it to EXTRA_LAZY (because I use paginates that abstract that kind of things), so in all those cases, it does a full load of the table.

My point is that maybe we should just allow to wrap around a LazyCriteria in all cases (LAZY and EXTRA_LAZY). After all, the overhead introduced by the LazyCollectionCriteria is quite small.

What do you think?

ping @Ocramius @macnibblet

@Ocramius
Copy link
Member

@bakura10 totally for it, but only when it's EXTRA_LAZY

@bakura10
Copy link
Member Author

So you agree with the current implementation?

Envoyé de mon iPhone

Le 16 mai 2014 à 14:43, Marco Pivetta notifications@github.com a écrit :

@bakura10 totally for it, but only when it's EXTRA_LAZY


Reply to this email directly or view it on GitHub.

@guilhermeblanco
Copy link
Member

@bakura10 yes, I agree with current implementation.
That's why I was mentioning yesterday that default fetch mode should be extra lazy and not lazy.

@bakura10
Copy link
Member Author

I've run into an interesting "issue" while using this PR. It's not really a problem but couldn't find a way to solve it.

Basically, imagine you have a User => Tweets associations. If you match this association, you'll have a LazyCriteriaCollection. The PersistentCollection will automatically set a filter on the criteria here: https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/PersistentCollection.php#L880

However, if you set the same filter on the matching, it will have the same condition twice, and having SQL like "SELECT COUNT(*) FROM blabla WHERE blabla.id = :id AND blabla.id = :id"

There is no easy way to iterate through the where expression to have a check, though. It's not really a problem and I'm not sure if performance wise this is a problem, but taht would be interesting to have a look at this :).

@danizord
Copy link

@bakura10 can you open a new issue to discuss that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.