Skip to content

Provide rewindable option to Aggregation and Query builders/queries #2116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 22, 2020

Conversation

Steveb-p
Copy link
Contributor

@Steveb-p Steveb-p commented Nov 20, 2019

Q A
Type improvement
BC Break no
Fixed issues #2113

Summary

Proof of concept for allowing user control over wrapping resulting Traversable/Iterator with CachingIterator - to prevent memory issues when iterating over large result sets.

Note: This PR is incomplete - still needs tests, relevant comment blocks and documentation adjustments.

  • Add tests
  • Add comments to code
  • Adjust documentation

@malarzm
Copy link
Member

malarzm commented Nov 23, 2019

So for the heads up, @alcaeus wants to evaluate one other approach/idea (but I'll let him do the explaining) and if it doesn't make sense, we'll go with this :)

@Steveb-p
Copy link
Contributor Author

Steveb-p commented Dec 12, 2019

@malarzm I've talked a little with @alcaeus over on slack about it, and he didn't seem sure about his approach.

As far as I understood it, it revolved around creating a WindowIterator that would cache a specific amount of results instead of the whole result set.

However, either way I believe some way of allowing control over what iterators are added to iterator chain will be probably needed, whether it will be rewindable method for both Builders or something else is up for discussion.

I've added commits with new BaseIterator class, since PrimingIterator requires Iterator instance as it's constructor argument and reducing it to iterable or Traversable might not do the trick - after all, there are plenty of instances where Iterator is specifically expected, not just any Traversable.
Depending on rewindable option either BaseIterator or CachingIterator is used to wrap Cursor.

I've also slightly changed the way CachingIterator works. I've removed the getIterator private method as it was only used in storeCurrentResult method - which in turn was called in constructor once and on each next call. Now instead the wrapping Generator stores results directly in $this->items just before yielding them (storeCurrentResult is still called in constructor).

private function wrapTraversable(Traversable $traversable) : Generator
{
    foreach ($traversable as $key => $value) {
        $this->items[$key] = $value;
        yield $key => $value;
        $this->iteratorAdvanced = true;
    }

    $this->iterator = null;
}

I've removed iteratorExhausted property - now Generator removes it's reference immediately just before final return, which also signifies iterator exhaustion ($this->iterator === null). This also means that Generator is freed earlier - immediately after exhaustion - in case PHP messes up it's reference counts.

I've wondered if those two iterators - BaseIterator and CachingIterator - should have a common parent class, but almost every method differs and I'm not sure it is worth it creating such a class for them.

There are still some code quality checks / comment updates that I need to address, but I'd like to know your opinion if this approach is a valid one.

See new commits: https://github.com/doctrine/mongodb-odm/pull/2116/files/6e26070f7ad7a89f33bc5b1042133e0886118c4d..a312708b991c7c36d8bdaa1b581a70e007dff620

Also, I believe I should target 2.0 branch, not master? I rebased assuming so.

Copy link
Member

@alcaeus alcaeus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this logic! I've left some feedback in the code itself. Just a couple of points here:

  • I'm not sure if the refactoring in CachingIterator is necessary. If it isn't, I'd prefer leaving it out as long as it doesn't have a performance impact.
  • The code currently produces a lot of phpcs errors: https://travis-ci.org/doctrine/mongodb-odm/jobs/623925579#L2854. If you've installed dependencies locally, you can run vendor/bin/phpcbf to fix most of them. Then, run vendor/bin/phpcs to see what errors couldn't be fixed and fix them manually.
  • PHPStan also complains about your code: https://travis-ci.org/doctrine/mongodb-odm/jobs/623925578#L1426. Reverting the refactoring in CachingIterator will fix some of them, while applying the getIterator() method to BaseIterator and updating the $iterator type hint as suggested below will fix the rest of them.
  • This PR currently contains commits from another branch. I'll merge up through the branches soon, which should take care of that. We can worry about rebasing and squashing commits once the implementation is solid.

public function __construct(Traversable $iterator)
{
$this->iterator = $this->wrapTraversable($iterator);
$this->storeCurrentItem();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this call is superfluous: wrapTraversable stores the current key and value in $this->key and $this->value in the foreach loop, so I'm wondering why we still need this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required to cause generator to reach first yield. Otherwise Cursor is not starting it's execution, which differs from what CachingIterator does. When creating this I worked under the assumption that this should not be changed.

Should I change the method name to something else, and if yes, what would you prefer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the method completely and only left $this->iterator->key() in __construct which - as far as I understand it, I may be wrong - starts up the generator causes Cursor to query and execute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the initial foreach already initialise the cursor and start iterating?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<?php

function wrapTraversable(Traversable $traversable) {
    echo("ITERATOR EXECUTING\n");
    foreach ($traversable as $key => $value) {
        yield $key => $value;
    }
};

$generator = wrapTraversable(new ArrayIterator([1, 2, 3]));

echo "BEFORE KEY\n";
$generator->key();
echo "AFTER KEY\n";

foreach ($generator as $key => $value) {
    echo $value;
}

Output:

/usr/bin/php /home/steveb/PhpstormProjects/adpanel/adpanel/test.php
BEFORE KEY
ITERATOR EXECUTING
AFTER KEY
123

Therefore, because $this->iterator is an instance of a Generator, it does not start executing (even until the first yield statement) until any call to the usual Iterator methods.

CachingIterator is - as far as I understand it - used to immediately start the Cursor and this call does exactly that, similarly to how storeCurrentItem was doing.

Steveb-p pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Dec 31, 2019
Steveb-p pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Dec 31, 2019
Steveb-p pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Dec 31, 2019
Steveb-p pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Dec 31, 2019
@Steveb-p Steveb-p requested a review from alcaeus December 31, 2019 11:00
@alcaeus
Copy link
Member

alcaeus commented Feb 4, 2020

Sorry for the long delay. Logic looks good now. Could you please add tests for the builder/query functionality to make sure no The QueryTest and BuilderTest classes (for Aggregation and Query) serve as examples. After that, this is ready for a merge into 2.1 :)

@Steveb-p
Copy link
Contributor Author

@alcaeus I'll try to wrap this up tonight/tomorrow.

@alcaeus
Copy link
Member

alcaeus commented Feb 18, 2020

No worries, caught up in things as well, so I can only finish it up next week anyway.

@Steveb-p
Copy link
Contributor Author

Steveb-p commented Feb 19, 2020

I've added tests in 9f347c6. If I misunderstood their purpose - being too generic or too narrow - please correct me.

They should not pass since I've deliberately marked them as expecting an exception to be thrown to show a little quirk/issue that those tests uncovered: empty resultsets will incorrectly not mark iterator as exhausted. This is due to the fact that foreach loop in wrapping Generator is never entered. I've tried setting iteratorAdvanced property to true after the foreach but since we expect the iterator to be started immediately this causes it to become unusable. I'm wondering if there is a graceful solution for this. WDYT?

EDIT: We can also simply ignore this particular behavior with empty results - they would appear as if they are rewindable - but I'd prefer to ask for your opinion beforehand.

EDIT2: I've noticed that expectException methods replaced annotations on master, I'll update accordingly.

@alcaeus alcaeus self-assigned this Apr 22, 2020
@alcaeus alcaeus added this to the 2.1.0 milestone Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
alcaeus pushed a commit to Steveb-p/mongodb-odm that referenced this pull request Apr 22, 2020
@alcaeus
Copy link
Member

alcaeus commented Apr 22, 2020

@Steveb-p once again, sorry for the long delay here, but I was not able to do any significant work over the past few weeks due to the current situation.

I updated the tests so they pass, and also added paragraphs to the query builder and aggregation builder documentation explaining this feature. Once I've got the other branches stable and merge up, this is good to merge.

Thanks for highlighting this issue, fixing it, and bringing along more patience a maintainer could ask for! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants