-
Notifications
You must be signed in to change notification settings - Fork 9.4k
Throw exception to prevent infinite loop caused by third party code. #18678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw exception to prevent infinite loop caused by third party code. #18678
Conversation
Third party code combined with `trigger_recollect` can cause infinite loops in some scenarios. The fix needs to be applied to the third party code, but the Magento core could be more explicit with these kinds of errors.
Hi @convenient. Thank you for your contribution
For more details, please, review the Magento Contributor Assistant documentation |
@convenient please provide clear reproducible steps following best practices. Injecting session object like described is not a way to go, there is even a dedicated MEQP2 sniff to catch this.
Such problems should be found during testing prior to production deployment. |
Hey @orlangur I'll rebase any code style issues after our discussion is completed just to save potential wasted effort on my part.
Agree on this, it is not the way to go. And that is what the proposed exception is trying to help reinforce. Not all third party vendors run the sniffers and full array of suites available to ensure the code is up to scratch and following best practices. As it goes injecting the session object is a common carry over from M1 and I've seen it in a few M2 sites. I think it's important to consider that in a pure vanilla M2 instance the proposed changes to This PR is kind of like another layer of developer tooling for a very specific kind of bug.
The testing for these issues is often missed as it requires the
It is not possible to provide reproducible steps using best practices as this PR is here to help enforce best practices right at the last possible step, on the server itself. To demonstrate this issue I have to provide an example of some naive non-best practice code. All the CI tools help prevent these kinds of errors but if they're not run or missed we need something on the production box that will help in time of crisis. I believe the system should be robust all the way from the developers machine, through QA, to staging and then production. Steps to replicate using non-best-practice third party code
class AfterCollectTotals implements ObserverInterface
{
/**
* @var \Magento\Checkout\Model\Session
*/
private $session;
/**
* AfterCollectTotals constructor.
* @param \Magento\Checkout\Model\Session $messageManager
*/
public function __construct(\Magento\Checkout\Model\Session $messageManager)
{
$this->session = $messageManager;
}
/**
* @param Observer $observer
* @return void
*/
public function execute(\Magento\Framework\Event\Observer $observer)
{
/**
* Best practices say we should use the quote directly from the observer object, but sometimes we're a few levels
* deep into the collect totals loop, this could be part of a shipping module or something to do with addresses
* in which case you'd have to go $address->getQuote() etc, however this is not always done properly and
* the session object can be used by the developer.
*/
//$quote = $observer->getData('quote');
/**
* Get the session object from the quote
*/
$this->session->getQuote();
}
}
Below are two recordings, one with the proposed exception and one without. Demonstration without code changeThis hits the xdebug nesting limit, but would carry on "forever" in production Demonstration with code changeA clear stack trace is logged and the customer will see an error report. I think we should have the application handle this situation rather than rely on PHP fatal errors. |
This is already handled as intended: fatal error is saved to logs and it must be later investigated. What is an issue with code debugging in its current state? Event dispatched should be pretty obvious from stack trace and then it's not so hard to find a poorly written observer. The problem I see with proposed code changed is that there are to many places in Magento where similar fuse would be required if we go this way. |
Hey there, sorry for the delay in response it's been a busy few weeks.
A fatal error would be enough to alert people that something is happening, but not enough for them to begin debugging immediately IMO. Every time I've seen this it's been a pain to debug as the problem can present itself in different hard to understand ways depending on the server configuration.
Perhaps you know as configuration to enforce the full stack trace? Traces are usually provided for fatal errors as I tested with my same instance and configuration and got the following results. It seems that OOM errors have different behaviour. $foo = new stdClass();
$foo->bar();
There's evidence of people experiencing these collect totals and trigger_recollect loops many times over the years in a variety of Magento installations, the dev98 team described it as causing "server outage" which is consistent of things I've seen in the past and which would be mitigated by this PR.
I believe that should this PR be merged that it will help anyone debugging these kinds of issues greatly by stopping the loop on its first iteration (thus freeing the rest of the system resources and helping keep the rest of the site stable) and by providing a proper Magento stack trace and all associated error logging.
What other classes do you believe this would cause a precedent for? As i've highlighted above I've seen and heard a lot about infinite loops in the quotes model, but not much anywhere else. Let me know your thoughts, thanks. |
53810d5
to
8be25bf
Compare
Hey I'm not sure what to do to resolve these codacy issues.
|
@convenient those can be ignored. And thanks for the great insight by the way, I'm pretty convinced the problem is real now, just trying to understand whether this is a correct place in code for fix. |
@orlangur brilliant thank you. Let me know when you have any feedback 👍 |
Just to confirm that the problem is very real and very old coming from Magento 1 |
Hey @orlangur how are you? I was wondering if you'd had any thoughts on this issue? |
Hey there. Anything needed from me? Rebased or anything? |
* | ||
* @var bool | ||
*/ | ||
protected $_isLoading = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, avoid underscore in var name and make variable private since it used in the current class only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sidolov, thank you for the review.
|
@convenient thank you for contributing. Please accept Community Contributors team invitation here to gain extended permissions for this repository. |
@sidolov I'm not sure what #18678 (comment) is trying to invite me to, I thought I already had a lot of privs and its just redirecting me to https://github.com/magento You've approved this but should I address the static analysis fails? |
Hi @convenient, thank you for your contribution! |
Description
This issue is a little esoteric, so bear with me.
It is possible that Magento can get caught in an infinite loop when third party observers/plugins hook into a quote totalling event (or something triggered by a quote totalling event) and attempt to get the quote from the singleton. This is because the
trigger_recollect
flag can trigger quote totalling before the quote has been fully assigned to the checkout session object property.This "problem" existed in Magento 1 (https://magento.stackexchange.com/a/133312/17176) and I've debugged it a few times on different websites over the years. It's always manifested as an OOM error, or the CPU going crazy, or php timeout limits being hit. I recently saw this on a M2 site and want to do something about it in the core as once a stack trace has been identified it becomes a lot easier to identify the start of the loop and to correct it in the third party code.
I propose that a logic exception be thrown in this scenario as it would have the following benefits
In the attached tests I verify how the bug only occurs when
trigger_recollect=1
is set while the custom code is active.Prior to the code change my test displayed the issue like this, stopping at 200 levels deep due to xdebugs maximum nesting level. This would have continued forever.
With the code change the trace still shows a loop, but much shorter and without exhausting resources and much easier to debug.
Manual testing scenarios
I do not believe manual testing scenarios are relevant to this change as it requires custom third party code to trigger the loop.
Edit: Added some testing notes in this comment, but will require third party code to be added. #18678 (comment)
Contribution checklist