Description
Summary
Magento is discarding all new relic transaction logging/runtime information/errors for every cron job after the first one executed per process.
For example if you had three pending jobs job1
, job2
, and job3
you would only have a transaction for job1
and no info at all in new relic for job2
and job3
.
This is a critical issue IMO as it reduces the effectiveness of our ability to support the Magento instance. I assumed I was getting all relevant data in new relic but I am actually missing performance analytics as well as possible errors that are not being tracked in new relic. This affected me debugging downtime issues affecting a website.
Cause
This issue was introduced in 2.4.0 by 5819ded
In that commit there were attempts to
- stop logging the
cron:run
command as a command (theisCommandSkipped
logic) - set the transaction name for each pending job processed in the queue.
The current implementation works a bit like so, starting at Magento\Cron\Observer\ProcessCronQueueObserver::execute
calls
calls
This is the part of the process that actually executes the job, it does so like this
magento2/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php
Lines 350 to 377 in 4c36116
You can see that each execution of a job is wrapped with startProfiling
and stopProfiling
.
These functions have plugins in them to set newrelic_name_transaction
and newrelic_end_transaction
magento2/app/code/Magento/NewRelicReporting/Plugin/StatPlugin.php
Lines 62 to 73 in 4c36116
magento2/app/code/Magento/NewRelicReporting/Plugin/StatPlugin.php
Lines 84 to 93 in 4c36116
The issue is that we call newrelic_end_transaction
but nowhere in the codebase is there another call to newrelic_start_transaction
, this is required to track subsequent jobs.
For task queue managers, you can use newrelic_end_transaction() and newrelic_start_transaction() together to manually separate transactions
This is the major bug, without this call to restart the transaction all subsequent data is not tracked and reported to New Relic.
There is a second quality of life issue. Due to the location where we set the initial transaction name we end up with two traces per cron execution, one of them is flagged as unknown
which encompasses everything up until we call newrelic_name_transaction
for the first time, so basically when you see unknown
on the CLI New Relic data you can think of it as "the time it takes to boot up cron:run
and get to the point where you run the first actual job".
In effect what we have is the following
<?php
sleep(10); // cron:run loader
foreach ($i=1; $i<=3; $i++) {
newrelic_name_transaction("job$i");
sleep(10);
newrelic_end_transaction();
}
Results
unknown
transaction containing the booting ofcron:run
job1
transaction only
When we need something like
<?php
newrelic_name_transaction("cron:run loader");
sleep(10); // cron:run loader
newrelic_end_transaction();
newrelic_start_transaction(ini_get("newrelic.appname"));
foreach ($i=1; $i<=3; $i++) {
newrelic_name_transaction("job$i");
sleep(10);
newrelic_end_transaction();
newrelic_start_transaction(ini_get("newrelic.appname"));
}
Results
cron:run loader
transaction containing the booting ofcron:run
job1
transactionjob2
transactionjob3
transaction
Hotfix
For anyone looking for an urgent patch to get their crons all properly logged in New Relic the following seems to be working for me. This ensures we always have a transaction running until this process terminates.
--- Model/NewRelicWrapper.php 2021-01-07 21:56:58.000000000 +0000
+++ Model/NewRelicWrapper.php 2021-09-17 09:11:31.000000000 +0100
@@ -79,6 +79,7 @@
{
if ($this->isExtensionInstalled()) {
newrelic_end_transaction($ignore);
+ newrelic_start_transaction(ini_get("newrelic.appname"));
}
}
Proper solution
I have some rough ideas but I'll leave this for someone in Magento to pick up. I cant see any trivial way of doing what I propose as we'd need to couple the Cron module to New Relic reporting module a bit tighter.
Firstly we can give the cron:run
its old transaction name back by removing this.
magento2/app/code/Magento/NewRelicReporting/etc/di.xml
Lines 49 to 55 in 5819ded
Then before we call startProfiling
in ProcessCronQueueObserver.php
we need to end the current new relic transaction, then start a new one.
It would also feel weird for NewRelicWrapper::endTransaction
to have the responsibility of newrelic_start_transaction
, but I wonder whether adding a new NewRelicWrapper::startTransaction
would break backwards compatibility if its a new public method 🤔 But the breakage here is so severe that it makes the cron data for new relic practically without use so something needs to be addressed here.
Please provide Severity assessment for the Issue as Reporter. This information will help during Confirmation and Issue triage processes.
- Severity: S0 - Affects critical data or functionality and leaves users without workaround.
- Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
- Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
- Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
- Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.
Metadata
Metadata
Labels
Type
Projects
Status
Done
Activity