Magento discards most New Relic cron transactions

# Summary

Magento is discarding all new relic transaction logging/runtime information/errors for every cron job after the first one executed per process.

For example if you had three pending jobs `job1`, `job2`, and `job3` you would only have a transaction for `job1` and no info at all in new relic for `job2` and `job3`.

This is a critical issue IMO as it reduces the effectiveness of our ability to support the Magento instance. I assumed I was getting all relevant data in new relic but I am actually missing performance analytics as well as possible errors that are not being tracked in new relic. This affected me debugging downtime issues affecting a website.

# Cause

This issue was introduced in 2.4.0 by 5819ded0ed228e0de105000d92da26531c07785a

In that commit there were attempts to 
1. stop logging the `cron:run` command as a command (the `isCommandSkipped` logic) 
2. set the transaction name for each pending job processed in the queue.

The current implementation works a bit like so, starting at `Magento\Cron\Observer\ProcessCronQueueObserver::execute` 

https://github.com/magento/magento2/blob/4c36116dcf878e127059d9be9566a119783583f2/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L273

calls

https://github.com/magento/magento2/blob/4c36116dcf878e127059d9be9566a119783583f2/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L825

calls

https://github.com/magento/magento2/blob/4c36116dcf878e127059d9be9566a119783583f2/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L858

This is the part of the process that actually executes the job, it does so like this

https://github.com/magento/magento2/blob/4c36116dcf878e127059d9be9566a119783583f2/app/code/Magento/Cron/Observer/ProcessCronQueueObserver.php#L350-L377

You can see that each execution of a job is wrapped with `startProfiling` and `stopProfiling`. 

These functions have plugins in them to set `newrelic_name_transaction` and `newrelic_end_transaction`

https://github.com/magento/magento2/blob/4c36116dcf878e127059d9be9566a119783583f2/app/code/Magento/NewRelicReporting/Plugin/StatPlugin.php#L62-L73

https://github.com/magento/magento2/blob/4c36116dcf878e127059d9be9566a119783583f2/app/code/Magento/NewRelicReporting/Plugin/StatPlugin.php#L84-L93

The issue is that we call `newrelic_end_transaction` but nowhere in the codebase is there another call to `newrelic_start_transaction`, this is required to track subsequent jobs.

> For task queue managers, you can use newrelic_end_transaction() and newrelic_start_transaction() together to manually separate transactions
- https://docs.newrelic.com/docs/agents/php-agent/php-agent-api/newrelic_start_transaction/#stop-and-start

This is the major bug, without this call to restart the transaction all subsequent data is not tracked and reported to New Relic.

There is a second quality of life issue. Due to the location where we set the initial transaction name we end up with two traces per cron execution, one of them is flagged as `unknown` which encompasses everything up until we call `newrelic_name_transaction` for the first time, so basically when you see `unknown` on the CLI New Relic data you can think of it as "the time it takes to boot up `cron:run` and get to the point where you run the first actual job".

In effect what we have is the following 
```php
<?php
sleep(10); // cron:run loader
foreach ($i=1; $i<=3; $i++) {
    newrelic_name_transaction("job$i");
    sleep(10);
    newrelic_end_transaction();
}
```

Results
- `unknown` transaction containing the booting of `cron:run`
- `job1` transaction only

When we need something like 

```php
<?php
newrelic_name_transaction("cron:run loader");
sleep(10); // cron:run loader
newrelic_end_transaction();
newrelic_start_transaction(ini_get("newrelic.appname"));
foreach ($i=1; $i<=3; $i++) {
    newrelic_name_transaction("job$i");
    sleep(10);
    newrelic_end_transaction();
    newrelic_start_transaction(ini_get("newrelic.appname"));
}
```

Results
- `cron:run loader` transaction containing the booting of `cron:run`
- `job1` transaction
- `job2` transaction
- `job3` transaction

# Hotfix

For anyone looking for an urgent patch to get their crons all properly logged in New Relic the following seems to be working for me. This ensures we always have a transaction running until this process terminates.

```diff
--- Model/NewRelicWrapper.php	2021-01-07 21:56:58.000000000 +0000
+++ Model/NewRelicWrapper.php	2021-09-17 09:11:31.000000000 +0100
@@ -79,6 +79,7 @@
     {
         if ($this->isExtensionInstalled()) {
             newrelic_end_transaction($ignore);
+            newrelic_start_transaction(ini_get("newrelic.appname"));
         }
     }
```

# Proper solution

I have some rough ideas but I'll leave this for someone in Magento to pick up. I cant see any trivial way of doing what I propose as we'd need to couple the Cron module to New Relic reporting module a bit tighter.

Firstly we can give the `cron:run` its old transaction name back by removing this.
https://github.com/magento/magento2/blob/5819ded0ed228e0de105000d92da26531c07785a/app/code/Magento/NewRelicReporting/etc/di.xml#L49-L55

Then before we call `startProfiling` in `ProcessCronQueueObserver.php` we need to end the current new relic transaction, then start a new one.  

It would also feel weird for `NewRelicWrapper::endTransaction` to have the responsibility of `newrelic_start_transaction`, but I wonder whether adding a new `NewRelicWrapper::startTransaction` would break backwards compatibility if its a new public method 🤔 But the breakage here is so severe that it makes the cron data for new relic practically without use so something needs to be addressed here.

---

Please provide [Severity](https://devdocs.magento.com/guides/v2.4/contributor-guide/contributing.html#backlog) assessment for the Issue as Reporter. This information will help during Confirmation and Issue triage processes.

- [ ] Severity: **S0** _- Affects critical data or functionality and leaves users without workaround._
- [x] Severity: **S1** _- Affects critical data or functionality and forces users to employ a workaround._
- [ ] Severity: **S2** _- Affects non-critical data or functionality and forces users to employ a workaround._
- [ ] Severity: **S3** _- Affects non-critical data or functionality and does not force users to employ a workaround._
- [ ] Severity: **S4** _- Affects aesthetics, professional look and feel, “quality” or “usability”._


	$this->startProfiling($jobCode);
	$this->eventManager->dispatch('cron_job_run', ['job_name' => 'cron/' . $groupId . '/' . $jobCode]);

	try {
	$this->logger->info(sprintf('Cron Job %s is run', $jobCode));
	//phpcs:ignore Magento2.Functions.DiscouragedFunction
	call_user_func_array($callback, [$schedule]);
	} catch (\Throwable $e) {
	$schedule->setStatus(Schedule::STATUS_ERROR);
	$this->logger->error(
	sprintf(
	'Cron Job %s has an error: %s. Statistics: %s',
	$jobCode,
	$e->getMessage(),
	$this->getProfilingStat($jobCode)
	)
	);
	if (!$e instanceof \Exception) {
	$e = new \RuntimeException(
	'Error when running a cron job',
	0,
	$e
	);
	}
	throw $e;
	} finally {
	$this->stopProfiling($jobCode);
	}

	public function beforeStart(Stat $schedule, ...$args): array
	{
	$timerName = current($args);

	if ($this->isCronJob($timerName)) {
	$this->newRelicWrapper->setTransactionName(
	sprintf('Cron %s', $timerName)
	);
	}

	return $args;
	}

	public function beforeStop(Stat $schedule, ...$args): array
	{
	$timerName = current($args);

	if ($this->isCronJob($timerName)) {
	$this->newRelicWrapper->endTransaction();
	}

	return $args;
	}

	<type name="Magento\NewRelicReporting\Plugin\CommandPlugin">
	<arguments>
	<argument name="skipCommands" xsi:type="array">
	<item xsi:type="boolean" name="cron:run">true</item>
	</argument>
	</arguments>
	</type>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Magento discards most New Relic cron transactions #34108

Summary

Cause

Hotfix

Proper solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Magento discards most New Relic cron transactions #34108

Description

Summary

Cause

Hotfix

Proper solution

Activity

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions