-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Count metric is overridden instead of added/summed #970
Comments
@leocavalcante can you please try adding cumulative temporality to your exporter? $exporter = new MetricExporter($transport, Temporality::CUMULATIVE); I was just scanning these docs: https://opentelemetry.io/docs/reference/specification/metrics/data-model/#example-use-cases |
@brettmc, thank you for the suggestion leocavalcante/otel-php-metrics@53b9422, but it didn't work. |
Please, note that using Swoole https://github.com/leocavalcante/otel-php-metrics/blob/main/swoole.php it works just fine. I really suspect that it's something about the stateless/shared-nothing FPM model, each request on its own process, has it's own Counter and despite the same name, each call to |
@brettmc I've also use Temporality::CUMULATIVE and didn't work. |
Every open-telemetry/opentelemetry-collector-contrib#4968 seems related. |
@leocavalcante - has this helped resolve your issue? |
Actually, it didn't, @bobstrecansky. PHP-FPM uses a shared-nothing model for each of the incoming requests (each requests knows/counts only itself). I would have to set a unique To use the SDK properly with PHP-FPM, its needs to somehow share the counter between processes/requests, like the Prometheus client does using Redis or APC. |
Hey, this is happening here too. I assume @leocavalcante is right on that. IMO, the proper way to do it is supporting APC/u to use as native approach as possible. Is there any possibility to implement something similar? |
I guess that a |
Hey, are you planning to implement it? :) |
Actually, no. I'm not sure if it is the way to go. Just guessing. |
Well, having an external system like Redis/memcached can add lag but enforce the values for the counters, and related stuff. Do you if APCu with FPM in front could work? Saying this because FPM generates several serving pools, so I have the impression that APCu will have a shared cache inside each pool, but not between the pools |
I meant I'm not sure if it is about creating a new |
What would be responsible for exporting metrics from storage (apcu/redis)? I can understand using shared storage, but not how the exporting side might work. Is there any prior art in other language implementations? |
Maybe the Prometheus client can serve as an implementation reference? |
Well, some time ago I had a similar problem and solved in a tricky way, I would say. Basically I transformed the in-memory object and stored into FS. Of course, this is something that increased a bit the fs read/write iops but the latency was not that bad Of course, this did solve some issues but not them all because of the same: FPM in front I would say when an instrumentation is created, it's needed to create and populate the connection to a, for example, Redis, and check if this connection is available on each request? HDYT this should be implemented? some ideas? |
Since we're getting close to release candidate, and no solution is in sight for this one, I'm thinking to document it as a known issue. |
@brettmc, an interesting information: I had the same problem using OTel's Collector Statsd receiver. It could be a problem at the Prometheus exporter not properly handling Delta metrics because I have two OTLP exporters (New Relic and Dynatrace) for the same metrics and it works for the Statsd's Delta metrics there. |
I sought feedback from Reiley Yang from the Technical Committee (who did our metrics TC review). He also thinks that this is not a bug, but expected behaviour when using shared-nothing PHP runtime. |
Sounds good to me. Not a bug, but a limitation when using Cumulative Temporality with PHP-FPM. |
@brettmc @leocavalcante Hello there, and sorry in advance for the delay. I agree with this is the expected behavior for shared-nothing environments. The problem is, most PHP production environments our there are using PHP-FPM in front of it. This is not a bug in the OTEL side, of course, but the reason why some other libraries like Prometheus SDK for PHP implemented the possibility to use external systems to store temporary the metrics before being collected by the collector. Honestly, I think solving this is something really really needed due to the way the industry uses PHP. May be not in the future if some alternatives like ReactPHP or Swoole get more popular, but ATM it is :) |
Just to clarify: PHP-FPM users are still able to use the SDK as-is, just keep de default Delta temporality instead of using Cumulative. |
I know, I know, but the problem are not metrics like gauge ones, where you set directly the value, but for example the counters, that are always set to 0 for each request In my opinion, for business metrics like whatever_successful_sum they are completely needed, as you need to add a value based on the application's previous one, and not based on 0 |
Actually, counters works just fine as well. We have tested exactly with a counter. |
Not working for me. So some questions:
I am asking because if you have shared-nothing envs, how does OTEL sdk store the information between the requests? AFAIK is not using ACPu neither In fact, I actually had to use Prometheus SDK for PHP (+ PHP-FPM) apps and OTEL SDK for the rest of them |
It doesn't need to store the information on the SDK side, it just sends a new value to be incremented in the counter. That's the key difference between Delta and Cumulative. DeltaCumulative
Which means, when using Delta you don't need to store the value of the counter since the beginning.
How are you trying to see the Metrics? Using OTel's Collector Prometheus exporter? |
Thank you for detailed explanation. Now I see I was right on the requirements :) I am using cumulative metrics for some business metrics that need to have the result in that way. The SDK, sending the metrics against Otel collector and from there to Grafana Mimir (for most applications, but for some others, the collector is pulling the /metrics endpoint) The problem I see with delta metrics is the fact that some business metrics like failed / successful_payments_sum need an absolute reference to compare success between both them. That's the reason to use cumulative there |
I agree. The reason for treating it as an enhancement rather than a bug is that the SDK is doing everything correctly, we think. A fix may end up being some new feature(s) added, and maybe not as part of the SDK itself. There is certainly a deficiency somewhere, but a fix will take some thought and planning and somebody to drive it. Having an external data store sounds complex. |
Hey guys, As @leocavalcante said, we are having success to send the metrics to Dynatrace, including counters. @brettmc I'm not an expert on collectors, but I think your idea is a good way to solve this problem. By the way, the stack prometheus + OpenTelemetry + php-fpm is, for me, the major case to implement that solution. It was also our old stack and we had the same problem as @achetronic. What do you think about this approach @brettmc? Do you think there is any other solution to work with this stack php-fpm + OpenTelemetry + prometheus? |
@achetronic, the problem isn't with Delta metrics. You can normally use Delta metrics for business metrics, for monotonic counters like successful payments, it will be added at your APM. The problem here is the Collector's exporter. Try adding an Tomorrow I will update my example repo, linked in the issue, to send the metrics to a free New Relic account to show you it working. |
@brettmc Understood and agree that should be driven separatedly :) but honestly, I don't think this can be implemented in the collector side, acting as a common collector, without causing a huge waste of resources on it. In fact, we had to disable the queues in some parts of the collector due to the huge amount of metrics we handle per cluster (after it, it works smoothly). May be you are right and we need some fresh ideas on that 😃 @eliseuborges Yes, I think the common stack is the only reason to implement it. I don't like that stack since PHP can embrace ReactPHP/Swoole, but you know, the industry is moving slow on that regard yet @leocavalcante I understand your point perfectly but honestly, our goal after years of being vendor locked is to be vendor agnostic as much as possible, so NewRelic/Dynatrace are not options. I understand you are suggesting this to proof your point about Delta metrics not being a problem, but IMHO we should think about fixing this in all the stacks used there, and (Prometheus/Thanos/Mimir) are the main one for this. Without covering the main ones, the project is basically useless :) I think you are right suggesting we can sum the Delta results and aggregate them using a time window to have more or less an absolute result over the year, but I think we should make it easier for the common developers to work with this kind of stuff to increase adoption with common stack, as most PHP developers out there understand first a cumulative counter on their common old stack more than the rest of the things, right? I will scrape your link 👁️ |
@achetronic there is it: leocavalcante/otel-php-metrics@c109686 I'm making this point to clarify that the problem isn't the SDK. Prometheus exporter can't handle Delta metrics. That is the real problem, not the SDK. It fails with Statsd too, for example. Instead of thinking there is a bug on the PHP SDK, the real bug is at the Prometheus exporter, that affects not only the PHP SDK, but whatever sends Delta metrics, like Statsd. We should be addressing to PRs like this ones: |
@leocavalcante hey, i'm not using that exporter :) In my case I'm using PrometheusRemoteWrite exporter and I understand your point because exact same behavior on my side (this was the reason to switch to Prometheus SDK). Taking apart the issue with the Cumulative metrics, that is a real issue IMO, and talking only about Delta ones, I think for them, the behavior is the expected as one request in this stack (PHP + PHP-FPM) will compute one increment for a counter, so delta is exactly 1 in that case, unless you set explicitly another value in, for example, a gauge metric, you know Another option is to force Prometheus to compute a result over the time, but it's not easy in the Prometheus side, as you need a quite complex Prometheus rule to compute a moving window for a metric, and this is not doable in a lot of cases, and moreover not desirable as you can have a lot of them. May be, as an idea, it could be solved as delta_to_cumulative or just a _cumulative exporter/transformer in the collector side? I think this can really solve the issue but can consume a huge amount of memory for that as it has to keep the metrics in memory? IMHO, this could solve the issue until the new feature for external-stored-cumulative is implemented in the SDK, and only until then, because once this is implemented, we could say that the SDK can behave the same in this stack as it does for a common scenario with standalone PHP or whatever another language. WDYT? Just for those recently coming into the discussion, or whatever, we have exactly the opposite in the collector |
Describe your environment
Steps to reproduce
https://github.com/leocavalcante/otel-php-metrics/blob/main/index.php
What is the expected behavior?
The
http_requests{}
metric being added while I hitcurl localhost:8080
What is the actual behavior?
It remains at the value of
1
Additional context
https://github.com/leocavalcante/otel-php-metrics
The text was updated successfully, but these errors were encountered: