-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MP Metrics 5.0 support for 4.x #7139
Conversation
bc72cfb
to
05447e9
Compare
There seems to be a problem with the calls to registry. Calls to Problem is in So the @Override
public Iterable<String> scopes() {
Set<String> result = new HashSet<>(registries.keySet());
// always add base and vendor, as these are known to exist (even if not yet initialized)
result.add(Registry.BASE_SCOPE);
result.add(Registry.VENDOR_SCOPE);
return result;
} |
Exactly. I also found this independently. Pushed a fix to the PR branch. |
We definitely need to do this for base (but only if it is enabled; remember metrics can be disabled by scope). But you think also for vendor? Base is the only registry lazily instantiated upon first read. |
Still investigating, but the actual underlying cause might be different. With the change to the It seems that maybe the metrics feature post-setup code is not invoked when expected. That code should be fetching the base metrics registry to trigger its creation and also set up the KPI metrics...and neither seems to be happening and in the debugger I don't see the metrics feature |
BTW, the latest pushes resolve the problem Tomas and I discovered with the base metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it relates to changes in integrations/oci/metrics
, it LGTM, hence giving approval from that perspective.
…rs; remove scope tag from JSON output
…less sensitive to ordering of TEST and HELP lines in Prometheus output
…es (instead of empty output)
…ially no-ops during normal usage)
…d created yet, when Registry is asked for the list of scopes; also make sure metrics is properly declared as a feature
…shutdown, now that the clean-up is in the RegistryFactory stop rather than start method
207ccfc
to
e138b14
Compare
private static String valueMatcher(String statName) { | ||
// application_timerForGets_mean_seconds 0.010275403147594316 # {trace_id="cfd13196e6a9fb0c"} 0.002189822 1617799841.963000 | ||
return "application_" + GreetService.TIMER_FOR_GETS | ||
+ "_" + statName + "_seconds [\\d\\.]+ # \\{trace_id=\"[^\"]+\"\\} [\\d\\.]+ [\\d\\.]+"; | ||
// TODO Following is the original, exemplar-matching pattern. Suppressed temporarily while we migrate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you place the issue number here as part of the follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update in this PR if there are other, functional, changes as well.
@@ -34,6 +34,7 @@ class MicroProfileMetricsTrackerFactory implements MetricsTrackerFactory { | |||
this.registry = null; | |||
} | |||
|
|||
// TODO change to RegistryScope once MP makes it a qualifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tracking issue inlined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update in this PR if there are other, functional, changes as well.
private Instance() { | ||
} | ||
|
||
private static final LazyValue<MetricsProgrammaticSettings> INSTANCE = LazyValue.create(() -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we using the @Contract
and pico style approach here? Holistically, I think we need to stop using this pattern - @tomas-langer should comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely needs to happen. Much was changing in a short time as this was coming together and I chose to defer those changes until we had MP metrics operating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the correct approach for now. We may want to use contract based lookup, but only when running within injection service registry. It still must work outside of it
* @throws java.lang.IllegalArgumentException if the implementation cannot handle the requested media type | ||
* @throws java.lang.UnsupportedOperationException if the implementation cannot expose its metrics | ||
*/ | ||
Optional<?> scrape(MediaType mediaType, Iterable<String> scopeSelection, Iterable<String> meterNameSelection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning Optional<?>
seems wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could return null
instead if the provided selections result in no output, but it's against our guidelines to use null
as part of our APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not the optional i am commenting on - it is the use of the wildcard generic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying.
If you look at the two formatters that can be called, the Prometheus one returns an Optional<String>
and the JSON one returns an Optional<JsonObject>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case why not return a Builder with two Optional<> attributes? I think that would be better than using generics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is, the value (if present inside the Optional
) is sent directly as the entity in the response. The code in MetricsFeature
has no need to know the specific type of the returned value. The media support handles whatever the actual type is in sending the response. It's very flexible if we ever need to add another formatter that produces another type.
Adding another builder would (IMO) unnecessarily complicate things now, and if we were to add other format types in the future then we'd have to update the builder as well. As is, a hypothetical new formatter would just return its type and we'd be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw Optional<Object>
would work the same and it is less unusual on the API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to follow-up tasks issue #7193
void addCounterWithTag() { | ||
Counter counter = reg.counter(COMMON_COUNTER_NAME, new Tag("myTag", "a")); | ||
assertThat("New counter value", counter.getCount(), is(0L)); | ||
counter.inc(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the point of this increment call if its not getting checked later on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @BeforeEach
exercises the metric deletion feature. The increment of the counters in each test verifies that the counter was, in fact, deleted correctly by that code. If the deletion does not work correctly, then the counter value in the other test could be non-zero as a result of the increment at the end of the first test.
* @param <T> type of the origin | ||
*/ | ||
static <T> HelidonFunctionalCounter<T> create(MeterRegistry meterRegistry, | ||
String scope, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: alignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change in this PR if there are also other, functional, changes needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of MIcrometer as part of Metrics API was unexpected for me, and I would like to understand this.
I thought the MP was kept independent yet aligned with Micrometer, but now we have a hard dependency on it.
@@ -50,6 +50,14 @@ | |||
<groupId>org.eclipse.microprofile.metrics</groupId> | |||
<artifactId>microprofile-metrics-api</artifactId> | |||
</dependency> | |||
<dependency> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is Micrometer an explicit dependency here? I thought the APIs should be MP and not Micrometere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, not ideal, but we plan for Helidon to introduce its own neutral metrics API and, once that is in place, we will change our MP metrics implementation to depend on that rather than directly on Micrometer.
But due to scheduling complications earlier in the spring and given our goal of having MP metrics support in M1 we decided to go ahead with a first MP implementation without waiting for the neutral API, knowing we would change the MP implementation once the neutral API is in place.
The neutral API will likely be very close to the Micrometer API, partly for the convenience of developers and partly because of Micrometer's maturity in this space. In that case, the changes we will need to make in our MP implementation, although not zero, will be relatively small. Much of the code already uses a MetricFactory
SPI abstraction to help minimize the changes required later.
We would definitely have preferred to have the neutral API in place first, but I just couldn't make that happen in time for MP metrics in M1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to fix this for M2. It doesn't look very nice. I should be able to help Tim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should implement our neutral API and adapt SE and MP to it soon.
private Instance() { | ||
} | ||
|
||
private static final LazyValue<MetricsProgrammaticSettings> INSTANCE = LazyValue.create(() -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the correct approach for now. We may want to use contract based lookup, but only when running within injection service registry. It still must work outside of it
* customization. | ||
* </p> | ||
*/ | ||
public interface MetricsProgrammaticSettings { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a follow up for naming here - we use Config
as the suffix for configuring components in other modules, I think it should be used for this class as well
* @throws java.lang.IllegalArgumentException if the implementation cannot handle the requested media type | ||
* @throws java.lang.UnsupportedOperationException if the implementation cannot expose its metrics | ||
*/ | ||
Optional<?> scrape(MediaType mediaType, Iterable<String> scopeSelection, Iterable<String> meterNameSelection); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw Optional<Object>
would work the same and it is less unusual on the API
* | ||
* @param <T> type of the object providing the value | ||
*/ | ||
class HelidonFunctionalCounter<T> extends MetricImpl implements Counter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand it correctly that this is basically a Gauge? Why do we still have it? Or is it represented differently by the format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, this allows a Counter
-style "wrapper" around an externally-updated value instead of a Gauge
-style wrapper.
Earlier releases of MP metrics allowed developers to create their own implementations of metrics and register them. Helidon used this to register custom counters, for example.
MP Metrics 5.0 no longer permits this, but the ability to have a Counter
-like wrapper is useful (and present in Micrometer and OTel metrics) so will likely be in the Helidon neutral API as well.
|
||
private HelidonGauge(String registryType, Metadata metadata, Gauge<T> metric) { | ||
super(registryType, metadata); | ||
private final io.micrometer.core.instrument.Gauge delegate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we should have support for Micrometer and possibly other metrics implementation, but not hardcoded into our impl, but through provider modules.
Why is there a hard dependency on Micrometer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See earlier note about the neutral API and timing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not realize from the description that we must use Micrometer to be able to generate prometheus output (as we used to generate it ourself).
I have approved the PR, we need to fix this eventually when we switch to Helidon Metrics API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that our neutral API will layer on one or more actual implementations (Micrometer, OTel metrics), we should delegate to the underlying implementation to publish, whether Prometheus or other (push or pull) models, rather than duplicate that functionality in our own code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are things to follow up on, let's merge so we have MP 6 complete
Resolves #5816
Overview
Note: Look for 🚧 User action flags below for what developers need to do to adapt to the changes. The MP spec section on migration hints has information also.
The latest MP Metrics release is 5.0.1.
This PR contains all the code changes that I know of to implement MP Metrics 5.0. As noted in separate meetings, the TCK is not yet running because the way we set up the Arquillian environment leads to many test failures. That remains a work in progress.
To detect other problems, I have commented out the invocation of the metrics TCK in this PR.
What this PR does not include:
nima
directoriesMP Metrics 5.0
There were breaking changes in the MP metrics libraries--to help simplify the API and bring MP metrics in closer alignment with Micrometer and OpenTelemetry metrics. Notable changes:
metric types removed
🚧 User action - use an alternative metric type and annotation
ConcurrentGauge
@ConcurrentGauge
Gauge
@Gauge
Meter
@Metered
Counter
@Counted
- back-ends can track the changes in the counter value over time and create time series analysis from that (meter used to maintain historical data itself in the server)SimpleTimer
@SimplyTimed
Timer
@Timed
metadata removals
🚧 User action - Remove code that sets the type or display name in the metadata programmatically and remove settings in the
@Metric
annotation.Other
The notion of a registry "type" with fixed values of base, vendor, and application has given way to the idea of a "scope" instead. There are three built-in scopes--base, vendor, and application--but users can create their own scopes.
MetricRegistry.Type
was an enum covering base, vendor, and application. It's gone because users can create their own so an enum doesn't really work.@RegistryType
is deprecated, intended to be replaced by@RegistryScope
. Unfortunately,@RegistryScope
is not (yet) marked as a CDI qualifier (that'll change in an upcoming release of MP Metrics), so for now our CDI code still has producers for the three registry types using@RegistryType
as well as a producer for injection sites that use@RegistryScope
rather than@RegistryType
.🚧 User action
MetricRegistry.Type
to aString
with the corresponding name. TheMetricRegistry
interface declaresString
constants for the predefined scopes.@RegistryType
(deprecated but not yet removed) to@RegistryScope
.All metrics that share the same name must:
🚧 User action - Make sure all metrics that share the same name use the same tag names.
The metric registry API no longer permits you to create your own instance of one of the supported metric types (such as your own
Counter
) and register it. This is because Micrometer and OpenTel are totally in control of manufacturing the metric instances.Some of our code (integrations, for example) used to do this in order to wrap other technology's measurements with custom implementations of the MP metrics (typically a counter). For the most part where I had to change our code to adjust, I changed such places to use gauges instead. In line with what Micrometer supports, The
Registry
interface usable from SE apps includes a way to register a read-onlyCounter
by passing aToDoubleFunction
and an object to which to apply that function. But this is not visible through the MPMetricRegistry
interface. In some places I used that approach.🚧 User action - Use a
Gauge
metric type to expose a value that is updated outside of the metrics API.Change in
Gauge
toGauge<? extends Number>
- TheGauge
metric was always intended to expose numeric data. The updated API now enforces this.🚧 User action - Declare each of your
Gauge
metrics so it reflects the specific numeric type the gauge exposes.Metric types remaining from earlier releases are
Counter
,Gauge<? extends Number>
,Histogram
, andTimer
.Why do so many files have to change?
Our SE metrics implementation has always dependend--and, for the moment, still does depend--on the MP metrics types. We plan for that to change by introducing a neutral Helidon metrics API which will eventually have implementations which delegate to Micrometer and OpenTelemetry metrics.
Then our MP metrics implementation will invoke the neutral Helidon API, as will customers who develop SE apps.
People will be able to choose whether to use Micrometer or OTel underneat for either SE or MP apps.
But we have not yet defined or implemented the neutral API yet. (I have thought a lot about what that API should look like but the code is not written yet.)
So, the MP metrics changes rippled through our SE metrics API and implementation, and into other code that uses metrics, and into examples, and into tests.
Other notes
Although this PR does not introduce the neutral Helidon metrics API, there are some beginning steps in that direction.
For example, the new
MetricFactory
SPI interface encapsulates the details of creating new metrics. There are two implementations so far: one that delegates to Micrometer and one that provides no-op implementations of the various metrics.Other changes in behavior
Prometheus output
In addition to the API changes, the Prometheus output now is slightly different. Each line showing a metric value (that is, not the
# TYPE
or# HELP
lines) has an additional tag to identify the scope where that metric was registered. In MP, the tag name ismp_scope
per the spec; in SE it (for the moment) ish_scope
.This is because our code now stores metrics in the Micrometer meter registry...a single meter registry for all scopes we and our developers might create. That's the way Micrometer works.
The Micrometer world does not use different meter registries for scoping but rather for dealing with different types of backends/consumers of metrics output. Typical applications use the Micrometer global registry for adding, removing, and querying meters.
That global registry is a composite registry, meaning you can add technology-specific meter registries to it.
There is a Prometheus Micrometer meter registry Helidon creates and adds (if absent) to the global registry.
Applications create and add other types of meter registries to the global one to support other back-ends.
Further, the Prometheus Micrometer meter registry knows how to format its output so we will be able to remove our own code that does that.
JSON output
It's still there in Helidon, but the MP metrics spec no longer requires it. At this moment, I have not added back the JSON
OPTIONS
support for metadata.I might do that if I have time.
We might want to remove JSON support at some point--perhaps deprecate it in 4.0--but there was some Helidon code that used it for testing and there might be user code out there that relies on it.
Users will have plenty to adapt to in 4.0 already, so it might be worth keeping JSON format output for now to ease that migration.
I already changed some of the JSON-dependent testing code to use Prometheus output instead; it is possible to do that everywhere if we need to.
Because there will be a cost to us in keeping it around--even if deprecated--until 5.0, we might want to remove it now.
To my knowledge, MP metrics never marked the JSON output as deprecated during the MP metrics 4.x lifespan, for what that's worth.
Current state
Work done
Work left to do for full MP metrics support in M1