-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat][monitor] PIP-223: Add metrics for all rest endpoints. #21772
base: master
Are you sure you want to change the base?
Conversation
refers to: #18836 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #21772 +/- ##
============================================
+ Coverage 73.57% 73.59% +0.01%
+ Complexity 32624 32153 -471
============================================
Files 1877 1878 +1
Lines 139502 139603 +101
Branches 15299 15321 +22
============================================
+ Hits 102638 102737 +99
+ Misses 28908 28878 -30
- Partials 7956 7988 +32
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this.
Please note that we have already started implementing PIP-264, which is Open Telemetry. We have added the infrastructure needed to define metrics, and we are underway of adding the first metrics.
Once we finish the first PR, can we ping you to add those metrics using OTel? Once that PR we fill finally know the naming conventions to use.
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java
Outdated
Show resolved
Hide resolved
admin.namespaces().deleteNamespace("test/test"); | ||
admin.tenants().deleteTenant("test"); | ||
|
||
ByteArrayOutputStream output = new ByteArrayOutputStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Save yourself this code and the intimate knowledge of how it is implemented:
Create a client like this against the local broker.
prometheusMetricsClient = new PrometheusMetricsClient("127.0.0.1", pulsar.getListenPortHTTP().get());
Get the Metrics:
Metrics metrics = prometheusMetricsClient.getMetrics();
and assert example:
Metric backlogAgeMetric =
metrics.findSingleMetricByNameAndLabels("pulsar_storage_backlog_age_seconds",
Pair.of("topic", topic1));
assertThat(backlogAgeMetric.tags).containsExactly(
entry("cluster", CLUSTER_NAME),
entry("namespace", namespace),
entry("topic", topic1));
assertThat((long) backlogAgeMetric.value).isCloseTo(expectedMessageAgeSeconds, within(2L));
Add the method you lack at Metrics
pulsar-broker/src/test/java/org/apache/pulsar/broker/stats/BrokerRestEndpointMetricsTest.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/PulsarService.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
private RestEndpointMetricsFilter(PulsarService pulsar) { | ||
PulsarBrokerOpenTelemetry telemetry = pulsar.getOpenTelemetry(); | ||
Meter meter = telemetry.getMeter(); | ||
latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look at #22058 to understand how to do:
Instrument name, description, unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still relevant :)
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Outdated
Show resolved
Hide resolved
@dao-jun I'm trying to find a more elegant way to obtain the path template in Jersey DEV mailing list: https://www.eclipse.org/lists/jersey-dev/msg00370.html |
@dao-jun Ping me when you the PR is ready to continue review. |
@asafm I've updated the PR, PTAL |
pulsar-broker/src/main/java/org/apache/pulsar/broker/web/RestEndpointMetricsFilter.java
Show resolved
Hide resolved
@dragosvictor Could you please take a look that why my test keeps failing? |
private RestEndpointMetricsFilter(PulsarService pulsar) { | ||
PulsarBrokerOpenTelemetry telemetry = pulsar.getOpenTelemetry(); | ||
Meter meter = telemetry.getMeter(); | ||
latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still relevant :)
Meter meter = openTelemetry.getMeter(); | ||
latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency") | ||
.setDescription("Latency of REST endpoints in Pulsar broker") | ||
.setUnit("ms") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be s
, and measured in seconds: See https://opentelemetry.io/docs/specs/semconv/general/metrics/#instrument-units
.build(); | ||
} | ||
|
||
private static volatile RestEndpointMetricsFilter instance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be static? Can't you just created one instance of the filter and register it?
UriRoutingContext info = (UriRoutingContext) req.getUriInfo(); | ||
attrs = getRequestAttributes(info, statusCode); | ||
} catch (Throwable ex) { | ||
attrs = Attributes.of(PATH, "UNKNOWN", METHOD, req.getMethod(), CODE, (long) statusCode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about that yet.
|
||
Object o = req.getProperty(REQUEST_START_TIME); | ||
if (o instanceof Long start) { | ||
long duration = System.currentTimeMillis() - start; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read online now, and from from what I can gather, it's not recommended to use System.currentTimeMillis()
. It's a native call to obtain the wall clock time. NTP for example can sync the clock and change. Not 100% sure but leap second may also change it. See: https://develotters.com/posts/how-not-to-measure-elapsed-time/
I think it is safer to use System.nanoTime()
UriRoutingContext info = (UriRoutingContext) req.getUriInfo(); | ||
attrs = getRequestAttributes(info, statusCode); | ||
} catch (Throwable ex) { | ||
attrs = Attributes.of(PATH, "UNKNOWN", METHOD, req.getMethod(), CODE, (long) statusCode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good practice. For example OOME will vanish like that.
I think if we don't have any specific exception in mind, no point in guarding this part.
if (CollectionUtils.isEmpty(templates)) { | ||
return Attributes.of(PATH, "UNKNOWN", METHOD, httpMethod, CODE, statusCode); | ||
} | ||
UriTemplate[] arr = templates.toArray(new UriTemplate[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you specifically need to convert this to an array? Can't you just iterate over the list using get(i)
?
UriTemplate[] arr = templates.toArray(new UriTemplate[0]); | ||
int idx = arr.length - 1; | ||
StringBuilder builder = new StringBuilder(); | ||
for (; idx >= 0; idx--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for (int idx = arr.length - 1; ...
?
admin.tenants().deleteTenant("test"); | ||
|
||
Collection<MetricData> metricDatas = pulsarTestContext.getOpenTelemetryMetricReader().collectAllMetrics(); | ||
log.info("Metrics size: {}", metricDatas.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed at this stage
|
||
Collection<MetricData> metricDatas = pulsarTestContext.getOpenTelemetryMetricReader().collectAllMetrics(); | ||
log.info("Metrics size: {}", metricDatas.size()); | ||
Optional<MetricData> optional = metricDatas.stream().peek(m -> log.info("metric name: {}", m.getName())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenTelemetry SDK has a beautiful AssertJ extension for MetricData
. @dragosvictor used it in his PR. It's here: https://github.com/open-telemetry/opentelemetry-java/blob/f83c020d4d2a11f16be8af739790718bb3413cba/sdk/testing/src/main/java/io/opentelemetry/sdk/testing/assertj/OpenTelemetryAssertions.java#L50
PIP: #18560
Motivation
#18560
Modifications
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: dao-jun#6