Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat][monitor] PIP-223: Add metrics for all rest endpoints. #21772

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

dao-jun
Copy link
Member

@dao-jun dao-jun commented Dec 20, 2023

PIP: #18560

Motivation

#18560

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: dao-jun#6

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Dec 20, 2023
@dao-jun dao-jun self-assigned this Dec 20, 2023
@dao-jun
Copy link
Member Author

dao-jun commented Dec 20, 2023

refers to: #18836

@dao-jun dao-jun changed the title [feat][monitoring] PIP-223: Add metrics for all rest endpoints. [feat][monitor] PIP-223: Add metrics for all rest endpoints. Dec 20, 2023
@Technoboy- Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023
@dao-jun dao-jun closed this Feb 29, 2024
@dao-jun dao-jun reopened this Feb 29, 2024
@github-actions github-actions bot removed the type/PIP label Feb 29, 2024
@dao-jun dao-jun requested review from lhotari and asafm February 29, 2024 13:24
@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 86.79245% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 73.59%. Comparing base (bbc6224) to head (e139404).
Report is 19 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21772      +/-   ##
============================================
+ Coverage     73.57%   73.59%   +0.01%     
+ Complexity    32624    32153     -471     
============================================
  Files          1877     1878       +1     
  Lines        139502   139603     +101     
  Branches      15299    15321      +22     
============================================
+ Hits         102638   102737      +99     
+ Misses        28908    28878      -30     
- Partials       7956     7988      +32     
Flag Coverage Δ
inttests 26.43% <3.77%> (+1.85%) ⬆️
systests 24.31% <3.77%> (-0.02%) ⬇️
unittests 72.84% <86.79%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...org/apache/pulsar/broker/ServiceConfiguration.java 99.39% <100.00%> (+<0.01%) ⬆️
.../java/org/apache/pulsar/broker/web/WebService.java 94.21% <100.00%> (+0.06%) ⬆️
...e/pulsar/broker/web/RestEndpointMetricsFilter.java 85.71% <85.71%> (ø)

... and 74 files with indirect coverage changes

Copy link
Contributor

@asafm asafm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing this.
Please note that we have already started implementing PIP-264, which is Open Telemetry. We have added the infrastructure needed to define metrics, and we are underway of adding the first metrics.
Once we finish the first PR, can we ping you to add those metrics using OTel? Once that PR we fill finally know the naming conventions to use.

admin.namespaces().deleteNamespace("test/test");
admin.tenants().deleteTenant("test");

ByteArrayOutputStream output = new ByteArrayOutputStream();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Save yourself this code and the intimate knowledge of how it is implemented:
Create a client like this against the local broker.

            prometheusMetricsClient = new PrometheusMetricsClient("127.0.0.1", pulsar.getListenPortHTTP().get());

Get the Metrics:

            Metrics metrics = prometheusMetricsClient.getMetrics();

and assert example:

            Metric backlogAgeMetric =
                    metrics.findSingleMetricByNameAndLabels("pulsar_storage_backlog_age_seconds",
                            Pair.of("topic", topic1));
            assertThat(backlogAgeMetric.tags).containsExactly(
                    entry("cluster", CLUSTER_NAME),
                    entry("namespace", namespace),
                    entry("topic", topic1));
            assertThat((long) backlogAgeMetric.value).isCloseTo(expectedMessageAgeSeconds, within(2L));

Add the method you lack at Metrics

private RestEndpointMetricsFilter(PulsarService pulsar) {
PulsarBrokerOpenTelemetry telemetry = pulsar.getOpenTelemetry();
Meter meter = telemetry.getMeter();
latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please look at #22058 to understand how to do:
Instrument name, description, unit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still relevant :)

@asafm
Copy link
Contributor

asafm commented Mar 10, 2024

@dao-jun I'm trying to find a more elegant way to obtain the path template in Jersey DEV mailing list: https://www.eclipse.org/lists/jersey-dev/msg00370.html

@asafm
Copy link
Contributor

asafm commented Mar 10, 2024

@dao-jun Ping me when you the PR is ready to continue review.

@dao-jun
Copy link
Member Author

dao-jun commented Mar 11, 2024

@asafm I've updated the PR, PTAL

@dao-jun
Copy link
Member Author

dao-jun commented Mar 12, 2024

@dragosvictor Could you please take a look that why my test keeps failing?

private RestEndpointMetricsFilter(PulsarService pulsar) {
PulsarBrokerOpenTelemetry telemetry = pulsar.getOpenTelemetry();
Meter meter = telemetry.getMeter();
latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still relevant :)

Meter meter = openTelemetry.getMeter();
latency = meter.histogramBuilder("pulsar_broker_rest_endpoint_latency")
.setDescription("Latency of REST endpoints in Pulsar broker")
.setUnit("ms")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.build();
}

private static volatile RestEndpointMetricsFilter instance;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need to be static? Can't you just created one instance of the filter and register it?

UriRoutingContext info = (UriRoutingContext) req.getUriInfo();
attrs = getRequestAttributes(info, statusCode);
} catch (Throwable ex) {
attrs = Attributes.of(PATH, "UNKNOWN", METHOD, req.getMethod(), CODE, (long) statusCode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that yet.


Object o = req.getProperty(REQUEST_START_TIME);
if (o instanceof Long start) {
long duration = System.currentTimeMillis() - start;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read online now, and from from what I can gather, it's not recommended to use System.currentTimeMillis(). It's a native call to obtain the wall clock time. NTP for example can sync the clock and change. Not 100% sure but leap second may also change it. See: https://develotters.com/posts/how-not-to-measure-elapsed-time/

I think it is safer to use System.nanoTime()

UriRoutingContext info = (UriRoutingContext) req.getUriInfo();
attrs = getRequestAttributes(info, statusCode);
} catch (Throwable ex) {
attrs = Attributes.of(PATH, "UNKNOWN", METHOD, req.getMethod(), CODE, (long) statusCode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good practice. For example OOME will vanish like that.
I think if we don't have any specific exception in mind, no point in guarding this part.

if (CollectionUtils.isEmpty(templates)) {
return Attributes.of(PATH, "UNKNOWN", METHOD, httpMethod, CODE, statusCode);
}
UriTemplate[] arr = templates.toArray(new UriTemplate[0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you specifically need to convert this to an array? Can't you just iterate over the list using get(i)?

UriTemplate[] arr = templates.toArray(new UriTemplate[0]);
int idx = arr.length - 1;
StringBuilder builder = new StringBuilder();
for (; idx >= 0; idx--) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for (int idx = arr.length - 1; ...?

admin.tenants().deleteTenant("test");

Collection<MetricData> metricDatas = pulsarTestContext.getOpenTelemetryMetricReader().collectAllMetrics();
log.info("Metrics size: {}", metricDatas.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed at this stage


Collection<MetricData> metricDatas = pulsarTestContext.getOpenTelemetryMetricReader().collectAllMetrics();
log.info("Metrics size: {}", metricDatas.size());
Optional<MetricData> optional = metricDatas.stream().peek(m -> log.info("metric name: {}", m.getName()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
@lhotari lhotari modified the milestones: 4.0.0, 4.1.0 Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker area/metrics doc-required Your PR changes impact docs and you will update later. ready-to-test release/4.0.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants