-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat][broker] PIP-264: Add topic messaging metrics #22467
[feat][broker] PIP-264: Add topic messaging metrics #22467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good.
My only comment is that we should be using qualified names in the attributes:
pulsar.namespace Includes the namespace portion only my-namespace
pulsar.topic Includes the local topic name my-topic
eg: pulsar.namespace
-> my-tenant/my-namespace
pulsar.topic
-> my-tenant/my-namespace/my-topic
This is to keep consistency with many other places in APIs and CLI tools. Also, consistent with OTel metrics attributes in the client SDK.
Note that such a transformation would lead the namespace to occasionally include the "cluster" portion in case of old topic names: The topic name, as is currently filled in by the client also includes the persistence part: If we can confirm that this is the intent here, I can go ahead and proceed with this proposal. Alternatively, we can augment the existing implementation with one more attribute, say |
@dragosvictor I think we shouldn't worry about v1 topic names, these were deprecated long ago we should actually start to get rid of them completely.
Yes, it's better to include, because it's part of fully-qualified name. |
This reverts commit bcfb1da.
…merBacklogEvictionTimeQuota
7d64327
to
f8f2e9a
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #22467 +/- ##
============================================
- Coverage 73.57% 72.47% -1.10%
+ Complexity 32624 32374 -250
============================================
Files 1877 1886 +9
Lines 139502 140783 +1281
Branches 15299 15456 +157
============================================
- Hits 102638 102037 -601
- Misses 28908 30872 +1964
+ Partials 7956 7874 -82
Flags with carried forward coverage won't be shown. Click here to find out more.
|
if (topicName.isPartitioned()) { | ||
builder.put(OpenTelemetryAttributes.PULSAR_PARTITION_INDEX, topicName.getPartitionIndex()); | ||
} | ||
var attributes = builder.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dragosvictor @merlimat Please note this contradicts all the work done to eliminate memory allocation.
The Attributes for each topic should be persisted either in a Map or easier by saving it in the implementation of Topic
interface you're getting. Then you simply retrieve it by topic.getTopicOTelAttributes()
.
When you are not doing it, you are creating a memory allocation each time you run collect, for each topic. This is exactly what I was set out to avoid in the whole implementation rework of OpenTelemetry Java SDK memory mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asafm can you please create an issue to track this ?
PIP-264
Motivation
This PR adds OpenTelemetry metrics for the topic messaging group. The aim is to replace the Prometheus metrics listed in the reference documentation.
Modifications
Added the following general topic attributes (sample output relative to topic
persistent://my-tenant/my-namespace/my-topic-partition-2
):pulsar.domain
persistent
pulsar.tenant
my-tenant
pulsar.namespace
my-tenant/my-namespace
pulsar.topic
persistent://my-tenant/my-namespace/my-topic
pulsar.partition.index
2
Added attributes
pulsar.transaction.status
(differentiates betweencommitted
,aborted
andactive
txns)pulsar.backlog.quota.type
(differentiates betweensize
andtime
based quotas)pulsar.compaction.status
(differentiates betweensuccess
andfailure
of compaction operation result)Added class
OpenTelemetryTopicStats
. This is where the metrics are being defined (name, type, unit, description). All added metrics are 'asynchronous': OpenTelemetry uses a callback mechanism to retrieve their values, instead of internally maintaining the counters itself. Thus, we can leverage the existing counters stored throughout Pulsar. It also allows us to 'remove' a topic from OpenTelemetry tracking by simply not recording any metrics for it during a callback op (contrast that to synchronous OpenTelemetry metrics that cannot be 'closed', staying resident in memory). To further improve performance, the PR batches together updates for these metrics, meaning we can record all metrics relevant to a topic in one callback operation.For the metric definitions, please consult [feat][site] PIP-264: Document topic messaging metrics pulsar-site#880
Many of the Prometheus metrics exposed both rates and absolute counters of their values. The rates would be redundant in OpenTelemetry, and are excluded here.
In the interest of keeping the size of this PR manageable, a few metrics have been omitted and will be added separately:
pulsar_delayed_message_*
. Replacement for metricpulsar_consumer_msg_ack_rate
is defined in the PR but not currently recorded, as more work is needed to fetch this information efficiently. Metrics of typeHistogram
will only be populated at the namespace level, and are thus excluded here:pulsar_storage_write_latency_le_*
,pulsar_entry_size_le_*
,pulsar_compaction_latency_le_*
,pulsar_delayed_message_index_bucket_op_latency_ms
.A couple of metrics did not have a proper backend for reporting:
getAddEntrySucceedTotal
andgetReadEntriesSucceededTotal
toManagedLedgerMXBean
, supporting metricspulsar.broker.topic.storage.outgoing
andpulsar.broker.topic.storage.incoming
.AbstractTopic.totalPublishRateLimitedCounter
, reporting back the total hit number for the publish rate limiter.CompactionRecord.compactionReadBytes
andCompactionRecord.compactionWriteBytes
, supporting metricspulsar.broker.topic.compaction.incoming
andpulsar.broker.topic.compaction.outgoing
.Verifying this change
This change added tests and can be verified as follows:
(example:)
OpenTelemetryTopicStatsTest#testMessagingMetrics
to validate basic messaging metrics: subscription/producer/consumer counts, message in/out counts, bytes in/out counts etc.OpenTelemetryTopicStatsTest#testPublishRateLimitMetric
to validate rate limiter metricTransactionTest#testTopicTransactionMetrics
to validate OpenTelemetry topic metrics tooDelayedDeliveryTest#testDelayedDelivery
to validate topic delayed delivery metricCompactorTest#testAllCompactedOut
to validate message compaction metricsBacklogQuotaManagerTest#testConsumerBacklogEvictionTimeQuota
andBacklogQuotaManagerTest#testConsumerBacklogEvictionSizeQuota
to validate eviction backlog metricsAdminApiOffloadTest#testOffload
to validate storage offload metricDoes this pull request potentially affect one of the following parts:
Documentation
doc
doc-required
: Documentation PR: [feat][site] PIP-264: Document topic messaging metrics pulsar-site#880doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: dragosvictor#17