Summary
Reactor.Run and Proactor.EventLoop open a single pumpSpan via Tracer?.CreateMessagePumpSpan(MessagePumpSpanOperation.Begin, ...) at startup and close it only when the pump exits. That span's lifetime equals the consumer worker's lifetime — potentially hours or days.
BrighterMetricsFromTracesProcessor.OnEnd then records that lifetime into the messaging.client.operation.duration histogram, because the switch on messaging.operation.type falls through to the default arm for "begin".
Affected sites
src/Paramore.Brighter.ServiceActivator/Reactor.cs:97 (open) and :358 (close).
src/Paramore.Brighter.ServiceActivator/Proactor.cs:138 (open) and the matching close at the end of EventLoop.
src/Paramore.Brighter/Observability/BrighterMetricsFromTracesProcessor.cs:60-76:
    switch (operation)
    {
        case "publish": ...
        case "receive": ...
        case "process": ...
        default:
            messagingMeter.RecordClientOperation(activity);
            break;
    }
The "begin" operation (from MessagePumpSpanOperation.Begin.ToSpanName() at src/Paramore.Brighter/Observability/BrighterSpanExtensions.cs:69) is the only operation that hits this default arm.
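The fall-through can be reproduced in isolation. The following is a minimal sketch, not the actual processor code: DefaultArmSketch and HitsDefaultArm are hypothetical names, and the bodies of the three explicit arms are left out because the excerpt above elides them.

```csharp
using System;

// Hypothetical stand-in for the switch in BrighterMetricsFromTracesProcessor.OnEnd.
// Only the arm selection is modelled; nothing is recorded here.
public static class DefaultArmSketch
{
    // Returns true when the given messaging.operation.type value
    // would land in the default arm of the switch.
    public static bool HitsDefaultArm(string operationType)
    {
        switch (operationType)
        {
            case "publish":
            case "receive":
            case "process":
                // Handled by an explicit arm (bodies elided in the issue).
                return false;
            default:
                // In the real processor this arm calls RecordClientOperation.
                return true;
        }
    }
}
```

With this sketch, HitsDefaultArm("begin") is true while HitsDefaultArm("publish") is false, which matches the behaviour described above.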
Symptom
- A massive outlier appears in the messaging.client.operation.duration histogram every time a consumer pump shuts down, with a value equal to the pump's wall-clock lifetime (minutes to hours).
- Histogram percentiles (p50/p95/p99) for messaging.client.operation.duration are skewed and unusable for SLOs.
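To get a feel for the size of the distortion, here is a small arithmetic sketch. The numbers are illustrative assumptions, not measurements from Brighter: 99 ordinary operations of 20 ms each and one pump-lifetime sample of 6 hours. OutlierSketch and MeanMilliseconds are hypothetical names.

```csharp
using System;
using System.Linq;

public static class OutlierSketch
{
    // Mean duration (in milliseconds) of normalCount samples of normalMs
    // plus a single pump-lifetime sample of pumpLifetimeMs.
    public static double MeanMilliseconds(int normalCount, double normalMs, double pumpLifetimeMs)
    {
        var samples = Enumerable.Repeat(normalMs, normalCount).Append(pumpLifetimeMs);
        return samples.Average();
    }
}
```

Calling OutlierSketch.MeanMilliseconds(99, 20.0, TimeSpan.FromHours(6).TotalMilliseconds) gives a mean of roughly 216 seconds, even though 99 of the 100 samples took 20 ms. Any aggregate derived from the histogram's sum and count is dominated by the single shutdown sample.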
Proposed fix
Either:
1. Drop the default arm so unknown operations are not counted as client operations.
2. Special-case "begin" to a no-op (or record it into a different, optional pump-lifecycle gauge if one is wanted).
Option 1 is the safer minimum: it preserves the explicit publish/receive/process recording paths and stops unrelated operations from polluting the histogram. The MessagePumpSpanOperation.Begin span is still emitted as a trace span — it just shouldn't feed the client-operation duration metric.
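A minimal sketch of what dropping the default arm could look like, under stated assumptions: RecordingMeter and NoDefaultArmSketch are hypothetical stand-ins rather than Brighter types, and the three explicit arms are collapsed into one Record call because their real bodies are not shown in this issue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory meter used only to observe what gets recorded.
public sealed class RecordingMeter
{
    public List<(string Operation, TimeSpan Duration)> Samples { get; } = new();

    public void Record(string operation, TimeSpan duration) =>
        Samples.Add((operation, duration));
}

public static class NoDefaultArmSketch
{
    // Records only the known operation types; anything else is ignored.
    public static void OnEnd(string operationType, TimeSpan duration, RecordingMeter meter)
    {
        switch (operationType)
        {
            case "publish":
            case "receive":
            case "process":
                meter.Record(operationType, duration);
                break;
            // No default arm: "begin" and unknown operations record nothing.
        }
    }
}
```

In this sketch, ending a six-hour "begin" span leaves meter.Samples empty, while a "publish" span still adds one sample. A test in the real code base could assert the same thing against a fake of the meter interface.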
Acceptance
- Pump shutdown no longer records a sample into the messaging.client.operation.duration histogram.
- publish/receive/process recording behaviour is unchanged.
- A MessagePumpSpanOperation.Begin activity does not invoke IAmABrighterMessagingMeter.RecordClientOperation.
Context
Surfaced while reviewing PR #4065; pre-existing and unrelated to that PR.