Pump-lifetime span (MessagePumpSpanOperation.Begin) recorded into messaging.client.operation.duration histogram #4086

@thomhurst

Summary

Reactor.Run and Proactor.EventLoop open a single pumpSpan via Tracer?.CreateMessagePumpSpan(MessagePumpSpanOperation.Begin, ...) at startup and close it only when the pump exits. That span's lifetime equals the consumer worker's lifetime — potentially hours or days.

BrighterMetricsFromTracesProcessor.OnEnd then records that lifetime into the messaging.client.operation.duration histogram, because the switch on messaging.operation.type has no case for "begin", so the value lands in the default arm.
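
To make the lifetime problem concrete, here is a minimal sketch of the pattern using System.Diagnostics.Activity directly. PumpSpanSketch and RunPump are hypothetical names for illustration only; Brighter's Tracer API is not reproduced here, and the sleep stands in for the real receive/dispatch loop.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PumpSpanSketch
{
    // Hypothetical stand-in for a pump: one Activity wraps the whole run.
    public static Activity RunPump(TimeSpan lifetime)
    {
        var pumpSpan = new Activity("begin");
        pumpSpan.SetTag("messaging.operation.type", "begin");
        pumpSpan.Start();

        // Stands in for the loop that runs until the pump is shut down.
        Thread.Sleep(lifetime);

        // Duration is only fixed here, so it covers the entire run.
        pumpSpan.Stop();
        return pumpSpan;
    }
}
```

Anything that reads Duration from this activity when it ends sees the full worker lifetime, not the duration of a single messaging operation.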

Affected sites

  • src/Paramore.Brighter.ServiceActivator/Reactor.cs:97 (open) and :358 (close).
  • src/Paramore.Brighter.ServiceActivator/Proactor.cs:138 (open) and the matching close at the end of EventLoop.
  • src/Paramore.Brighter/Observability/BrighterMetricsFromTracesProcessor.cs:60-76:
switch (operation)
{
    case "publish": ...
    case "receive": ...
    case "process": ...
    default:
        messagingMeter.RecordClientOperation(activity);
        break;
}

The "begin" operation (from MessagePumpSpanOperation.Begin.ToSpanName() at src/Paramore.Brighter/Observability/BrighterSpanExtensions.cs:69) is the only operation that hits this default arm.

Symptom

  • A massive outlier appears in the messaging.client.operation.duration histogram every time a consumer pump shuts down — value equal to the pump's wall-clock lifetime (minutes to hours).
  • Histogram percentiles (p50/p95/p99) for messaging.client.operation.duration are skewed and unusable for SLOs.

Proposed fix

Either:

  1. Drop the default arm so unknown operations are not counted as client operations.
  2. Special-case "begin" to no-op (or record into a different, optional pump-lifecycle gauge if one is wanted).

Option 1 is the safer minimum: it preserves the explicit publish/receive/process recording paths and stops unrelated operations from polluting the histogram. The MessagePumpSpanOperation.Begin span is still emitted as a trace span — it just shouldn't feed the client-operation duration metric.
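
A rough sketch of what Option 1 could look like, not the actual patch. MetricsSwitchSketch and Dispatch are hypothetical, and the three callbacks stand in for the publish/receive/process arms, whose real bodies are elided above.

```csharp
using System;

public static class MetricsSwitchSketch
{
    // Returns true when the operation was routed to a recording path.
    // The callbacks are placeholders for the existing (elided) arm bodies.
    public static bool Dispatch(
        string operation,
        Action onPublish,
        Action onReceive,
        Action onProcess)
    {
        switch (operation)
        {
            case "publish": onPublish(); return true;
            case "receive": onReceive(); return true;
            case "process": onProcess(); return true;
            // Option 1: no default arm, so "begin" (or any other
            // unrecognised value) records nothing.
        }

        return false;
    }
}
```

Option 2 would differ only by adding an explicit case "begin" arm that does nothing (or records elsewhere) while keeping the default arm for other values.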

Acceptance

  • Pump shutdown does not emit a multi-hour value into messaging.client.operation.duration.
  • Existing publish / receive / process recording behaviour is unchanged.
  • A test asserts that ending a MessagePumpSpanOperation.Begin activity does not invoke IAmABrighterMessagingMeter.RecordClientOperation.
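
The last acceptance point could be checked along these lines. This is a sketch under loud assumptions: IClientOperationMeter is a hypothetical two-method stand-in for IAmABrighterMessagingMeter (whose full surface is not shown in this issue), RecordKnownOperation is an invented placeholder for whatever the publish/receive/process arms really call, and SketchProcessor only mimics the switch with the default arm removed.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for IAmABrighterMessagingMeter.
public interface IClientOperationMeter
{
    void RecordClientOperation(Activity activity);
    void RecordKnownOperation(Activity activity); // invented placeholder
}

// Fake meter that only counts calls.
public sealed class CountingMeter : IClientOperationMeter
{
    public int ClientOperationCalls { get; private set; }
    public int KnownOperationCalls { get; private set; }

    public void RecordClientOperation(Activity activity) => ClientOperationCalls++;
    public void RecordKnownOperation(Activity activity) => KnownOperationCalls++;
}

// Hypothetical processor: same switch shape, default arm dropped.
public sealed class SketchProcessor
{
    private readonly IClientOperationMeter _meter;

    public SketchProcessor(IClientOperationMeter meter) => _meter = meter;

    public void OnEnd(Activity activity)
    {
        switch (activity.GetTagItem("messaging.operation.type") as string)
        {
            case "publish":
            case "receive":
            case "process":
                _meter.RecordKnownOperation(activity);
                break;
        }
    }
}

public static class PumpSpanAcceptance
{
    // Starts and ends an activity of the given type, then reports the fake meter.
    public static CountingMeter EndActivityOfType(string operationType)
    {
        var meter = new CountingMeter();
        var processor = new SketchProcessor(meter);

        var activity = new Activity(operationType);
        activity.SetTag("messaging.operation.type", operationType);
        activity.Start();
        activity.Stop();

        processor.OnEnd(activity);
        return meter;
    }
}
```

A real test would assert that EndActivityOfType("begin").ClientOperationCalls is 0, using whichever mocking approach the Brighter test suite already follows against the real interface.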

Context

Surfaced while reviewing PR #4065 — pre-existing, unrelated to that PR.
