The instrumentation of messaging systems includes:
- Transaction creation for received message processing, either as a child of the message sending span if the sending operation is traced (meaning distributed tracing support), or as root
- Span creation for message sending operation
- Span creation for message receiving operation that occurs within a traced transaction
A Message send/publish SHOULD be captured as a messaging
span only if
occurring within a traced transaction.
If the messaging system exposes a mechanism for sending additional message metadata, it
SHOULD be used to propagate the Trace Context of the
messaging
span, to continue the distributed trace.
Message reception can be divided into two types, passive and active.
Passive message reception typically involves a listener or callback subscribed to a queue/topic/subscription that is called once a message or batch of messages is available, and passed the message or batch of messages for processing.
Passive reception SHOULD be captured as a messaging
transaction that starts when
a message or batch of messages is received for processing, and ends when the
processing of the message or batch of messages finishes.
Active message reception typically involves initiating the reception of a message or batch of messages from a queue/topic, on demand. This might be a blocking or non-blocking operation, and for the purposes of this spec, is collectively referred to as polling.
When message polling is performed within a traced transaction, it SHOULD be
captured in a messaging
span with a name indicating a poll action. A
span SHOULD be created irrespective of whether the operation returns any messages.
If message polling is performed when there is no active transaction, it
can be the initiating event for a message processing flow. Our goal in this
scenario is to capture the message processing flow as a messaging
transaction
that SHOULD start after the polling operation exits and MUST only be started
if the polling operation returned a message or batch of messages. Capturing
the message processing flow in a transaction in this manner may be unfeasible if
the processing flow is not implemented within a well defined API.
When active or passive message reception results in receiving a batch of messages,
a messaging
transaction SHOULD be started and ended for the processing
of each message in the batch, if possible. For example, the Java agent
instruments the Kafka consumer batch iterator so that a transaction is started whenever
a message is retrieved from the batch, and ended either when the next message
is retrieved, or when the iterator is depleted i.e. iterator.hasNext()
returns
false
.
If creating a transaction for the processing of each message in a batch is not possible,
the agent SHOULD create a single messaging
transaction for the processing of the batch
of messages.
This section applies to messaging systems that support message metadata. The instrumentation of message reception SHOULD check message metadata for the presence of Trace Context.
When single message reception is captured as a messaging
transaction,
and a Trace Context is present, it SHOULD be used as the parent of the messaging
transaction
to continue the distributed trace.
Otherwise (a single message being captured as a messaging
span, or a batch
of messages is processed in a single messaging
transaction or span), a
span link SHOULD be added for each message with Trace Context.
This includes the case where the size of the batch of received messages is one.
The number of events processed for trace context SHOULD be limited to a maximum
of 1000, as a guard on agent overhead for extremely large batches of events.
(For example, SQS's maximum batch size is 10000 messages. The maximum number of
span links that could be sent for a single transaction/span to APM server with
the default configuration is approximately 4000: 307200 bytes
APM server max_event_size
default
/ 77 bytes per serialized span link.)
A Kafka consumer consumes data in a loop from one or more topics, as part of a
consumer group. A Consume()
method retrieves records one-by-one for processing.
Upon entering a consume loop, an APM agent SHOULD check if there is an active
messaging
transaction for consumption, and end it. Following a Consume()
method returning a message, an APM agent SHOULD start a messaging
transaction
to capture the message handling flow. Upon exiting a consume loop, an APM agent SHOULD check if there is an active messaging
transaction for consumption, and end it.
- Transactions:
transaction.type
:messaging
- Spans:
span.type
:messaging
span.subtype
: depends on service/provider, see table belowspan.action
:send
,receive
orpoll
subtype |
Description |
---|---|
azurequeue |
Azure Queue |
azureservicebus |
Azure Service Bus |
jms |
Java Messaging Service |
kafka |
Apache Kafka |
rabbitmq |
RabbitMQ |
sns |
AWS Simple Notification Service |
sqs |
AWS Simple Queue Service |
Transaction and span names should follow this pattern: <MSG-FRAMEWORK> SEND/RECEIVE/POLL to/from <QUEUE-NAME>
.
Examples:
JMS SEND to MyQueue
RabbitMQ RECEIVE from MyQueue
**RabbitMQ POLL from MyExchange
**
Agents may deviate from this pattern, as long as they ensure a proper cardinality is maintained, that is- neither too low nor too high. For example, agents may choose to name all transactions/spans reading-from/sending-to temporary queues equally. On the other end, agents may choose to append a cardinality-increasing factor to the name, like the routing key in RabbitMQ.
* At least up to version 1.19.0, the Java agent's instrumentation for Kafka does not follow this pattern.
In RabbitMQ, queues are only relevant in the receiver side, so the exchange name is used instead for sender spans.
When the default exchange is used (denoted with an empty string), it should be replaced with <default>
.
Agents may add an opt-in config to append the routing key to the name as well, for example: RabbitMQ RECEIVE from MyExchange/58D7EA987
.
For RabbitMQ transaction and polling spans, the queue name is used, whenever available (i.e. when the polling yields a message).
RabbitMQ broker can generate a unique queue name on behalf of an application, which conforms to a naming convention starting with
amq.gen-
. A generated queue name MUST be replaced with amq.gen-*
when used in a span name, to reduce cardinality.
context.message.queue.name
: optional formessaging
spans and transactions. Indexed as keyword. Wherever the broker terminology uses "topic", this field will contain the topic name. In RabbitMQ, whenever the queue name is not available, use exchange name instead.context.message.age.ms
: optional for message/record receivers only (transactions or spans). A numeric field indicating the message's age in milliseconds. Relevant for transactions andreceive
spans that receive valid messages. There is no accurate definition as to how this is calculated. If the messaging framework provides a timestamp for the message, agents may use it (subtract the message/record timestamp from the read timestamp). If a timestamp is not available, agents should omit this field or find an alternative and document it in this spec. For example, the sending agent can add a timestamp to the message's metadata to be retrieved by the receiving agent. Clock skews between agents are ignored, unless the calculated age (receive-timestamp minus send-timestamp) is negative, in which case the agent should report 0 for this field.context.message.routing_key
: optional. Use only where relevant. Currently only RabbitMQ.
context.message.body
: similar to HTTP requests'context.request.body
- only fill in messaging-related transactions (ie incoming messages creating a transaction) and not for outgoing messaging spans.- Capture only when
ELASTIC_APM_CAPTURE_BODY
config is set to'all'
or'transactions'
. - Only capture UTF-8 encoded message bodies.
- Limit size to 10000 characters. If longer than this size, trim to 9999 and append with ellipsis
- Capture only when
context.message.headers
: similar to HTTP requests'context.request.headers
- only fill in messaging-related transactions.- Capture only when
ELASTIC_APM_CAPTURE_HEADERS
config is set totrue
. - Sanitize headers with keys configured through
ELASTIC_APM_SANITIZE_FIELD_NAMES
. - Intake: key-value pairs, same like
context.request.headers
.
- Capture only when
context.service.framework.name
: same asspan.subtype
, but not in lowercase, e.g.Kafka
,RabbitMQ
context.destination.address
: optional. Not available in some cases. Only set if the actual connection is available.context.destination.port
: optional. Not available in some cases. Only set if the actual connection is available.context.destination.service.resource*
: mandatory. Value should be either${span.subtype}/${context.mesage.queue.name}
or${span.subtype}
.context.destination.service.type*
: deprecated but mandatory until 7.14.0. See destination spec for details.context.destination.service.name
: deprecated but mandatory until 7.14.0. See destination spec for details.context.service.target.type
: mandatory. Same value asspan.subtype
. See Service Target for details.context.service.target.name
: optional. same value ascontext.mesage.queue.name
if available. See Service Target for details.
Used to filter out specific messaging queues/topics/exchanges from being traced.
This property should be set to a list containing one or more wildcard matcher strings. When set, sends-to and receives-from the specified queues/topics/exchanges will be ignored.
Type | List< WildcardMatcher > |
Default | empty |
Dynamic | true |
Central config | true |
To support distributed tracing with automatic instrumentation, the messaging system must provide a mechanism to add metadata/properties/attributes to individual messages, akin to HTTP headers. If an APM agent supports trace-context for a given messaging system, it MUST use the following mechanisms so that cross-language tracing works:
Messaging system | Mechanism |
---|---|
Azure Queue | No mechanism |
Azure Service Bus | Diagnostic-Id application property. See this doc. |
Java Messaging Service | Message Properties |
Apache Kafka | Kafka Record headers using binary trace context fields |
RabbitMQ | Message Attributes (a.k.a. AMQP.BasicProperties in Java API) |
AWS SQS | SQS message attributes, if within message attribute limits. See AWS instrumentation. |
AWS SNS | SNS message attributes, if within message attribute limits. See AWS instrumentation. |
The instrumentation of SQS and SNS services generally follow this spec, with some nuances specified in the linked specs.