Description
Our Kafka instrumentation currently creates a transaction for each individual message. This can lead to a flood of transactions in some use cases, e.g. if an application uses event sourcing and fetches entire Kafka topics during startup.
@eyalkoren explained this in detail in an internal issue:
It was based on the assumption that for these types of usages, Kafka Streams will be used rather than Kafka clients, and this proved to be valid so far. The challenge with messaging systems is that they may be used either as a way of asynchronous communication between services or for this type of data pipelining, and the default desired behavior is very different in each case. This also affects what you want to do with regards to distributed tracing: it is a crucial APM functionality if you use it as a communication facility, and it is irrelevant if you use it to process mass amounts of records.
Knowing that in some cases creating a transaction per batch is preferable, we introduced the internal `message_batch_strategy` option; however, we have not applied it to our Kafka instrumentation so far. I think the best way to deal with it now is to apply it there, for example: change our iteration instrumentation so that it starts a transaction when the iterator is created and ends it when the iterator is exhausted, and let the customer experiment with this config option.
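For illustration, a minimal sketch of that iterator lifecycle, written against the agent's public API (`co.elastic.apm.api`) rather than the actual bytecode instrumentation; the wrapper class and the transaction name are made up for this example:

```java
import java.util.Iterator;

import co.elastic.apm.api.ElasticApm;
import co.elastic.apm.api.Transaction;
import org.apache.kafka.clients.consumer.ConsumerRecord;

// Hypothetical wrapper showing the proposed behavior: one transaction spans
// the consumption of the whole batch instead of one transaction per record.
public class BatchIterator<K, V> implements Iterator<ConsumerRecord<K, V>> {

    private final Iterator<ConsumerRecord<K, V>> delegate;
    private final Transaction transaction;
    private boolean ended;

    public BatchIterator(Iterator<ConsumerRecord<K, V>> delegate) {
        this.delegate = delegate;
        // start the transaction when the iterator is created...
        this.transaction = ElasticApm.startTransaction();
        this.transaction.setName("Kafka record batch");
    }

    @Override
    public boolean hasNext() {
        boolean hasNext = delegate.hasNext();
        if (!hasNext && !ended) {
            // ...and end it once the batch is exhausted
            transaction.end();
            ended = true;
        }
        return hasNext;
    }

    @Override
    public ConsumerRecord<K, V> next() {
        return delegate.next();
    }
}
```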
Going forward, we should choose our defaults so that we guard against this type of scenario, for example (a sketch of this decision logic follows the list):
- if batchSize <= 10
  - if message_batch_strategy == single_handling - create a transaction per message
  - else - create a transaction per batch
- else if 10 < batchSize <= 100
  - create a transaction per batch and use span links if messages contain a traceparent header
- else - create a transaction per batch without span links
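The same decision tree as plain Java; the thresholds (10, 100) come straight from the example above, and the enum names are illustrative, not actual agent configuration values:

```java
// Hypothetical encoding of the proposed defaults.
enum BatchHandling {
    TRANSACTION_PER_MESSAGE,
    TRANSACTION_PER_BATCH,               // span links added if records carry a traceparent header
    TRANSACTION_PER_BATCH_WITHOUT_LINKS
}

static BatchHandling decide(int batchSize, boolean singleHandlingConfigured) {
    if (batchSize <= 10) {
        // small batches: respect message_batch_strategy == single_handling
        return singleHandlingConfigured
                ? BatchHandling.TRANSACTION_PER_MESSAGE
                : BatchHandling.TRANSACTION_PER_BATCH;
    } else if (batchSize <= 100) {
        // medium batches: one transaction per batch, with span links when possible
        return BatchHandling.TRANSACTION_PER_BATCH;
    } else {
        // large batches: skip span links to bound the overhead
        return BatchHandling.TRANSACTION_PER_BATCH_WITHOUT_LINKS;
    }
}
```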
We later discovered that `message_batch_strategy` defaults to `batch_handling`, therefore this option can't be respected by default for Kafka. However, the general idea might be interesting: automatically capture the processing of large batches as a single transaction and smaller batches with a transaction per message. "Large" could be a configurable threshold.
An alternative is to instrument the higher-level frameworks that use Kafka and base the decision on knowledge available in the framework. E.g. for Spring we already create a single transaction for `@KafkaListener(batch=true)`.
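For reference, a minimal sketch of such a listener, assuming Spring Kafka 2.8+ (where `@KafkaListener` accepts the `batch` attribute as a string); the class name and topic are made up:

```java
import java.util.List;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class OrderBatchListener {

    // With a batch listener, Spring hands the whole poll result to the method
    // as one List, so the agent can naturally wrap the invocation in a single
    // transaction instead of creating one per record.
    @KafkaListener(topics = "orders", batch = "true")
    public void onBatch(List<ConsumerRecord<String, String>> records) {
        for (ConsumerRecord<String, String> record : records) {
            // process each record within the batch-level transaction
        }
    }
}
```

The framework-level signal (batch vs. single-record listener) removes the guesswork about the user's intent that the threshold-based heuristic above has to rely on.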