Description
Describe the bug
When sending RPC-style messages using 'spring-rabbit' RabbitTemplate.sendAndReceive(...)
with useDirectReplyToContainer = true
, a NullPointerException sometimes occurs on closing idle 'replay-to' channel in AbstractMetricsCollector.basicCancel(...)
due to a race condition between :
- the thread executing
ChannelN.basicCancel(...)
- the async thread handling the
Basic.CancelOk
RPC, which removesChannelState
inAbstractMetricsCollector.closeChannel()
beforeAbstractMetricsCollector.basicCancel(...)
is executed.
Details
When new 'reply-to' com.rabbitmq.client.Channel
is created, that is happening when we use RabbitTemplate.sendAndReceive(...)
, it is usually paired with MetricsCollector.newChannel()
call (which is an instance of MicrometerMetricsCollector
) and metrics collector register a shutdown listener that will call MetricsCollector.closeChannel()
.
When it is time to close idle consumer with underlying channel, in its turn ChannelN.basicCancel()
do RPC with Basic.Cancel
that is asynchronously handled by listener container and underlaying ChannelN.close()
executes shutdown listeners and by that running MetricsCollector.closeChannel()
that clears ChannelState
.
In parallel with RPC MetricsCollector.basicCancel()
can be called that trying to access ChannelState
, but at this point channel state can already been removed, that ends with logger message Error while computing metrics in basicCancel: Cannot read field "lock" because "channelState" is null
Reproduction steps
- Configure Spring Boot to use RabbitTemplate with useDirectReplyToContainer=true
- Send RPC-style messages using rabbitTemplate.sendAndReceive(...)
- Let the temp channel remain idle for channelRpcTimeout (e.g., 60s)
- Observe DirectReplyToMessageListenerContainer.reduceConsumersIfIdle() triggering a call to:
- RabbitUtils.cancel(...) → ChannelN.basicCancel(...)
- Occasionally, this will lead to: Error while computing metrics in basicCancel: Cannot read field "lock" because "channelState" is null
Expected behavior
The AbstractMetricsCollector.basicCancel(...)
should not throw an NPE — ideally, the metrics collector should be resilient to a missing ChannelState
, or the cancel flow should be ordered to avoid state removal before metrics are recorded.
Additional context
Here’s a detailed method chain, with emphasis on the race:
DirectReplyToMessageListenerContainer
└── reduceConsumersIfIdle()
└── cancelConsumer()
└── RabbitUtils.cancel()
└── ChannelN.basicCancel()
├── rpc(Basic.Cancel)
│
│ [ASYNC: RPC is handled here]
│ └── DirectReplyToMessageListenerContainer.handleCancelOk()
│ └── finalizeConsumer()
│ └── closeChannel()
│ └── RabbitUtils.closeChannel()
│ └── ChannelN.close()
│ └── notifyListeners()
│ └── AbstractMetricsCollector.closeChannel()
│ └── Clears ChannelState
│
└── AbstractMetricsCollector.basicCancel() ← NPE here
└── Clears ChannelState