Description
Describe the bug
Firstly I am aware that mirrored classic queues are about to be removed. Opening this issue mostly to document the observed behaviour.
Given a mirrored classic queue with max-length[-bytes] and overflow: reject-publish: when the queue is full and a client with a long-lived connection and publisher-confirms disabled publishes messages to this queue, the memory of the slave processes grows continuously, eventually leading to an OOM. (The memory is released when the connection is closed.)
What happens is that the channel process sends published messages to both the master and the slave processes of the queue, and the slaves temporarily store them in the sender_queues field of their state (maybe_enqueue_message). When the queue is full and publisher-confirms are enabled, the master also broadcasts a discard message to the slaves (in send_reject_publish), which removes the message from sender_queues (in publish_or_discard). However, if publisher-confirms are disabled, the master does not send anything to the slaves (in send_reject_publish), so the sender_queues structure grows indefinitely.
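The mechanism described above can be illustrated with a small toy model. This is only a sketch in Python, not RabbitMQ's Erlang code: the class names, the `always_discard` flag, and the `pending_after` helper are hypothetical and exist purely to show why the slave-side buffer grows when no discard is broadcast.

```python
from collections import defaultdict, deque


class SlaveModel:
    """Toy stand-in for a queue slave process (hypothetical, not RabbitMQ code)."""

    def __init__(self):
        # channel id -> messages received from the channel that the master
        # has not yet told us to publish or discard
        self.sender_queues = defaultdict(deque)

    def from_channel(self, channel, msg_id):
        self.sender_queues[channel].append(msg_id)

    def from_master(self, channel, msg_id):
        # Both a publish and a discard instruction release the buffered copy.
        self.sender_queues[channel].remove(msg_id)

    def pending(self):
        return sum(len(q) for q in self.sender_queues.values())


class MasterModel:
    """Toy stand-in for the queue master with overflow: reject-publish."""

    def __init__(self, max_length, slaves, always_discard=False):
        self.max_length = max_length
        self.slaves = slaves
        self.always_discard = always_discard  # hypothetical "fixed" behaviour
        self.messages = []

    def publish(self, channel, msg_id, confirms):
        if len(self.messages) < self.max_length:
            self.messages.append(msg_id)
            notify = True
        else:
            # Queue is full: the message is rejected. In the reported
            # behaviour the slaves only hear about it if confirms are on.
            notify = confirms or self.always_discard
        if notify:
            for slave in self.slaves:
                slave.from_master(channel, msg_id)


def pending_after(n, confirms, always_discard=False, max_length=10):
    """Publish n messages on one channel; return what one slave still buffers."""
    slave = SlaveModel()
    master = MasterModel(max_length, [slave], always_discard)
    for msg_id in range(n):
        slave.from_channel("ch1", msg_id)  # the channel sends to the slave ...
        master.publish("ch1", msg_id, confirms)  # ... and to the master
    return slave.pending()


print(pending_after(1000, confirms=True))
print(pending_after(1000, confirms=False))
print(pending_after(1000, confirms=False, always_discard=True))
```

In this model the buffered count stays at zero with confirms enabled and grows with every rejected message when they are disabled. Always broadcasting the discard (the `always_discard` flag) keeps it at zero again.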
We speculate that the issue also exists if the messages are published to the mirrored queue via dead-lettering.
Reproduction steps
- Create a multi-node cluster for example on 3.12.6 (I tested on main 09a95a5)
- Create a policy for all classic-queues with ha-mode: all
- Create a classic queue with max-length: 10 and overflow: reject-publish, leader node being rabbit-1
- Open an AMQP connection and publish messages to the queue continuously (without enabling publisher-confirms). The queue will have 10 messages. The memory on rabbit-1 and the memory of the queue master process remains stable. However the memory on rabbit-2 and rabbit-3 and the process memory of the queue slave processes will continually grow.
...
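A publisher for the last two steps could look roughly like the sketch below. It assumes the third-party pika client, a node reachable under the hostname rabbit-1, and a queue name (overflow-test) chosen only for illustration. It also assumes that the ha-mode: all policy already exists and that declaring the queue over a connection to rabbit-1 places the leader there, which depends on the configured queue leader locator.

```python
def queue_arguments(max_length=10):
    """Queue arguments for a bounded queue that rejects publishes when full."""
    return {"x-max-length": max_length, "x-overflow": "reject-publish"}


def publish_forever(host="rabbit-1", queue="overflow-test", body=b"x" * 1024):
    """Publish continuously on one long-lived connection, without confirms.

    The host and queue names are assumptions made for this sketch.
    """
    import pika  # third-party client, imported lazily

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    # channel.confirm_delivery() is deliberately NOT called, so publisher
    # confirms stay disabled and rejected messages are dropped silently.
    channel.queue_declare(queue=queue, durable=True, arguments=queue_arguments())
    try:
        while True:
            # Default exchange: the routing key is the queue name.
            channel.basic_publish(exchange="", routing_key=queue, body=body)
    finally:
        connection.close()
```

Calling `publish_forever()` and watching the memory of the queue processes on the other nodes should show the growth described above. Interrupting the script closes the connection, at which point the memory is expected to be released.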
Expected behavior
The memory of the queue slave processes remains stable.
Additional context
Because variable_queue:discard is a no-op, this missing discard call probably does not affect any queue type included in RabbitMQ other than mirrored classic queues.
However, it might affect queue types provided by community plugins that are based on the (non-mirrored) classic queue. As the example of the message deduplication plugin shows, some plugins do make use of the discard callback (noxdafox/rabbitmq-message-deduplication#96). Hence I think it is worth considering a fix, which I'm willing to submit.