Description
Expected behavior
When the Redis container is paused (not stopped), each connection attempt should fail and trigger the retry mechanism. The retry counter should increase monotonically until the configured maximum number of retries is reached.
Example logs of expected behavior:
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 2/5. Backing off for 1.0 seconds
INFO - Attempt 3/5. Backing off for 2.0 seconds
INFO - Attempt 4/5. Backing off for 4.0 seconds
The print statement was added at the end of the following except block (lines 60 to 70 in ea01a30):
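For reference, the expected progression corresponds to a retry loop like the minimal sketch below. It is a standalone illustration, not redis-py's Retry class: call_with_backoff and its parameters are hypothetical, and the 0.5 second base that doubles on each failure is inferred from the example logs.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on OSError with exponential backoff (hypothetical helper).

    The attempt counter only ever increases; after max_attempts failures
    the last error is re-raised.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except OSError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            delay = base * 2 ** (attempt - 1)  # 0.5, 1.0, 2.0, 4.0 for base=0.5
            logger.info(
                "Attempt %d/%d. Backing off for %.1f seconds",
                attempt, max_attempts, delay,
            )
            sleep(delay)
```

With an always-failing func and the defaults, this logs the same four messages as the expected-behavior example and then re-raises on the fifth failure without backing off, which would also explain why the example stops at attempt 4/5. Passing a fake sleep (for example list.append) makes the delay sequence easy to check without waiting.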
Actual behavior
Instead of progressing through the retry attempts, the retry mechanism gets stuck at the first attempt, repeating indefinitely.
Example logs of actual behavior:
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 1/5. Backing off for 0.5 seconds
INFO - Attempt 1/5. Backing off for 0.5 seconds
Root Cause
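The issue occurs because sock.connect() (line 575, in the _connect method) succeeds even when the container is paused: the kernel still completes the TCP handshake and queues the connection, although the frozen Redis process never reads from it. Subsequent read operations, however, fail with a timeout.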
Lines 728 to 763 in ea01a30
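The same mechanism can be reproduced without Docker or redis-py. In the sketch below, a loopback socket that is listening but never accept()ed stands in for the paused Redis process; this is an approximation of a frozen container, and read_from_unserviced_listener is a hypothetical name, not part of redis-py. The connect succeeds and only the first read fails.

```python
import socket


def read_from_unserviced_listener(timeout: float = 0.2) -> str:
    """Connect to a socket that listens but never replies, send PING, then read.

    Returns "timeout" if the read times out, otherwise a description of
    the bytes received.
    """
    # Stand-in for a paused Redis: the kernel completes the TCP handshake and
    # queues the connection in the listen backlog, but nothing accept()s it.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    try:
        # This connect succeeds, like sock.connect() against the paused container.
        client = socket.create_connection(("127.0.0.1", port), timeout=timeout)
        try:
            client.sendall(b"*1\r\n$4\r\nPING\r\n")  # RESP-encoded PING
            try:
                data = client.recv(7)
            except socket.timeout:
                return "timeout"
            return f"received {data!r}"
        finally:
            client.close()
    finally:
        server.close()
```

Because create_connection() returns normally, a retry loop wrapped only around the connect step sees no failure; the error surfaces later, at the first read.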
Possible solution
To detect whether the server is actually responding, rather than only whether the TCP connection was established, we can send a PING command immediately after connect() and verify the reply. Add the following after sock.connect() to ensure the connection is functional:
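# PING right after connect(): a paused server completes the TCP handshake
# but never replies, so recv() times out (if a socket timeout is set).
for part in self._command_packer.pack("PING"):
    sock.sendall(part)
response = b""
while len(response) < 7:  # len(b"+PONG\r\n"); recv() may return fewer bytes
    chunk = sock.recv(7 - len(response))
    if not chunk:  # server closed the connection
        break
    response += chunk
if not str_if_bytes(response).startswith("+PONG"):
    raise OSError(f"Redis handshake failed: unexpected response {response!r}")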
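To exercise the check outside redis-py, here is a standalone sketch of the same handshake. It is an assumption-laden illustration: verify_with_ping is a hypothetical helper, and the PING command is hand-encoded in RESP instead of going through self._command_packer.

```python
import socket

PING = b"*1\r\n$4\r\nPING\r\n"  # RESP encoding of the PING command
PONG = b"+PONG\r\n"


def verify_with_ping(sock: socket.socket) -> None:
    """Raise OSError unless the peer answers PING with +PONG (hypothetical helper)."""
    sock.sendall(PING)
    response = b""
    while len(response) < len(PONG):
        # On a silent peer with a timeout set, recv() raises socket.timeout,
        # which is itself an OSError subclass.
        chunk = sock.recv(len(PONG) - len(response))
        if not chunk:
            break  # peer closed the connection
        response += chunk
    if response != PONG:
        raise OSError(f"Redis handshake failed: unexpected response {response!r}")
```

With one end of socket.socketpair() pre-loaded with +PONG, the check returns normally; against a peer that never writes, recv() times out and the OSError propagates, which is presumably what lets the retry counter advance. One caveat worth checking: on a server with requirepass set, Redis answers an unauthenticated PING with a -NOAUTH error, so a check this strict would reject a healthy password-protected server unless it runs after authentication.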
Additional Comments
- There may be a better way to read the PING response using existing methods, but calling _send_ping directly does not work in this case.
- This issue also affects the asynchronous version of redis-py.
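For the asynchronous client, the same check might look roughly like the asyncio sketch below. It does not use redis.asyncio internals; verify_with_ping_async and its timeout parameter are hypothetical, and the command is hand-encoded in RESP.

```python
import asyncio

PING = b"*1\r\n$4\r\nPING\r\n"  # RESP encoding of the PING command
PONG = b"+PONG\r\n"


async def verify_with_ping_async(
    reader: asyncio.StreamReader,
    writer: asyncio.StreamWriter,
    timeout: float = 0.5,
) -> None:
    """Raise OSError unless the peer answers PING with +PONG within timeout."""
    writer.write(PING)
    await writer.drain()
    try:
        response = await asyncio.wait_for(reader.readexactly(len(PONG)), timeout)
    except asyncio.IncompleteReadError as exc:
        response = exc.partial  # peer closed before sending a full reply
    except asyncio.TimeoutError as exc:
        # Before Python 3.11, asyncio.TimeoutError is not an OSError subclass,
        # so convert it explicitly.
        raise OSError("Redis handshake failed: no reply to PING") from exc
    if response != PONG:
        raise OSError(f"Redis handshake failed: unexpected response {response!r}")
```

The explicit conversion keeps the failure type consistent with the synchronous version, so code that retries on OSError can treat both paths the same way.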
Let me know if you'd like a clearer example to reproduce the behavior.