I'm experiencing many issues (already reported here by other people) related to the (blocking) connection pool in the asyncio version.
I had to roll back and pin my dependency to 4.5.5 for now. (lablup/backend.ai#1620)
Since there are already many reports, I would instead offer some high-level design suggestions here:
- redis-py users should be able to distinguish (a sketch of one possible exception hierarchy follows this list):
  - The connection was closed by the application itself.
  - The connection was actively closed by the server after sending responses.
  - The connection was abruptly closed by the server without sending responses.
  - The client was evicted due to a server-side limit.
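Here is a minimal sketch of how such causes could be surfaced as distinct exception types; none of these class names exist in redis-py today, they are purely illustrative:

```python
# Hypothetical sketch: a finer-grained exception hierarchy so callers can
# react differently to each disconnect cause.  None of these classes exist
# in redis-py; only the redis.exceptions.ConnectionError base does.
from redis.exceptions import ConnectionError


class ConnectionClosedByClient(ConnectionError):
    """The application itself closed the connection (e.g. an explicit pool disconnect)."""


class ConnectionClosedByServer(ConnectionError):
    """The server closed the connection gracefully after sending its responses."""


class ConnectionAbortedByServer(ConnectionError):
    """The connection dropped without a response (server crash, network cut, ...)."""


class ClientEvictedError(ConnectionClosedByServer):
    """The server evicted this client due to a server-side limit (e.g. maxclients)."""
```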
- Improvements for timeouts
  - Distinguish:
    - connection ready timeout in the blocking connection pool
    - socket connection timeout
    - response timeout
  - These timeouts should be treated differently by the intrinsic retry subsystem and be distinguishable via subclasses of `ConnectionError` and/or `TimeoutError`, to ease writing user-defined retry mechanisms.
    - e.g., Within the connection ready timeout, we should silently wait and retry until we get a connection from a blocking connection pool.
  - The response timeout for non-blocking (`GET`, `SET`, ...) and blocking commands (`BLPOP`, `XREAD`, ...) should be considered different conditions: active error vs. polling.
    - It would be nice to have explicit examples/docs on how to write a polling loop around blocking commands with proper timeout and retry handling (see the polling sketch after this list).
    - Please refer to: https://github.com/lablup/backend.ai/blob/833ed5477d57846e568b17fec35c82300111a519/src/ai/backend/common/redis_helper.py#L174
    - Blocking command `timeout` cannot exceed client's `socket_timeout` #2807
    - Retrying not working when executing Redis Pipeline for retry_on_error exceptions #2973
    - With blocking xread, redis connection can be released while still waiting for response #2663
  - Maybe we could refer to the design of `aiohttp.ClientTimeout`.
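For reference, `aiohttp.ClientTimeout` groups the `total`, `connect`, `sock_connect`, and `sock_read` timeouts into a single value object. A redis-py analogue could do the same; the sketch below is purely hypothetical (the `RedisTimeout` name and its fields are made up for illustration):

```python
# Hypothetical sketch of a ClientTimeout-style value object for redis-py.
# Neither RedisTimeout nor these field names exist in redis-py today.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RedisTimeout:
    pool_acquire: Optional[float] = None       # waiting for a free connection in a blocking pool
    sock_connect: Optional[float] = None       # establishing the TCP/TLS connection
    response: Optional[float] = None           # waiting for a reply to a non-blocking command
    blocking_response: Optional[float] = None  # extra allowance for BLPOP/XREAD-style commands


# Hypothetical usage:
# client = Redis.from_url("redis://localhost",
#                         timeout=RedisTimeout(pool_acquire=5, sock_connect=2, response=10))
```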
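And here is a rough sketch of the kind of polling loop mentioned above, written against the current `redis.asyncio` API; the stream key `"events"` is an arbitrary example, and the `block` value is deliberately kept below `socket_timeout` to stay clear of #2807:

```python
# Sketch: polling loop around a blocking command (XREAD) where a response
# timeout means "no data yet" rather than a hard failure.
import asyncio

from redis import asyncio as aioredis
from redis.exceptions import ConnectionError, TimeoutError


async def poll_stream() -> None:
    r = aioredis.Redis.from_url("redis://localhost", socket_timeout=10.0)
    last_id = "$"  # only consume entries newer than the start of polling
    try:
        while True:
            try:
                # Block server-side for up to 5 s; an empty reply is benign.
                reply = await r.xread({"events": last_id}, block=5000, count=100)
            except TimeoutError:
                continue  # response timeout while polling: just poll again
            except ConnectionError:
                await asyncio.sleep(1.0)  # transport failure: back off, let the pool reconnect
                continue
            for _stream, entries in reply:
                for entry_id, fields in entries:
                    last_id = entry_id
                    print(entry_id, fields)
    finally:
        await r.close()


if __name__ == "__main__":
    asyncio.run(poll_stream())
```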
- `BlockingConnectionPool` should be the default. (A sketch of explicitly opting into the blocking pool today follows this list.)
  - `ConnectionPool`'s default `max_connections` should be a more reasonable number.
  - Use a blocking connection pool in the async RedisCluster instead of managing connections synchronously #2522
  - BlockingConnectionPool does not recover if redis disconnects. #3034
  - BlockingConnectionPool deadlock (double condition.acquire) #3056
  - There are issues to resolve first, though...
    - "No connection available" errors since 5.0.1 #2995
    - Instantiating BlockingConnectionPool.from_url with timeout in query args fails #2983
    - Refactor how asyncio.BlockingConnectionPool gets connection. #2998
    - Remove idle connections periodically #2992
    - Error-prone behaviour of SELECT command in combination with connection pool #3124
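Until the default changes, opting into the blocking pool explicitly is already possible. A minimal sketch with the async client (the URL, pool size, and timeout below are arbitrary examples):

```python
# Sketch: explicitly using BlockingConnectionPool with the async client.
import asyncio

from redis import asyncio as aioredis


async def main() -> None:
    pool = aioredis.BlockingConnectionPool.from_url(
        "redis://localhost:6379/0",
        max_connections=32,  # hard upper bound on concurrent connections
        timeout=5,           # seconds to wait for a free connection before giving up
    )
    r = aioredis.Redis(connection_pool=pool)
    try:
        await r.set("key", "value")
        print(await r.get("key"))
    finally:
        await r.close()
        await pool.disconnect()


asyncio.run(main())
```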
- Better connection pool design and abstraction to embrace underlying transport type differences and connection errors (one possible shape is sketched after this list)
  - Async redis subscription inside websocket doesn't shut down properly #2523
  - "Connection closed by server." #2773
  - redis's connection pool makes me wonder #2832
  - Exceptions, redis and asyncio #2727
  - redis.exceptions.ConnectionError with 4.5.2 #2636
  - Optionally disable disconnects in read_response #2695
  - Fix ConnectionPool deadlock triggered by gc #3000 (though this is a thread use case)
  - Stderr backtrace caused by `redis.client.Redis.__del__` on process exit. #3014
  - Why warn on e.g. "Unclosed RedisCluster client"? #3026
  - Redis connection release after reset #3043
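One possible shape for such an abstraction, purely as a sketch (none of these names exist in redis-py): an explicit acquire/release contract that owns reconnection and error normalization regardless of the underlying transport.

```python
# Hypothetical sketch of a transport-agnostic pool interface.  Callers would
# see one acquire/release contract and one normalized error taxonomy whether
# the transport is TCP, TLS, or a Unix socket.
from contextlib import asynccontextmanager
from typing import AsyncIterator, Optional, Protocol

from redis.asyncio.connection import Connection


class AsyncConnectionPool(Protocol):
    async def acquire(self, timeout: Optional[float] = None) -> Connection:
        """Return a healthy connection, reconnecting transparently if needed."""
        ...

    async def release(self, conn: Connection, *, healthy: bool = True) -> None:
        """Return a connection to the pool, or dispose of it if marked unhealthy."""
        ...


@asynccontextmanager
async def borrowed(pool: AsyncConnectionPool) -> AsyncIterator[Connection]:
    # Convenience wrapper so callers cannot forget to release.
    conn = await pool.acquire()
    try:
        yield conn
    except Exception:
        await pool.release(conn, healthy=False)
        raise
    else:
        await pool.release(conn)
```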
- The sentinel's connection pool should also have a blocking version.
  - Currently there is no `BlockingSentinelConnectionPool`.
  - Rotating Slaves in Sentinel fails when hitting last slave in SentinelConnectionPool. #2956
  - We need to clearly define whether the delay after connecting to the sentinel but before connecting to the target master/slave is included in the socket connection timeout or not.
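For context, the pool class behind `master_for()` is already pluggable, so a blocking variant could slot in the same way once it exists. A sketch of current usage (the sentinel hosts and service name are placeholders; the commented-out class is hypothetical):

```python
# Sketch: async Sentinel usage today, with the pool class passed explicitly.
import asyncio

from redis.asyncio.sentinel import Sentinel, SentinelConnectionPool


async def main() -> None:
    sentinel = Sentinel(
        [("sentinel1", 26379), ("sentinel2", 26379)],
        socket_timeout=2.0,  # unclear whether this should also cover the sentinel -> master hop (see above)
    )
    master = sentinel.master_for(
        "mymaster",
        connection_pool_class=SentinelConnectionPool,
        # connection_pool_class=BlockingSentinelConnectionPool,  # hypothetical, does not exist yet
        socket_timeout=2.0,
    )
    await master.set("key", "value")


asyncio.run(main())
```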
- The new `CLIENT SETINFO` mechanism should be generalized.
  - Add support for new redis command CLIENT SETINFO #2682
  - What if a failure occurs while sending this command?
    - In my test cases which exercise retries across a sentinel master failover or a redis server restart, this `CLIENT SETINFO` breaks the retry semantics.
    - Could we enforce the intrinsic/user-defined retry on such an event?
  - What if a user wants to insert additional commands like `CLIENT SETNAME` or `CLIENT NO-EVICT on`?
  - Maybe we could refactor it as a connection-establishment callback, whose failure is ignored before disposing of the faulty connection (a rough sketch follows below).
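A rough sketch of that idea; the hook type, the `on_connect_hooks` parameter, and the client name used below are all made up for illustration:

```python
# Hypothetical sketch: connection-establishment hooks that run right after the
# handshake.  A failing hook would be ignored (logged), the faulty connection
# disposed of, and the command retried through the usual retry machinery.
from typing import Awaitable, Callable

from redis.asyncio.connection import Connection

ConnectHook = Callable[[Connection], Awaitable[None]]


async def setinfo_hook(conn: Connection) -> None:
    # The built-in CLIENT SETINFO step, expressed as a hook.
    await conn.send_command("CLIENT", "SETINFO", "lib-name", "redis-py")
    await conn.read_response()


async def setname_hook(conn: Connection) -> None:
    # A user-supplied extra handshake command.
    await conn.send_command("CLIENT", "SETNAME", "my-worker")
    await conn.read_response()


# Hypothetical wiring (the on_connect_hooks parameter does not exist today):
# pool = BlockingConnectionPool.from_url(
#     "redis://localhost", on_connect_hooks=[setinfo_hook, setname_hook]
# )
```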
- Backport policy for bug fixes
  - Since I had to roll back to 4.5.5 after experiencing connection leaks with the 4.6.0 → 5.0.1 upgrade, it would be nice to have a backport policy for critical bug fixes.