Description
Version: redis-py 4.5.4, Redis version irrelevant
Platform: macOS 13.2.1
Description: There's an inherent memory leak in redis.asyncio.connection.HiredisParser
. This is actually hiredis-py
's fault, but I suggest that we implement stuff a bit different to workaround this issue until redis/hiredis-py#163 is merged and a new version of hiredis-py
is released.
This is a result of #2557.
Inside HiredisParser.on_connect
, an hiredis.Reader
object is constructed with replyError
set to self.parse_error
:
redis-py/redis/asyncio/connection.py
Lines 368 to 376 in e1017fd
At this point, we have the HiredisParser
object pointing to the hiredis.Reader
object (through _reader
).
hiredis.Reader.__init__
reads replyError
and sets it within self.replyErrorClass
:
https://github.com/redis/hiredis-py/blob/8adb1b3cb38b82cdc73fa2d72879712da1f74e70/src/reader.c#L288
Now we have a cyclic reference:
HiredisParser._reader
references to hiredis.Reader
;
hiredis.Reader.replyErrorClass
references to HiredisParser.parse_error
(which is a bound method, so its references to self
).
However, hiredis.Reader
doesn't support garbage collection!
https://github.com/redis/hiredis-py/blob/8adb1b3cb38b82cdc73fa2d72879712da1f74e70/src/reader.c#L47
It doesn't declare Py_TPFLAGS_HAVE_GC
.
You can also see that it doesn't implement tp_traverse
:
https://github.com/redis/hiredis-py/blob/8adb1b3cb38b82cdc73fa2d72879712da1f74e70/src/reader.c#L49
So these objects are never garbage collected, leading to a memory leak.
The non-asyncio variant doesn't suffer from this problem, as its HiredisParser.on_disconnect
deletes the reference to hiredis.Reader
, breaking the cycle. #2557 removed this from the async variant, which has introduced the problem.
The correct solution is to support GC in hiredis-py
, as it can hold references to non-primitive objects. Until a new version is released, I suggest to implement a temporary workaround.
The following is a simple reproduction script:
import asyncio
import gc
import hiredis # Make sure it's installed
import redis.asyncio
import redis.asyncio.connection
async def a():
r = redis.asyncio.Redis.from_url("redis://localhost:6379/0")
await r.get("a")
await r.close()
asyncio.run(a())
gc.collect()
for obj in gc.get_objects():
if isinstance(obj, redis.asyncio.connection.BaseParser):
print(obj)