Skip to content

Usage of hiredis.Reader in asyncio leads to cyclic reference, causing a memory leak #2693

Closed
@oranav

Description

@oranav

Version: redis-py 4.5.4, Redis version irrelevant

Platform: macOS 13.2.1

Description: There's an inherent memory leak in redis.asyncio.connection.HiredisParser. This is actually hiredis-py's fault, but I suggest that we implement stuff a bit different to workaround this issue until redis/hiredis-py#163 is merged and a new version of hiredis-py is released.

This is a result of #2557.

Inside HiredisParser.on_connect, an hiredis.Reader object is constructed with replyError set to self.parse_error:

kwargs: _HiredisReaderArgs = {
"protocolError": InvalidResponse,
"replyError": self.parse_error,
}
if connection.encoder.decode_responses:
kwargs["encoding"] = connection.encoder.encoding
kwargs["errors"] = connection.encoder.encoding_errors
self._reader = hiredis.Reader(**kwargs)

At this point, we have the HiredisParser object pointing to the hiredis.Reader object (through _reader).

hiredis.Reader.__init__ reads replyError and sets it within self.replyErrorClass:

https://github.com/redis/hiredis-py/blob/8adb1b3cb38b82cdc73fa2d72879712da1f74e70/src/reader.c#L288

Now we have a cyclic reference:

HiredisParser._reader references to hiredis.Reader;
hiredis.Reader.replyErrorClass references to HiredisParser.parse_error (which is a bound method, so its references to self).

However, hiredis.Reader doesn't support garbage collection!

https://github.com/redis/hiredis-py/blob/8adb1b3cb38b82cdc73fa2d72879712da1f74e70/src/reader.c#L47

It doesn't declare Py_TPFLAGS_HAVE_GC.

You can also see that it doesn't implement tp_traverse:
https://github.com/redis/hiredis-py/blob/8adb1b3cb38b82cdc73fa2d72879712da1f74e70/src/reader.c#L49

So these objects are never garbage collected, leading to a memory leak.

The non-asyncio variant doesn't suffer from this problem, as its HiredisParser.on_disconnect deletes the reference to hiredis.Reader, breaking the cycle. #2557 removed this from the async variant, which has introduced the problem.

The correct solution is to support GC in hiredis-py, as it can hold references to non-primitive objects. Until a new version is released, I suggest to implement a temporary workaround.

The following is a simple reproduction script:

import asyncio
import gc

import hiredis  # Make sure it's installed
import redis.asyncio
import redis.asyncio.connection


async def a():
    r = redis.asyncio.Redis.from_url("redis://localhost:6379/0")
    await r.get("a")
    await r.close()

asyncio.run(a())
gc.collect()

for obj in gc.get_objects():
    if isinstance(obj, redis.asyncio.connection.BaseParser):
        print(obj)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions