Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't use redis 4.6.0 without connection pool leakage in worker #873

Closed
birkjernstrom opened this issue Jul 19, 2023 · 3 comments · Fixed by #1040
Closed

Can't use redis 4.6.0 without connection pool leakage in worker #873

birkjernstrom opened this issue Jul 19, 2023 · 3 comments · Fixed by #1040
Labels
bug Something isn't working dependencies Pull requests that update a dependency file

Comments

@birkjernstrom
Copy link
Member

birkjernstrom commented Jul 19, 2023

Active Redis connections 2023-07-19 (CET)
image

What happened

  • Upgraded dependencies today in server: Use githubkit 0.10.6 + update dependencies #870
  • Redis was bumped from 4.5.5 to 4.6.0
  • Quickly ran into ConnectionError: max number of clients reached from polar.worker.create_pool
  • Our worker went down. Rolled back + increased our Redis connection limit to 1k (Render) at the same time (not necessary once the rollback was completed).
  • Tried deploying 4.6.0 again after lunch. Definition of insanity... But in my defence, initial deploy occurred during 1) our cron execution + 2) few large repositories triggering syncs. So I thought it might have been a timing issue.
  • Deployed server: Use fixed version of redis (4.5.5) #872 and saw a drop in active connections to normal levels

Actions

  • Fixed the version to 4.5.5 now
  • Creating this issue to ensure we look into it more closely before upgrading redis (not a priority today)

Funding

  • You can sponsor this specific effort via a Polar.sh pledge below
  • We receive the pledge once the issue is completed & verified
Fund with Polar
@birkjernstrom birkjernstrom added bug Something isn't working dependencies Pull requests that update a dependency file labels Jul 19, 2023
@birkjernstrom
Copy link
Member Author

Should note: Thanks to Redis (server) and arq, despite our worker being down for ~10m our queue kept building and once the worker was online again it powered through them all (short-term CPU spike)

@frankie567
Copy link
Member

Since you mentioned this issue in #868, I was curious to see what was going on 🙃 Not sure if it's the source of this specific problem but the following function is problematic IMO:

async def enqueue_job(name: str, *args: Any, **kwargs: Any) -> Job | None:
ctx = ExecutionContext.current()
kwargs["polar_context"] = PolarWorkerContext(
is_during_installation=ctx.is_during_installation,
)
redis = await create_pool()
return await redis.enqueue_job(name, *args, **kwargs)

A pool is created each time we call enqueue_job. So we might end up with leaking connection pools. A global connection pool should be opened/closed during the lifetime of the FastAPI app (the new Lifespan context is quite elegant for this kind of use-case) and passed to the function.


Maybe a coincidence but, in 4.6.0, they released the following fix redis/redis-py/pull/2755 which removes the __del__ destructors. Maybe it was saving us from leaking pools when the GC was passing?

@birkjernstrom
Copy link
Member Author

image
(Need to watch a mythbuster episode again soon – it was ages ago)

Completely agree. It's definitely a problematic (erroneous) implementation. I remember writing it in connection with switching to Arq from Celery a few months ago and that it was a quick hack at the time to get things up and running. Intent was definitely to create a global connection pool 🤦

No, it almost certainly must have been what saved us up until the 4.6.0 upgrade and thereby allowed this implementation to linger due to no noticeable issues. Fortunate. Unfortunate in that it prevented detection sooner 😊 But glad we can fix it now. Thanks for the help @frankie567 🙏 I'll look at this tomorrow in combination with the other bug you found – you're on a streak 😍

This was referenced Aug 30, 2023
@github-project-automation github-project-automation bot moved this to Done in Polar Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dependencies Pull requests that update a dependency file
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

2 participants