-
Notifications
You must be signed in to change notification settings - Fork 0
added concurrency to celery in docker_compose.yml #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ks in parallel, before it was set up as pool=solo this will only allow Celery to run multiple tasks in parallel
Summary of ChangesHello @LuisJG8, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request primarily focuses on enhancing the performance and robustness of the Celery worker setup. It improves task parallelism by adjusting the Celery configuration in Docker Compose and refines the handling of GitHub API rate limits within the data fetching logic. Additionally, it includes minor code cleanup by commenting out unused or non-functional code. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to enable concurrency for Celery workers by updating the "docker-compose.yml" file. However, it introduces a critical logic error in "worker.py" where a broken rate-limit check can lead to an infinite loop, causing memory exhaustion and GitHub API rate-limit abuse. Additionally, some suggestions have been provided to improve code style and formatting for better readability and maintainability.
| remaining = remaining_api_calls[0] | ||
|
|
||
| if int(counter) >= int(500): | ||
| if remaining_api_calls == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition if remaining_api_calls == 1: is a critical logic error. remaining_api_calls is a tuple (remaining, limit) from github_instance.rate_limiting, not an integer, so this comparison will always be False. This flaw prevents the loop from breaking, leading to an infinite loop that can cause Out-of-Memory (OOM) crashes, GitHub API rate limit abuse, and redundant processing. The current log message Reached batch size limit of {batch_size} will also be misleading in this context.
if remaining <= 1:
print(f"Reached rate limit threshold. Remaining: {remaining}")
break| command: celery -A worker worker --loglevel=info --concurrency=8 | ||
| env_file: ./config/.env | ||
| deploy: | ||
| deploy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| header = [ | ||
| get_github_data.s(start, batch_size) | ||
| for start in range(0, total, batch_size) | ||
| get_github_data.s(start, batch_size) for start in range(0, total, batch_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this single-line list comprehension is valid, the previous multi-line version was more readable. For better maintainability and readability, it's often better to keep list comprehensions on multiple lines if they don't fit comfortably on one. I'd suggest reverting to the multi-line format.
get_github_data.s(start, batch_size)
for start in range(0, total, batch_size)| return chord(jobs)() | ||
|
No newline at end of file |
||
| # return chord(jobs)() | ||
| No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added concurrency to celery in docker_compose.yml to run multiple tasks in parallel, before it was set up as pool=solo this will only allow Celery to run multiple tasks in parallel