From json to pqt #20
Conversation
Summary of Changes
Hello @LuisJG8, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the task distribution mechanism.
Code Review
This pull request introduces a new Celery task distribute_tasks to parallelize fetching data from GitHub by splitting it into a group of sub-tasks. This is a good approach for improving performance and idempotency.
However, I've found a critical issue in the implementation that will cause the tasks to fail at runtime. I've also noted a medium-severity issue regarding the use of magic numbers.
Key points for improvement:
- Critical Bug: The `get_github_data` task is called with incorrect arguments, passing an integer where a `Github` object is expected. This needs to be fixed.
- Magic Numbers: The range for distributing tasks uses hardcoded numbers, which should be extracted into constants for better readability and maintainability.
- Incomplete Change: The `client.py` file, which seems to be the entry point for starting the Celery job, has not been updated to call the new `distribute_tasks` function. It still calls the old `get_github_data` task directly, bypassing the new parallelization logic.
- Misleading Title: The PR title 'From json to pqt' does not seem to reflect the changes made, which are focused on refactoring task distribution rather than changing data formats.
Please see my detailed comments in the code.
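The fan-out approach described above — splitting the fetch into fixed-size windows and dispatching one sub-task per window — can be sketched without Celery itself. Here `functools.partial` stands in for Celery's `.s()` signature, the task body is a stub that just reports its window, and the `0`/`5000`/`500` bounds come from the diff under review:

```python
from functools import partial

CHUNK_SIZE = 500    # repos handled by one sub-task (from the diff)
TOTAL_REPOS = 5000  # overall range covered (from the diff)

def get_github_data(start_in_repo_num=0):
    # Stand-in for the real Celery task body: report the
    # repo-index window this sub-task would process.
    return (start_in_repo_num, start_in_repo_num + CHUNK_SIZE)

def distribute_tasks():
    # In Celery this would be:
    #   group(get_github_data.s(start) for start in range(...))()
    jobs = [partial(get_github_data, start)
            for start in range(0, TOTAL_REPOS, CHUNK_SIZE)]
    return [job() for job in jobs]

print(distribute_tasks()[:2])  # [(0, 500), (500, 1000)]
```

This is only an illustration of the chunking scheme, not the PR's actual task bodies; the real tasks would hit the GitHub API inside each window.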
```python
logger.info("Worker module loaded")
jobs = [
    get_github_data.s(start, 500)
```
There's a critical issue in how the `get_github_data` task is being called. The task's signature is `def get_github_data(self, start_in_repo_num: int = 0, github_instance: Github = gh)`. When you call `get_github_data.s(start, 500)`, Celery passes `start` as `start_in_repo_num` and the integer `500` as `github_instance`. The `github_instance` parameter expects a `Github` object, not an integer. This will cause an `AttributeError` at runtime when the task tries to use this parameter (e.g., `github_instance.get_repos(...)`).
To fix this, you should remove the second argument. If `500` was intended as a limit, the `get_github_data` task signature and implementation would need to be updated to handle it.
```diff
- get_github_data.s(start, 500)
+ get_github_data.s(start)
```
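If `500` really was meant as a per-chunk limit, one way to thread it through safely is an explicit `limit` parameter placed before `github_instance`, so positional arguments can no longer land on the client. This is a hedged sketch, not the PR's actual task: the real `github_instance` defaults to a shared `Github` client (`gh`), stubbed here as `None` so the example runs standalone, and the body is a stand-in that just returns the window it would process:

```python
def get_github_data(start_in_repo_num: int = 0, limit: int = 500,
                    github_instance=None):
    # In the real task, github_instance defaults to the shared client `gh`
    # and the body fetches repos; this stub only reports the index window.
    return (start_in_repo_num, start_in_repo_num + limit)

# A call like get_github_data.s(start, 500) would now bind 500 to `limit`,
# not to `github_instance`.
print(get_github_data(1000, 500))  # (1000, 1500)
```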
```python
logger.info("Worker module loaded")
jobs = [
    get_github_data.s(start, 500)
    for start in range(0, 5000, 500)
```
The values `5000` and `500` are used here as magic numbers. For better readability and maintainability, consider defining them as named constants at the module level. For example:

```python
TOTAL_REPOS_TO_PROCESS = 5000
CHUNK_SIZE = 500
# ...
for start in range(0, TOTAL_REPOS_TO_PROCESS, CHUNK_SIZE):
```

This makes the code's intent clearer and easier to modify in the future.
No description provided.