Fix gdrive_sync video processing issues and improve multipart upload reliability #2569
base: master
Conversation
Hello @ibrahimjaved12, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR aims to address issues related to processing large video uploads, specifically those originating from Google Drive and being uploaded to YouTube. The core changes involve making the background task responsible for processing Drive files more resilient by adding retry logic and refining the task that updates YouTube video statuses to be more efficient and correctly handle video completion.
Highlights
- **Retry Logic for Drive File Processing:** The `process_drive_file` Celery task in `gdrive_sync/tasks.py` has been updated to include retry logic. If an error occurs during the processing of a Google Drive file (which includes initiating video transcoding), the task will automatically retry up to 3 times with a 60-second backoff, making the process more robust against transient issues.
- **Improved YouTube Status Updates:** The `update_youtube_statuses` Celery task in `videos/tasks.py` has been refined. It now avoids fetching the YouTube status for videos that are already marked as `PROCESSED`. Additionally, the logic for marking a video as `COMPLETE`, triggering the transcript job, and sending the success email is now more tightly coupled to the confirmation that the video's YouTube status is indeed `PROCESSED`.
Changelog
- `gdrive_sync/tasks.py`
  - Added `bind=True`, `acks_late=True`, `retry_backoff=60`, and `max_retries=3` to the `@app.task` decorator for `process_drive_file` (L37).
  - Updated the `process_drive_file` signature to accept `self` (L38).
  - Changed a bare `except` to `except Exception as exc` (L52).
  - Implemented retry logic within the exception handler for `process_drive_file` (L54-L57).
- `videos/constants.py`
  - Minor docstring update for the `VideoStatus` class (L19).
- `videos/tasks.py`
  - Removed `# noqa: C901` from the `update_youtube_statuses` function signature (L152).
  - Added a check `if video_file.destination_status != YouTubeStatus.PROCESSED` before fetching the YouTube status, to avoid redundant API calls (L168).
  - Adjusted the logic flow so that `video_file.status` is set to `COMPLETE`, the transcript job is added, and the success email is sent only when the YouTube status is confirmed as `PROCESSED` (L174-L178, L191-L193).
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Task runs in the queue,
Sometimes fails, what to do?
Retry, wait, then try,
Until the work is done, oh my!
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
The pull request addresses issues with large video uploads, which is great. The changes include adding retry logic to the `process_drive_file` task and updating YouTube video statuses. Overall, the changes seem reasonable, but there are a few areas that could be improved.
Summary of Findings
- **Exception Handling:** The pull request catches broad exceptions in `gdrive_sync/tasks.py`, which is generally discouraged. It's better to catch specific exceptions or use `celery.exceptions.Retry`.
- **Task Acknowledgement:** The pull request uses `acks_late=True` for the `process_drive_file` task, which could lead to duplicate executions in certain failure scenarios. Consider the implications of this choice.
- **Video Status Update:** The pull request updates YouTube video statuses, but it might be missing an update to `video_file.status` after updating `destination_status`.
Merge Readiness
The pull request addresses an important issue and includes some good improvements. However, the exception handling and task acknowledgement aspects should be carefully reviewed before merging. Additionally, the video status update logic could be improved. I am unable to directly approve this pull request, and recommend that these comments be addressed before merging. It is also recommended that others review and approve this code before merging.
```diff
     except Exception as exc:  # pylint: disable=broad-exception-caught
         log.exception("Error processing DriveFile %s", drive_file_id)
-        return None
+        try:
+            raise self.retry(exc=exc) from exc
+        except self.MaxRetriesExceededError:
+            return None
```
Catching `Exception` is generally discouraged because it can mask unexpected errors. It's better to catch specific exceptions that you expect and can handle. In this case, what exceptions are you anticipating? If you want to retry on all exceptions, consider using `celery.exceptions.Retry` directly. Also, consider adding a delay before retrying, in case the underlying issue is transient.
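One way to act on this suggestion, sketched here with hypothetical names and a plain callable standing in for a bound Celery task's `self.retry`, is to whitelist the transient errors worth retrying and let everything else propagate:

```python
# Hypothetical whitelist of transient failures worth retrying.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)

def handle_task_error(exc, retry):
    """Retry only anticipated transient errors; re-raise the rest.

    `retry` stands in for a bound Celery task's self.retry. Programming
    errors (KeyError, AttributeError, ...) surface immediately instead
    of being retried several times to no effect.
    """
    if isinstance(exc, TRANSIENT_ERRORS):
        retry(exc=exc, countdown=60)  # delay before the next attempt
    else:
        raise exc
```

This keeps the broad `except Exception` at the task level but narrows what actually gets retried.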
```diff
@@ -34,8 +34,8 @@
 log = logging.getLogger(__name__)


-@app.task()
-def process_drive_file(drive_file_id: str):
+@app.task(bind=True, acks_late=True, retry_backoff=60, max_retries=3)
```
Adding `acks_late=True` is good for ensuring the task is executed even if the worker crashes, but it's important to understand the implications. Since the task is acknowledged after it's completed, there's a possibility of duplicate executions if the worker crashes after processing but before sending the acknowledgement. Is this acceptable for this task?
```python
if video_file.destination_status != YouTubeStatus.PROCESSED:
    video_file.destination_status = youtube.video_status(
        video_file.destination_id
    )
    video_file.save()
```
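For context, the gated flow the PR aims for (fetch the status only when needed; mark the video complete, start the transcript job, and send the email only once `PROCESSED` is confirmed) might look like this simplified sketch with hypothetical names and lowercase stand-in status strings:

```python
def update_video_if_processed(video_file, fetch_status, start_transcript, send_email):
    """Simplified sketch of the refined status-update flow."""
    # Skip the API call when the stored status is already final.
    if video_file.destination_status != "processed":
        video_file.destination_status = fetch_status(video_file.destination_id)
        video_file.save()
    # Completion side effects are gated on a confirmed PROCESSED status.
    if video_file.destination_status == "processed":
        video_file.status = "complete"
        video_file.save()
        start_transcript(video_file)
        send_email(video_file)
```

Videos whose fetched status is still pending fall through both branches untouched, so the task can simply pick them up again on its next run.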
@ibrahimjaved12 please write a more descriptive PR title and commit message. What are you fixing?

@pdpinch Updated the description and PR title
`gdrive_sync/api.py` (outdated)
```python
logging.basicConfig(level=logging.INFO)
logging.getLogger("boto3").setLevel(logging.DEBUG)
logging.getLogger("botocore").setLevel(logging.DEBUG)
```
This is just for debugging; it will be removed soon.
LGTM so far (did not run locally though). One potential way to replicate part of the problem on the main branch might be to add a long delay in the `@retry_on_failure`-wrapped `stream_to_s3`:

```python
@retry_on_failure
def stream_to_s3(drive_file: DriveFile):
    """Stream a Google Drive file to S3"""
    time.sleep(300)  # 5 minute delay
    try:
        ...  # etc.
```
I have tested the upload with two ~1.7 GB files simultaneously. Both files uploaded, with timeout errors in a few parts followed by a retry.

One nitpick, however: a video with failed parts, although uploaded completely on the next try, still has a Celery task running that is trying to upload it again. This status is a bit misleading and may result in a duplicate transcoding job or other unexpected behavior.
What are the relevant tickets?
https://github.com/mitodl/hq/issues/7086
Description (What does it do?)
There were two issues reported for different courses where the `gdrive_sync` process for videos failed, stalled, or worked only partially. Some of the videos in both cases were also very large (~6 GB+). These are the tickets:
https://github.com/mitodl/hq/issues/7114
https://github.com/mitodl/hq/issues/6818

The error in them, as returned by Celery, was that `drive_file.resource` returned `None` in `update_youtube_statuses`. In one course, `WebsiteContent` objects were not created, and in another, `WebsiteContent` objects were not updated to add a `youtube_id`. Another thing reported by Celery was 4 unfinished `process_drive_file` tasks, started by 1 unfinished `import_website_files` task.

This PR attempts to fix these problems by implementing several things:
- Retry logic on the `process_drive_file` task.
- A `TransferConfig` with custom configuration (64 MB chunks) for better multipart uploads in the `stream_to_s3` function for large videos.
- Fetching the YouTube status via `YouTubeAPI` only when needed.

How can this be tested?
Make sure the video process works as expected. Try uploading large videos (1 GB+) to Google Drive and sync. Make sure all the relevant objects are created (`WebsiteContent`, `Video`, `DriveFile`).
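As a reference point for the multipart change above: boto3's `TransferConfig` takes a `multipart_chunksize` in bytes (e.g. `TransferConfig(multipart_chunksize=64 * 1024 * 1024)`), and the chunk size determines how many parts a given upload produces. A small sketch of the arithmetic, with the boto3 call itself left to the PR:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB parts, matching the stream_to_s3 tuning

def part_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """How many multipart-upload parts a file of file_size bytes needs."""
    return -(-file_size // chunk_size)  # ceiling division

# S3 allows at most 10,000 parts per upload, so 64 MB chunks cap a single
# object at roughly 640 GB, comfortably above the ~6 GB videos in these tickets.
```

Larger parts also mean fewer part requests per upload, which reduces the surface for per-part timeout errors like the ones observed during testing.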