Retry peeruserimport task on Database or connection errors #13821

base: develop
Conversation
Branch updated from 4e2fbc9 to 236f654
I think we can maintain the current separation of concerns, and it may be worth the effort of adding a new column to track the retries rather than keeping it in `extra_metadata`.
Adding alembic as a dependency just so we can migrate the SQLAlchemy table feels a bit heavy duty. So perhaps the answer is to clear the jobs table of any finished tasks, dump the remainder to a temporary CSV, clear the table, recreate it, and then reload the data?
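A minimal sketch of what that rebuild could look like with SQLAlchemy Core; the column names, the terminal state values, and the new `retries` column here are illustrative placeholders, not the actual job table schema:

```python
import csv
import tempfile

from sqlalchemy import (
    Column,
    Integer,
    MetaData,
    String,
    Table,
    delete,
    insert,
    select,
)

# Illustrative schema only -- the real jobs table has more columns, and the
# state values and "retries" column name are assumptions for this sketch.
new_metadata = MetaData()
new_jobs = Table(
    "jobs",
    new_metadata,
    Column("id", String, primary_key=True),
    Column("state", String),
    Column("retries", Integer, nullable=False, server_default="0"),
)

FINISHED_STATES = ("COMPLETED", "FAILED", "CANCELED")  # assumed terminal states


def recreate_jobs_table(engine):
    """Dump unfinished jobs to a temp CSV, rebuild the table, and reload the rows."""
    old_jobs = Table("jobs", MetaData(), autoload_with=engine)

    with engine.begin() as conn:
        # 1. Clear finished jobs; only pending/running rows need to survive.
        conn.execute(delete(old_jobs).where(old_jobs.c.state.in_(FINISHED_STATES)))
        rows = conn.execute(select(old_jobs)).mappings().all()

    # 2. Dump the remainder to a temporary CSV.
    with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as f:
        csv_path = f.name
        if rows:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows([dict(r) for r in rows])

    # 3. Drop the old table and recreate it with the new column in place.
    old_jobs.drop(engine)
    new_metadata.create_all(engine)

    # 4. Reload; columns missing from the CSV (e.g. "retries") fall back to defaults.
    with engine.begin() as conn, open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            conn.execute(
                insert(new_jobs).values({k: v for k, v in row.items() if k in new_jobs.c})
            )
```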
permission_classes=None,
long_running=False,
status_fn=None,
retry_on=None,
Good job avoiding a classic Python gotcha! Passing mutable values as default arguments, such as `[]`, is a very common mistake that can cause issues.
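For anyone following along, the gotcha in question looks like this (a generic illustration, not code from this PR):

```python
def register_bad(name, retry_on=[]):  # the default list is created once, at definition time
    retry_on.append(name)
    return retry_on


def register_good(name, retry_on=None):  # a None sentinel gets a fresh list per call
    if retry_on is None:
        retry_on = []
    retry_on.append(name)
    return retry_on


print(register_bad("a"))   # ['a']
print(register_bad("b"))   # ['a', 'b']  <- state leaks between calls
print(register_good("a"))  # ['a']
print(register_good("b"))  # ['b']
```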
total_progress=0,
result=None,
long_running=False,
retry_on=None,
I feel like we don't need to store this in the job object - we're not allowing this to be customized per job, only per task - so I think we can just reference this from the task itself, rather than having to pass it in at job initialization. This also saves us having to coerce the exception classes to import paths.
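A rough sketch of that shape, with `RegisteredTask` and `JobRegistry` as hypothetical stand-ins for the real registry, just to show the lookup going through the task rather than the job:

```python
# Hypothetical sketch: retry_on lives only on the registered task, so the job
# never has to serialize exception classes into import paths.
class RegisteredTask:
    def __init__(self, func, retry_on=None):
        self.func = func
        self.retry_on = tuple(retry_on or ())


class JobRegistry:
    tasks = {}

    @classmethod
    def register(cls, func, retry_on=None):
        cls.tasks[func.__name__] = RegisteredTask(func, retry_on=retry_on)
        return func


def exception_is_retryable(func_name, exception):
    """Look the retryable exceptions up from the task registration, not the job row."""
    task = JobRegistry.tasks.get(func_name)
    return task is not None and isinstance(exception, task.retry_on)


def sync_users():
    raise ConnectionError("peer unreachable")


JobRegistry.register(sync_users, retry_on=[ConnectionError, TimeoutError])
assert exception_is_retryable("sync_users", ConnectionError("peer unreachable"))
```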
)
setattr(current_state_tracker, "job", None)

def should_retry(self, exception):
I think I'd rather defer all this logic to the `reschedule_finished_job_if_needed` method on the storage class, rather than having it in the job class.
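Roughly, the storage method would then own both the check and the bookkeeping. This is only a sketch of the shape; the retry cap, the `error_retries` attribute, and the requeue step are stand-ins, not the real storage internals:

```python
import logging

logger = logging.getLogger(__name__)

MAX_RETRIES = 3  # assumed default cap so a failing job cannot requeue itself forever


class Storage:
    """Illustrative stand-in for the jobs storage class; only the retry decision is shown."""

    def __init__(self):
        self.queue = []

    def reschedule_finished_job_if_needed(self, job, task, exception=None):
        """Requeue a failed job when its registered task declares the exception retryable."""
        retryable = tuple(getattr(task, "retry_on", ()) or ())
        if exception is None or not isinstance(exception, retryable):
            return False
        if getattr(job, "error_retries", 0) >= MAX_RETRIES:
            logger.info("Job %s exhausted its retries", job.job_id)
            return False
        job.error_retries = getattr(job, "error_retries", 0) + 1
        self.queue.append(job)  # stand-in for resetting state to QUEUED and persisting
        return True
```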
def should_retry(self, exception):
    retries = self.extra_metadata.get("retries", 0) + 1
    self.extra_metadata["retries"] = retries
I am a bit iffy about using `extra_metadata` for tracking this - if we want to hack the existing schema, `repeat` is probably a better place for it, but I wonder if instead we should add an `error_retries` column to the job table schema, so that we can put a sensible default in place for failing tasks and they don't endlessly repeat.
I also think I'd rather have the retry interval defined by the task registration (we could also set a sensible default if retryable exceptions are set).
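Concretely, the registration could carry both knobs with defaults, something along these lines (the parameter names and default values are suggestions, not the PR's current API):

```python
DEFAULT_ERROR_RETRIES = 3    # assumed cap, stored per job in a new error_retries column
DEFAULT_RETRY_INTERVAL = 30  # seconds; assumed default once retryable exceptions are set


def register_task(func=None, retry_on=None, retry_interval=None):
    """Decorator sketch: the retry policy lives with the task registration, not the job."""

    def decorator(fn):
        fn.retry_on = tuple(retry_on or ())
        # Only give a retry interval to tasks that actually declare retryable exceptions.
        if retry_interval is not None:
            fn.retry_interval = retry_interval
        else:
            fn.retry_interval = DEFAULT_RETRY_INTERVAL if fn.retry_on else None
        fn.max_error_retries = DEFAULT_ERROR_RETRIES
        return fn

    return decorator(func) if func is not None else decorator
```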
Summary

Adds a `retry_on` argument to the `@task` decorator to specify a list of potential non-deterministic exceptions; if the job failed because of one of them, it can be retried.
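A minimal illustration of the call shape (the decorator body below is a stand-in for the real `@task`, and `peeruserimport`'s signature and exception list are invented for the example):

```python
from sqlite3 import OperationalError


def task(retry_on=None):
    """Stand-in for the real @task decorator, just to show how retry_on is passed."""

    def decorator(fn):
        fn.retry_on = tuple(retry_on or ())
        return fn

    return decorator


@task(retry_on=[OperationalError, ConnectionError])
def peeruserimport(peer_id):
    """Import users from a peer device; requeued if one of the listed exceptions is raised."""
```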
References

Closes #11836.
Reviewer guidance