Skip to content

Conversation

AlexVelezLl
Copy link
Member

@AlexVelezLl AlexVelezLl commented Oct 8, 2025

Summary

  • Adds support for a retry_on argument in the @task decorator to specify a list of potential non-deterministic exceptions that can be retried if the job failed because of them.
    • The user won't see the task failed until the same task was re-attempted 3 times.
  • Updates the setupwizard frontend to handle failed tasks.
  • Updates the setupwizard frontend to persists users being imported.
  • Disable back button in the import users page when importing users to prevent unexpected page layouts.

References

Closes #11836.

Reviewer guidance

@AlexVelezLl AlexVelezLl requested a review from rtibbles October 8, 2025 23:22
@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... APP: Setup Wizard Re: Setup Wizard (facility import, superuser creation, settings, etc.) DEV: frontend SIZE: medium labels Oct 8, 2025
@AlexVelezLl AlexVelezLl force-pushed the fix-lod-import-multi-users branch from 4e2fbc9 to 236f654 Compare October 8, 2025 23:26
Copy link
Contributor

github-actions bot commented Oct 8, 2025

@rtibbles rtibbles self-assigned this Oct 9, 2025
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can maintain the current separation of concerns, and it may be worth the effort of adding a new column to track the retries rather than keeping it in the extra_metadata.

To allow us to migrate the SQLAlchemy table, adding alembic as a dependency feels a bit heavy duty. So perhaps the answer is to clear the jobs table of any finished tasks, then dump the remainder to a temporary CSV, clear the table, recreate, and then reload the data?

permission_classes=None,
long_running=False,
status_fn=None,
retry_on=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job avoiding a classic Python gotcha! (passing mutable values as default arguments, such as [] is a very common mistake that can cause issues)

total_progress=0,
result=None,
long_running=False,
retry_on=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we don't need to store this in the job object - we're not allowing this to be customized per job, only per task - so I think we can just reference this from the task itself, rather than having to pass it in at job initialization. This also saves us having to coerce the exception classes to import paths.

)
setattr(current_state_tracker, "job", None)

def should_retry(self, exception):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather defer all this logic to the reschedule_finished_job_if_needed method on the storage class, rather than having it in the job class.


def should_retry(self, exception):
retries = self.extra_metadata.get("retries", 0) + 1
self.extra_metadata["retries"] = retries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit iffy about using extra_metadata for tracking this - I think if we want to hack the existing schema, 'repeat' is probably a better place for this, but I wonder if instead we should add to the job table schema to add error_retries so that we can put a sensible default in place for failing tasks so they don't endlessly repeat.

I also think I'd rather have the retry interval defined by the task registration (we could also set a sensible default if retryable exceptions are set).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

APP: Setup Wizard Re: Setup Wizard (facility import, superuser creation, settings, etc.) DEV: backend Python, databases, networking, filesystem... DEV: frontend SIZE: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Setup Wizard - Confusing behavior when importing multiple learners

2 participants