Check self-hosted runners are online #19054

Merged
ydshieh merged 1 commit into main from add_check on Sep 19, 2022

ydshieh (Collaborator) commented Sep 15, 2022

What does this PR do?

#18905 checks whether Docker can be launched inside the runners.

However, the runners can go offline for unknown reasons, and so far we have had no way of noticing this: the job simply hangs forever.

This PR adds a check for whether each runner is online or offline.

However, a runner might also go offline in the middle of a workflow run. This situation is not easy to handle, but we still need to guard against it. Therefore, this PR also adds a new scheduled workflow that runs every hour to check runner availability.
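As a rough illustration of the online/offline check described above, here is a minimal Python sketch. It is not the script added by this PR. It assumes the runner list has already been fetched from GitHub's REST API ("list self-hosted runners for a repository", whose response contains a `runners` array with `name` and `status` fields) and parsed into a dict. The runner names and the helper names `find_offline_runners` and `check_runner_status` are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical, already-parsed API response; the runner names are made up.
SAMPLE_RESPONSE = {
    "total_count": 2,
    "runners": [
        {"id": 1, "name": "runner-a", "status": "online", "busy": False},
        {"id": 2, "name": "runner-b", "status": "offline", "busy": False},
    ],
}


def find_offline_runners(api_response: Dict, target_runners: Iterable[str]) -> List[str]:
    """Return the target runners that are not reported as online.

    A runner that is absent from the response is treated as unavailable too.
    """
    status_by_name = {
        runner["name"]: runner["status"] for runner in api_response.get("runners", [])
    }
    return [name for name in target_runners if status_by_name.get(name) != "online"]


def check_runner_status(api_response: Dict, target_runners: Iterable[str]) -> None:
    """Raise an error if any target runner is unavailable, so the CI step fails."""
    offline = find_offline_runners(api_response, target_runners)
    if offline:
        raise RuntimeError("Runners not available: " + ", ".join(offline))
```

If a script like this raises inside a workflow step, the step exits with a non-zero status and the job fails, which is what allows a follow-up job gated on `failure()` to send a report.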

HuggingFaceDocBuilderDev commented Sep 15, 2022

The documentation is not available anymore as the PR was closed or merged.

name: Send results to webhook
runs-on: ubuntu-latest
needs: check_runner_status
if: ${{ failure() }}
ydshieh (Collaborator, Author) commented:

Send the report only when some runners are unavailable.

LysandreJik (Member) left a comment:

Sounds good to me!

ydshieh merged commit ba7f217 into main Sep 19, 2022
ydshieh deleted the add_check branch September 19, 2022 10:27
ydshieh restored the add_check branch September 19, 2022 10:34
ydshieh added a commit that referenced this pull request Sep 19, 2022
ydshieh deleted the add_check branch September 19, 2022 10:54
3 participants