GCP compute instances are not shut down after idle timeout  #661

Closed

Description

First off, thanks for adding GCP support to cml-runner.

While playing around with it, I noticed that my Compute Engine instances weren't being shut down (i.e., the VM powered off) or deleted. I experimented with --idle-timeout and --single, but neither made a difference: the instance stays alive. This is unexpected given the following sentence from the docs on self-hosted runners:

After the job runs, the instance automatically shuts down.

And indeed, that's what I've observed with AWS instances. Those seem to terminate correctly.

Here's my GitHub workflow for extra context:

name: 'Train-in-the-cloud-GCP'
on: 
  workflow_dispatch:

jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          # Notice use of `GOOGLE_APPLICATION_CREDENTIALS_DATA` instead of
          # `GOOGLE_APPLICATION_CREDENTIALS`. Contrary to what docs suggest, the
          # latter causes problems for terraform.
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          cml-runner \
          --cloud gcp \
          --cloud-region europe-west1-b \
          --cloud-type=n1-standard-1 \
          --labels=cml-runner
          
  model-training:
    needs: deploy-runner
    runs-on: [self-hosted, cml-runner]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my dummy model'
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          echo "Training a super awesome model"
          sleep 5
          echo "Training complete"
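
For illustration, a deploy step with both flags set might look like the sketch below. It assumes that `--idle-timeout` takes a number of seconds and that `--single` is a plain switch; the value `60` is an arbitrary placeholder, not the value used in the runs described above.

```yaml
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          # Assumption: --idle-timeout is in seconds; 60 is a placeholder value.
          # Assumption: --single makes the runner exit after one job.
          cml-runner \
          --cloud gcp \
          --cloud-region europe-west1-b \
          --cloud-type=n1-standard-1 \
          --labels=cml-runner \
          --idle-timeout=60 \
          --single
```

With either flag, the expectation is that the runner exits and the instance goes away once the job is done or the timeout elapses.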

Hope there's a way to ensure the same auto-shutdown behavior on GCP. As it is, the risk of getting smacked with an expensive bill for an idle GPU is just too real =p
