GCP compute instances are not shut down after idle timeout #661

Closed
@a3lem

First off, thanks for adding GCP support to cml-runner.

While playing around with it, I noticed that my Compute Engine instances weren't being shut down/terminated (i.e., VM powered off) or deleted. I experimented with --idle-timeout and --single, but neither made a difference: the instance stays alive. This is unexpected given the following sentence from the docs on self-hosted runners:

After the job runs, the instance automatically shuts down.

And indeed, that's what I've observed with AWS instances. Those seem to terminate correctly.

Here's my GitHub workflow for extra context:

name: 'Train-in-the-cloud-GCP'
on: 
  workflow_dispatch:

jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: iterative/setup-cml@v1
      - uses: actions/checkout@v2
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          # Notice use of `GOOGLE_APPLICATION_CREDENTIALS_DATA` instead of
          # `GOOGLE_APPLICATION_CREDENTIALS`. Contrary to what docs suggest, the
          # latter causes problems for terraform.
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          cml-runner \
          --cloud gcp \
          --cloud-region europe-west1-b \
          --cloud-type=n1-standard-1 \
          --labels=cml-runner
          
  model-training:
    needs: deploy-runner
    runs-on: [self-hosted, cml-runner]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: 'Train my dummy model'
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          echo "Training a super awesome model"
          sleep 5
          echo "Training complete"
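
For anyone trying to reproduce this, here is a sketch of the deploy step with both flags added. The 60-second timeout is an assumed value for illustration, not necessarily the one used in the runs described above; everything else is copied from the workflow.

```yaml
      - name: 'Deploy runner on GCP'
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          GOOGLE_APPLICATION_CREDENTIALS_DATA: ${{ secrets.GOOGLE_APPLICATION_CREDENTIALS }}
        run: |
          # --idle-timeout value (in seconds) is illustrative only
          cml-runner \
          --cloud gcp \
          --cloud-region europe-west1-b \
          --cloud-type=n1-standard-1 \
          --labels=cml-runner \
          --idle-timeout=60 \
          --single
```

With either flag set, the expectation is that the runner exits on its own once the job is done and the instance goes away with it.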

Hope there's a way to ensure the same auto-shutdown behavior on GCP. As it is, the risk of getting smacked with an expensive bill for an idle GPU is just too real =p
