Closed
Labels
cloud-aws (Amazon Web Services), cml-runner (Subcommand), p0-critical (Max priority, ASAP)
Description
I've had a couple of instances recently that have failed to terminate. In the most recent case this was with the --reuse flag set, after running a series of 8 queued jobs. The instance is sitting idle even though its idle timeout of 60s expired ten minutes ago, so I'll need to terminate it manually from the command line.
In the most serious case, an instance ran for two weeks without terminating. It took us that long to notice because the instance name was not set to cml-* as usual.
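Until the cause is found, a periodic audit could catch stragglers like these. The following is only a sketch, not part of CML: `find_long_running` is a hypothetical helper, the 24-hour threshold is an arbitrary assumption, and the input is assumed to be a dict shaped like the EC2 DescribeInstances response. It deliberately does not rely on the cml-* name, since the name was missing in the two-week case.

```python
from datetime import datetime, timedelta, timezone


def find_long_running(response, now, max_age=timedelta(hours=24), name_prefix="cml-"):
    """List running instances older than max_age (hypothetical helper).

    `response` is assumed to look like an EC2 DescribeInstances result:
    {"Reservations": [{"Instances": [{"InstanceId", "State", "LaunchTime", "Tags"}]}]}.
    Returns (instance_id, name, unnamed) tuples, where `unnamed` is True when
    the Name tag is missing or does not start with `name_prefix`.
    """
    flagged = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Only running instances keep accruing cost.
            if instance["State"]["Name"] != "running":
                continue
            # Skip instances that have not yet exceeded the age threshold.
            if now - instance["LaunchTime"] <= max_age:
                continue
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            name = tags.get("Name", "")
            flagged.append((instance["InstanceId"], name, not name.startswith(name_prefix)))
    return flagged
```

With boto3, the input could presumably come from `boto3.client("ec2").describe_instances()`, passing `datetime.now(timezone.utc)` as `now`; large accounts would also need pagination. Flagged IDs still have to be reviewed and terminated by hand.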
Here's the yml we are using:
name: train and evaluate rasa model
on:
  pull_request:
    types: [opened, synchronize]
  workflow_dispatch:
jobs:
  deploy-runner:
    runs-on: [ubuntu-latest]
    steps:
      - uses: actions/checkout@v2
      - uses: iterative/setup-cml@v1
      - name: deploy
        shell: bash
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          cml-runner \
            --cloud aws \
            --cloud-region eu-west \
            --cloud-type=c5a.4xlarge \
            --cloud-spot true \
            --labels=cml-runner,voice-control,oms-rasa-2 \
            --idle-timeout 60 \
            --reuse
  model-training:
    needs: deploy-runner
    runs-on: [self-hosted,cml-runner]
    container: docker://dvcorg/cml:0-dvc2-base1
    steps:
      - uses: actions/checkout@v2
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8.5'
      - name: Install dependencies
        run: |
          apt-get update -y
          apt-get install make python3-pip virtualenv curl
      - name: cml
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: eu-west-1
        run: |
          python --version
          make virtualenv
          dvc repro
          echo "## Metrics" > report.md
          git fetch --prune
          dvc metrics diff main --show-md | grep "Change\|\-\-\-" >> report.md
          dvc metrics diff main --show-md | grep -E "(intent|entity|action).*weighted" | sort >> report.md
          sed "s/results\///g" -i report.md
          cml-send-comment report.md
          dvc push
      - uses: actions/upload-artifact@v2
        with:
          name: gh-artifact-${{ github.event.pull_request.head.sha }}
          path: |
            report.md
            results
          retention-days: 30
      - uses: EndBug/add-and-commit@v7
        if: ${{ github.ref != 'refs/heads/main' }} && ${{ github.ref != 'refs/heads/rasax/prod' }}
        with:
          add: 'dvc.lock --force'
          pull_strategy: 'NO-PULL'
          message: 'chg: dvc repro'