[CI] Add AWS EC2 dynamic runner support #6471

Merged
merged 52 commits into from
Aug 11, 2022
eceed49
Added AWS EC2 dynamic runner support (#2)
apstasen Jul 23, 2022
9efc4e2
Fixed indentation
apstasen Jul 23, 2022
7a4e3ec
Fixed intendation
apstasen Jul 23, 2022
3cd6efb
Remove whitespace
apstasen Jul 23, 2022
23ab5a7
Clarified some messages text
apstasen Jul 23, 2022
8fefd4f
Removed not needed setup line in comment
apstasen Jul 24, 2022
3b828e6
Clarified description
apstasen Jul 24, 2022
3cabe7b
Clarified description
apstasen Jul 24, 2022
677ea19
Fixed typo in description
apstasen Jul 24, 2022
596929a
Fixed --ephemeral option usage (should be in config.sh)
apstasen Jul 24, 2022
565732b
Formatted text for lint
apstasen Jul 24, 2022
c9c40f6
Typo fix in description
apstasen Jul 24, 2022
df24c1a
Revert "Formatted text for lint"
apstasen Jul 24, 2022
15deeae
Restored original formatting not warped by lint
apstasen Jul 24, 2022
1be24bf
Removed not needed part of comment
apstasen Jul 25, 2022
44fbad7
Merge branch 'sycl' into aws
apstasen Aug 4, 2022
8f0c522
Added EOL
apstasen Aug 4, 2022
aa13988
Fixed indent
apstasen Aug 4, 2022
50855fa
Remove trailing spaces
Aug 4, 2022
284fc39
Moved uniq into sycl_resolve_test_matrix.yml and removed max-parallel…
apstasen Aug 4, 2022
f761199
Do not create dummy aws start/stop runners
apstasen Aug 5, 2022
2548f9d
Fixed indent
apstasen Aug 5, 2022
9c883b5
Start all AWS instances on one job to avoid waiting for parallel jobs
apstasen Aug 5, 2022
97cef06
Added extra debug info
apstasen Aug 5, 2022
e1a84f8
Fixed handling empty/undefined runs-on-list
apstasen Aug 6, 2022
7befe33
Added extra debug message
apstasen Aug 6, 2022
29fc79d
Fix nightly testing
apstasen Aug 6, 2022
5777010
Revert "Fix nightly testing"
apstasen Aug 6, 2022
1f4a076
Do not parse aws type JSON anymore and pass it directly
apstasen Aug 6, 2022
d5d187e
Add test AWS usage
apstasen Aug 6, 2022
7b63970
Adjusted AWS action names
apstasen Aug 6, 2022
6cf5781
Do not get Github reg token more than once
apstasen Aug 6, 2022
3688399
Added option to understand label from array (of one element)
apstasen Aug 6, 2022
513a6a3
Unified label handling code
apstasen Aug 6, 2022
bbf4490
Removed problem configs with too generic runner labels
apstasen Aug 6, 2022
65f39d5
Revert "Removed problem configs with too generic runner labels"
apstasen Aug 6, 2022
4be7006
Removed problem configs with too generic runner labels
apstasen Aug 6, 2022
d3cc81f
Resolved conflict and fixed potential issue with temporary AWS CI tes…
apstasen Aug 9, 2022
f0485f1
Merge branch 'sycl' into aws
apstasen Aug 9, 2022
8404e91
Added more logs
apstasen Aug 9, 2022
c14f4db
More logging info
apstasen Aug 9, 2022
7cd213a
More logging
apstasen Aug 9, 2022
7b59505
Need target repo run context
apstasen Aug 9, 2022
fd24b2e
Revert "Add test AWS usage"
apstasen Aug 9, 2022
0260be2
Preparing for merge
apstasen Aug 9, 2022
152d0b7
Skip empty AWS start job
apstasen Aug 9, 2022
025a2d9
Indent fix
apstasen Aug 9, 2022
291bc39
Indent fix
apstasen Aug 9, 2022
20e9fa9
Use target repo env
apstasen Aug 9, 2022
6cd3733
Use exact package versions
apstasen Aug 9, 2022
c0187cc
Improved security
apstasen Aug 9, 2022
e47afb3
Enable target env for PR
apstasen Aug 9, 2022
53 changes: 51 additions & 2 deletions .github/workflows/sycl_linux_build_and_test.yml
@@ -40,6 +40,10 @@ on:
type: string
required: false
default: ""
lts_aws_matrix:
type: string
required: false
default: ""
lts_cmake_extra_args:
type: string
required: false
@@ -155,9 +159,31 @@ jobs:
name: sycl_lit_${{ inputs.build_artifact_suffix }}
path: lit.tar.xz

llvm_test_suite:
aws-start:
name: Start AWS
needs: build
if: ${{ inputs.lts_matrix != '' }}
if: ${{ inputs.lts_aws_matrix != '' }}
runs-on: ubuntu-latest
environment: aws
steps:
- name: Setup script
run: |
mkdir -p ./aws-ec2
wget raw.githubusercontent.com/intel/llvm/sycl/devops/actions/aws-ec2/action.yml -P ./aws-ec2
wget raw.githubusercontent.com/intel/llvm/sycl/devops/actions/aws-ec2/aws-ec2.js -P ./aws-ec2
wget raw.githubusercontent.com/intel/llvm/sycl/devops/actions/aws-ec2/package.json -P ./aws-ec2
npm install ./aws-ec2
- name: Start AWS EC2 runners
uses: ./aws-ec2
with:
runs-on-list: ${{ inputs.lts_aws_matrix }}
GH_PERSONAL_ACCESS_TOKEN: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
AWS_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY }}
AWS_SECRET_KEY: ${{ secrets.AWS_SECRET_KEY }}

llvm_test_suite:
needs: [build, aws-start]
if: ${{ !failure() && inputs.lts_matrix != '' }}
strategy:
fail-fast: false
max-parallel: ${{ inputs.max_parallel }}
@@ -203,3 +229,26 @@ jobs:
check_sycl_all: ${{ matrix.check_sycl_all }}
results_name_suffix: ${{ matrix.config }}_${{ inputs.build_artifact_suffix }}
cmake_args: '${{ matrix.cmake_args }} ${{ inputs.lts_cmake_extra_args }}'

aws-stop:
name: Stop AWS
needs: [ aws-start, llvm_test_suite ]
if: ${{ always() && inputs.lts_aws_matrix != '' }}
runs-on: ubuntu-latest
environment: aws
steps:
- name: Setup script
run: |
mkdir -p ./aws-ec2
wget raw.githubusercontent.com/intel/llvm/sycl/devops/actions/aws-ec2/action.yml -P ./aws-ec2
wget raw.githubusercontent.com/intel/llvm/sycl/devops/actions/aws-ec2/aws-ec2.js -P ./aws-ec2
wget raw.githubusercontent.com/intel/llvm/sycl/devops/actions/aws-ec2/package.json -P ./aws-ec2
npm install ./aws-ec2
- name: Stop AWS EC2 runners
uses: ./aws-ec2
with:
runs-on-list: ${{ inputs.lts_aws_matrix }}
mode: stop
GH_PERSONAL_ACCESS_TOKEN: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
AWS_ACCESS_KEY: ${{ secrets.AWS_ACCESS_KEY }}
AWS_SECRET_KEY: ${{ secrets.AWS_SECRET_KEY }}
2 changes: 2 additions & 0 deletions .github/workflows/sycl_nightly.yml
@@ -20,6 +20,7 @@ jobs:
if: github.repository == 'intel/llvm'
uses: ./.github/workflows/sycl_linux_build_and_test.yml
needs: resolve_matrix
secrets: inherit
with:
build_cache_root: "/__w/"
build_artifact_suffix: default
@@ -29,6 +30,7 @@ jobs:
if: github.repository == 'intel/llvm'
uses: ./.github/workflows/sycl_linux_build_and_test.yml
needs: resolve_matrix
secrets: inherit
with:
build_cache_root: "/__w/"
build_cache_suffix: opaque_pointers
3 changes: 3 additions & 0 deletions .github/workflows/sycl_post_commit.yml
@@ -21,13 +21,16 @@ jobs:
name: Linux Default
needs: resolve_matrix
uses: ./.github/workflows/sycl_linux_build_and_test.yml
secrets: inherit
with:
build_cache_root: "/__w/llvm"
build_artifact_suffix: "post_commit"
lts_matrix: ${{ needs.resolve_matrix.outputs.lts_matrix }}
lts_aws_matrix: ${{ needs.resolve_matrix.outputs.lts_aws_matrix }}
linux_no_assert:
name: Linux (no assert)
uses: ./.github/workflows/sycl_linux_build_and_test.yml
secrets: inherit
with:
build_cache_root: "/__w/llvm"
build_cache_suffix: gcc_no_assertions
5 changes: 4 additions & 1 deletion .github/workflows/sycl_precommit.yml
@@ -1,7 +1,7 @@
name: SYCL

on:
pull_request:
pull_request_target:
branches:
- sycl
# Do not run builds if changes are only in the following locations
@@ -25,6 +25,7 @@ jobs:
steps:
- uses: actions/checkout@v2
with:
persist-credentials: false
fetch-depth: 2
- name: Run clang-format
uses: ./devops/actions/clang-format
@@ -43,9 +44,11 @@ jobs:
needs: [lint, resolve_matrix]
if: always() && (success() || contains(github.event.pull_request.labels.*.name, 'ignore-lint'))
uses: ./.github/workflows/sycl_linux_build_and_test.yml
secrets: inherit
with:
build_cache_root: "/__w/"
build_cache_size: "8G"
build_artifact_suffix: "default"
build_cache_suffix: "default"
lts_matrix: ${{ needs.resolve_matrix.outputs.lts_matrix }}
lts_aws_matrix: ${{ needs.resolve_matrix.outputs.lts_aws_matrix }}
8 changes: 8 additions & 0 deletions .github/workflows/sycl_resolve_test_matrix.yml
@@ -19,10 +19,18 @@ on:
type: string
required: true
default: ""
uniq:
description: Unique string to name dynamic runners in AWS
type: string
required: false
default: ${{ github.run_id }}-${{ github.run_attempt }}
outputs:
lts_matrix:
description: "Generated Matrix"
value: ${{ jobs.resolve_matrix.outputs.lts_matrix }}
lts_aws_matrix:
description: "Generated Matrix AWS subset"
value: ${{ jobs.resolve_matrix.outputs.lts_aws_matrix }}
jobs:
resolve_matrix:
name: Resolve Test Matrix
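
The hunk above adds a `uniq` input and an `lts_aws_matrix` output, but the code that produces them sits in the collapsed part of the file. The following is only a rough sketch of how such a split could work, not the PR's actual implementation. The function name `splitTestMatrix`, the `config` property, and the `aws_<config>-<uniq>` label scheme are all hypothetical. It assumes that entries carrying an `aws-type` property form the AWS subset, and that an empty subset is emitted as an empty string so that the `lts_aws_matrix != ''` checks in `sycl_linux_build_and_test.yml` skip the start and stop jobs.

```javascript
// Hypothetical helper: split a resolved test matrix into the full matrix
// and the AWS-only subset. Names and label format are assumptions.
function splitTestMatrix(configs, uniq) {
  // Give every AWS-backed config a runner label that is unique to this run,
  // so concurrent workflow runs do not pick up each other's runners.
  const resolved = configs.map((cfg) =>
    cfg["aws-type"]
      ? { ...cfg, "runs-on": `aws_${cfg.config}-${uniq}` }
      : cfg
  );

  // Only entries with an aws-type need an EC2 instance started for them.
  const awsSubset = resolved.filter((cfg) => cfg["aws-type"]);

  return {
    lts_matrix: JSON.stringify(resolved),
    // An empty string (not "[]") lets `inputs.lts_aws_matrix != ''` skip the AWS jobs.
    lts_aws_matrix: awsSubset.length > 0 ? JSON.stringify(awsSubset) : "",
  };
}
```

With the default `uniq` of `${{ github.run_id }}-${{ github.run_attempt }}`, a re-run of the same workflow would get fresh labels under this scheme, which is presumably why the run attempt is part of the string.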
66 changes: 66 additions & 0 deletions devops/actions/aws-ec2/action.yml
@@ -0,0 +1,66 @@
name: aws-ec2
description: Start AWS EC2 instances with Github actions runner agent in it

inputs:
runs-on-list:
description: "JSON string with array of objects with aws-type, runs-on, aws-ami, aws-spot, aws-disk, aws-timebomb, one-job properties"
required: true
# aws-type: AWS EC2 instance type. This property must be present if you want to trigger AWS EC2 instance start/stop.
# runs-on: Name of the unique label assigned to the runner used as 'runs-on' property for the following jobs. Mandatory presence required.
# aws-ami: AWS AMI id. Makes sense only for start mode. Default "ami-0966bccbb521ccb24".

# ami-0966bccbb521ccb24: Ubuntu 22.04 (ami-02f3416038bdb17fb with /dev/sda1 disk) with docker installed and gh_runner (1001) like this:
# sudo -s
# apt-get update
# curl -fsSL https://get.docker.com -o /tmp/get-docker.sh
# sh /tmp/get-docker.sh
# groupadd -g 1001 gh_runner; useradd gh_runner -u 1001 -g 1001 -m -s /bin/bash; usermod -aG docker gh_runner; usermod -aG video gh_runner
# sync; shutdown -h now

# ami-02ec0f344128253f9: Amazon Linux 2 AMI with NVIDIA TESLA GPU Driver (ami-06bf0a3f89fe08f0a with /dev/xvda disk) with docker installed and gh_runner (1001) like this:
# sudo -s
# yum update -y
# amazon-linux-extras install docker
# sudo systemctl --now enable docker
# distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# yum-config-manager --disable amzn2-graphics; yum clean expire-cache; yum install -y nvidia-docker2; systemctl restart docker
# groupadd -g 1001 gh_runner; useradd gh_runner -u 1001 -g 1001 -m -s /bin/bash; usermod -aG docker gh_runner; usermod -aG video gh_runner
# sync; shutdown -h now

# ami-0ccda708841dde988: Amazon Linux 2 AMI with AMD Radeon Pro Driver (ami-0bb1072e787242eb6 with /dev/xvda disk) with docker installed and gh_runner (1001) like this:
# sudo -s
# amazon-linux-extras install docker
# sudo systemctl --now enable docker
# groupadd -g 1001 gh_runner; useradd gh_runner -u 1001 -g 1001 -m -s /bin/bash; usermod -aG docker gh_runner; usermod -aG video gh_runner
# sync; shutdown -h now

# aws-spot: Enable usage of spot instances to save money (less reliable). Makes sense only for start mode. Default true.
# aws-disk: AWS EC2 instance AMI specific disk device path and size in GB (8 by default). Makes sense only for start mode. Default "/dev/sda1:16".
# aws-timebomb: AWS EC2 instance maximum lifetime. Makes sense only for start mode. Default "1h".
# one-job: Will terminate AWS EC2 instance after one job (not waiting for stop job) saving money. Makes sense only for start mode. Default true.

mode:
description: "Mode of operation: start or stop"
required: false
default: start

GH_PERSONAL_ACCESS_TOKEN:
description: "Github personal access token with repo permission"
required: true

AWS_ACCESS_KEY:
description: "AWS access id"
required: true

AWS_SECRET_KEY:
description: "AWS access secret key"
required: true

aws-region:
description: "AWS EC2 region"
required: false
default: "us-east-2" # Ohio

runs:
using: node12
main: ./aws-ec2.js
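
The comments in `action.yml` describe the shape of `runs-on-list` and its per-property defaults, but `aws-ec2.js` itself is not shown in this view. As a minimal sketch of how that input could be normalized, assuming nothing about the real script: `parseRunsOnList` and `parseDisk` are hypothetical names, the defaults are copied from the comments above, and the fallback to 8 GB when no size is given is one reading of "8 by default". Entries without `aws-type` are skipped, an empty or undefined input yields an empty list, and a one-element array is accepted as the `runs-on` label, as the commit messages suggest.

```javascript
// Defaults copied from the comments in action.yml.
const DEFAULTS = {
  "aws-ami": "ami-0966bccbb521ccb24",
  "aws-spot": true,
  "aws-disk": "/dev/sda1:16",
  "aws-timebomb": "1h",
  "one-job": true,
};

// Hypothetical helper: turn the runs-on-list JSON string into a list of
// fully populated instance descriptions.
function parseRunsOnList(raw) {
  // An empty or undefined input means there is nothing to start or stop.
  if (!raw) return [];

  const list = JSON.parse(raw);
  if (!Array.isArray(list)) {
    throw new TypeError("runs-on-list must be a JSON array");
  }

  return list
    // Only entries with aws-type trigger an EC2 start/stop.
    .filter((entry) => entry && entry["aws-type"])
    .map((entry) => {
      // Accept either a plain label or an array holding a single label.
      const label = Array.isArray(entry["runs-on"])
        ? entry["runs-on"][0]
        : entry["runs-on"];
      if (!label) {
        throw new Error("runs-on label is mandatory for AWS entries");
      }
      return { ...DEFAULTS, ...entry, "runs-on": label };
    });
}

// Hypothetical helper: split "<device>:<size in GB>" into its parts.
// Assumption: a missing size falls back to 8 GB.
function parseDisk(spec) {
  const [device, size] = spec.split(":");
  return { device, sizeGb: size === undefined ? 8 : Number(size) };
}
```

Merging `DEFAULTS` first and the entry second means any property set in the matrix overrides the default, while the normalized `runs-on` label is applied last so it always ends up as a plain string.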