Skip to content

Commit

Permalink
Integrate AMD GPU in CI/CD environment (huggingface#26007)
Browse files Browse the repository at this point in the history
* Add a Dockerfile for PyTorch + ROCm based on official AMD released artifact

* Add a new artifact single-amdgpu testing on main

* Attempt to test the workflow without merging.

* Changed BERT to check if things are triggered

* Meet the dependencies graph on workflow

* Revert BERT changes

* Add check_runners_amdgpu to correctly mount and check availability

* Rename setup to setup_gpu for CUDA and add setup_amdgpu for AMD

* Fix all the needs.setup -> needs.setup_[gpu|amdgpu] dependencies

* Fix setup dependency graph to use check_runner_amdgpu

* Let's do the runner status check only on AMDGPU target

* Update the Dockerfile.amd to put ourselves in / rather than /var/lib

* Restore the whole setup for CUDA too.

* Let's redisable them

* Change BERT to trigger tests

* Restore BERT

* Add torchaudio with rocm 5.6 to AMD Dockerfile (huggingface#26050)

fix dockerfile

Co-authored-by: Felix Marty <felix@hf.co>

* Place AMD GPU tests in a separate workflow (correct branch) (huggingface#26105)

AMDGPU CI lives in an other workflow

* Fix invalid job name is dependencies.

* Remove tests multi-amdgpu for now.

* Use single-amdgpu

* Use --net=host for now.

* Remote host networking.

* Removed duplicated check_runners_amdgpu step

* Let's tag machine-types with mi210 for now.

* Machine type should be only mi210

* Remove unnecessary push.branches item

* Apply review suggestions moving from `x-amdgpu` to `x-gpu` introducing `amd-gpu` and `miXXX` labels.

* Remove amdgpu from step names.

* finalize

* delete

---------

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  • Loading branch information
4 people authored Sep 20, 2023
1 parent 37c205e commit 2d71307
Show file tree
Hide file tree
Showing 3 changed files with 415 additions and 19 deletions.
73 changes: 54 additions & 19 deletions .github/workflows/build-docker-images.yml
Original file line number Diff line number Diff line change
Expand Up @@ -34,19 +34,19 @@ jobs:
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
Expand All @@ -59,7 +59,7 @@ jobs:
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
Expand All @@ -83,19 +83,19 @@ jobs:
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
Expand All @@ -120,13 +120,13 @@ jobs:
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
Expand All @@ -136,7 +136,7 @@ jobs:
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
Expand All @@ -152,19 +152,19 @@ jobs:
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-doc-builder
push: true
Expand All @@ -188,26 +188,61 @@ jobs:
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-gpu

latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu-push-ci

latest-tensorflow:
name: "Latest TensorFlow [dev]"
# Push CI doesn't need this image
Expand All @@ -216,19 +251,19 @@ jobs:
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v2
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-tensorflow-gpu
build-args: |
Expand Down
Loading

0 comments on commit 2d71307

Please sign in to comment.