[Examples] Add docker compose example to run multiple containers #2745

romilbhardwaj · 2023-10-31T15:47:15Z

Simple example showing how to use docker compose to launch multiple containers on a SkyPilot cluster.

sky launch -c myclus compose_example.yaml
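
For readers without the PR diff at hand, here is a rough sketch of what a SkyPilot task file driving docker compose might look like. It is not the compose_example.yaml from this PR: the task name, the accelerator choice, and the assumption that the Docker Compose plugin is already installed on the image are all illustrative.

```yaml
# Hypothetical sketch; not the compose_example.yaml added in this PR.
name: docker-compose-sketch

resources:
  # Assumption: two GPUs so that each container can be given its own device.
  accelerators: T4:2

# Sync the local directory (which should contain docker-compose.yml)
# to the cluster.
workdir: .

# Assumption: the Docker Compose plugin is already available on the image;
# if it is not, install it in a `setup:` step first.
run: |
  docker compose up
```

Launching it would use the same command as above, with this file in place of compose_example.yaml.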

concretevitamin · 2023-11-02T20:55:39Z

This is great to have @romilbhardwaj!

One issue with running it on Azure:

» sky launch compose_example.yaml -c dbg --cloud azure --cpus 2+ --down --gpus T4:4
...
(task, pid=26616)  gpu-app1 Pulled
(task, pid=26616)  gpu-app2 Pulled
(task, pid=26616)  Network sky_workdir_default  Creating
(task, pid=26616)  Network sky_workdir_default  Created
(task, pid=26616)  Container sky_workdir-gpu-app2-1  Creating
(task, pid=26616)  Container sky_workdir-gpu-app1-1  Creating
(task, pid=26616)  Container sky_workdir-gpu-app2-1  Created
(task, pid=26616)  Container sky_workdir-gpu-app1-1  Created
(task, pid=26616) Attaching to sky_workdir-gpu-app1-1, sky_workdir-gpu-app2-1
(task, pid=26616) Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.8, please update your driver to a newer version, or use an earlier cuda container: unknown
ERROR: Job 1 failed with return code list: [1]
INFO: Job finished (status: FAILED).
...

This may be because the CUDA driver in our default Azure GPU image is too old for this container, which requires cuda>=11.8. Is it a quick fix?

romilbhardwaj · 2023-11-02T22:32:54Z

Ah, good catch - we do need to update our Azure image (#2751). For this PR, I've changed the CUDA image version to 11.5.2 and tested that it works on AWS, Azure, and GCP.

concretevitamin · 2023-11-03T00:14:11Z

examples/docker/compose/docker-compose.yml

+services:
+  gpu-app1:
+    image: nvidia/cuda:11.5.2-runtime-ubuntu20.04
+    command: nvidia-smi


I added -l 1 here and on L17 so that nvidia-smi loops forever.

It appears both containers print the same GPU ID. Note the SkyPilot task has 2 GPUs assigned, so GPUs 0 and 1 are available.

Is there any env var (CUDA_VISIBLE_DEVICES?) we can add to this file to show how to distribute the containers to GPUs 0 and 1 respectively? Can even be a comment.

romilbhardwaj · 2023-11-03

Good point - I've changed from count to explicit device_ids. Note that nvidia-docker remaps device IDs, so from within the gpu-app2 container the visible GPU ID will be ['0'] (though it maps to physical device 1). I've also added this as a comment.
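
As an illustration of the change described here, a minimal sketch of pinning each service to one GPU is shown below. It assumes the Compose `deploy.resources.reservations.devices` syntax and reuses the service names, image, and command from this thread; it is not necessarily the file as merged.

```yaml
# Sketch only: one GPU per service via explicit device_ids (instead of count).
services:
  gpu-app1:
    image: nvidia/cuda:11.5.2-runtime-ubuntu20.04
    command: nvidia-smi -l 1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']   # physical GPU 0
              capabilities: [gpu]
  gpu-app2:
    image: nvidia/cuda:11.5.2-runtime-ubuntu20.04
    command: nvidia-smi -l 1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # Physical GPU 1; inside the container it is still reported
              # as GPU 0 because the visible devices are renumbered.
              device_ids: ['1']
              capabilities: [gpu]
```

Note that `count` and `device_ids` are alternatives in a device reservation, so only one of them should be set per entry.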

concretevitamin

LGTM!

romilbhardwaj added 3 commits October 31, 2023 08:45

Add docker compose example

113abdd

newline

c94d876

Add repo setup for AWS

0d80d8b

Change CUDA to 11.5 for azure compat

6f2634c

concretevitamin reviewed Nov 3, 2023

Add device_id

ec8a7b9

romilbhardwaj requested a review from concretevitamin November 3, 2023 19:19

concretevitamin approved these changes Nov 3, 2023

typo

8c17694

romilbhardwaj merged commit 5a35ab6 into master Nov 6, 2023

romilbhardwaj deleted the examples_docker_compose branch November 6, 2023 20:24
