Example WDL test for gpu's does not work depending on default executor container #653

Closed
stxue1 opened this issue May 29, 2024 · 0 comments · Fixed by #659

stxue1 commented May 29, 2024

The test_gpu_task.wdl example test does not specify a container, so it depends on the lspci command being available in the executor's default container.

For MiniWDL and Toil, the default container is Ubuntu 20.04:
https://github.com/chanzuckerberg/miniwdl/blob/a34aa902ec1be2f411db1e38daf51bae4f839f42/WDL/runtime/config_templates/default.cfg#L144-L146

But the base Ubuntu 20.04 image does not ship with lspci:

$ docker run -it ubuntu:20.04 bash
root@d3b58db61322:/# lspci
bash: lspci: command not found
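
The same check can be scripted rather than run by hand. The following is only a sketch: has_cmd is a hypothetical helper name, and the docker invocation in the trailing comment assumes Docker is installed, so it is left unexecuted.

```shell
#!/bin/sh
# has_cmd is a hypothetical helper: it succeeds if the named command
# is on PATH in the current environment.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the tool the GPU test relies on.
if has_cmd lspci; then
    echo "lspci: found"
else
    echo "lspci: missing (on Debian/Ubuntu it is provided by the pciutils package)"
fi

# To run the same check inside an image without an interactive shell
# (assumes Docker is installed; not executed here):
#   docker run --rm ubuntu:20.04 sh -c 'command -v lspci' || echo "missing in image"
```

Run inside the container the executor actually uses, this shows up front whether the test can succeed.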

I think the test should specify a container that has lspci preinstalled. For example, an image built from:

FROM ubuntu:22.04

RUN apt-get -y update && apt-get -y install pciutils

and referenced in the task's runtime section:

  runtime {
    gpu: true
    container: "quay.io/stxue/ubuntu_lspci:latest"
  }

With this change, the test runs under both Toil and MiniWDL:

stxue@mustard:/private/groups/patenlab/toil-dev/wdl-conformance-tests$ srun -c 1 --mem 1G --time=0:10:00 --partition gpu --gres=gpu:1 --pty bash -i
stxue@phoenix-05:/private/groups/patenlab/toil-dev/wdl-conformance-tests$ toil-wdl-runner test_gpu_task.wdl 
/usr/lib/python3/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
  "class": algorithms.Blowfish,
[2024-05-29T13:39:17-0700] [MainThread] [I] [toil] Running Toil version 7.1.0a1-e9d0b62d7d62af9672946858fdd6fb587b1dd563 on host phoenix-05.prism.
/usr/lib/python3/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
  "class": algorithms.Blowfish,
[2024-05-29T13:39:18-0700] [MainThread] [I] [toil.leader] 0 jobs are running, 0 jobs are issued and waiting to run
[2024-05-29T13:39:18-0700] [MainThread] [I] [toil.leader] Issued job 'WDLTaskJob' wf.test_gpu.command kind-WDLTaskJob/instance-52hhmugy v1 with job batch system ID: 2 and disk: 2.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [{'count': 1, 'kind': 'gpu'}], preemptible: False
/usr/lib/python3/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
  "class": algorithms.Blowfish,
/usr/lib/python3/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
  "class": algorithms.Blowfish,
[2024-05-29T13:39:25-0700] [MainThread] [I] [toil.leader] Finished toil run successfully.

Workflow Progress 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 (0 failures) [00:06<00:00, 0.48 jobs/s]
{"wf.out": true}
[2024-05-29T13:39:25-0700] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/data/tmp/tmpxfvwtsc1/tree)
stxue@phoenix-05:/private/groups/patenlab/toil-dev/wdl-conformance-tests$ MINIWDL__SCHEDULER__CONTAINER_BACKEND=singularity miniwdl run test_gpu_task.wdl 
2024-05-29 13:40:34.114 wdl.w:wf workflow start :: name: "wf", source: "test_gpu_task.wdl", line: 2, column: 1, dir: "/private/groups/patenlab/toil-dev/wdl-conformance-tests/20240529_134034_wf"
2024-05-29 13:40:34.118 wdl.w:wf miniwdl :: version: "v1.12.0", uname: "Linux phoenix-05.prism 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64"
2024-05-29 13:40:34.146 wdl.w:wf ready :: job: "call-test_gpu", callee: "test_gpu"
2024-05-29 13:40:34.147 wdl.w:wf.t:call-test_gpu task setup :: name: "test_gpu", source: "test_gpu_task.wdl", line: 9, column: 1, dir: "/private/groups/patenlab/toil-dev/wdl-conformance-tests/20240529_134034_wf/call-test_gpu", thread: 139860603147840
    0:00:00 elapsed    ▬▬▬        tasks finished: 0, ready: 1, running: 0
/usr/lib/python3/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated and will be removed in a future release
  "class": algorithms.Blowfish,
2024-05-29 13:40:34.369 wdl.w:wf.t:call-test_gpu Singularity runtime initialized (BETA) :: singularity_version: "singularity-ce version 3.10.3"
2024-05-29 13:40:34.377 wdl.w:wf.t:call-test_gpu singularity run :: pid: 3168507, log: "/private/groups/patenlab/toil-dev/wdl-conformance-tests/20240529_134034_wf/call-test_gpu/singularity.log.txt"
2024-05-29 13:40:34.651 wdl.w:wf.t:call-test_gpu done
2024-05-29 13:40:34.651 wdl.w:wf finish :: job: "call-test_gpu"
2024-05-29 13:40:34.652 wdl.w:wf done
{
  "dir": "/private/groups/patenlab/toil-dev/wdl-conformance-tests/20240529_134034_wf",
  "outputs": {
    "wf.out": true
  }
}
stxue1 changed the title from "Example WDL test for gpu's does not work depending on default executor engine" to "Example WDL test for gpu's does not work depending on default executor container" on May 29, 2024
jdidion closed this as completed on Jun 26, 2024