Gha multi gpu #2

landerlini · 2025-08-29T10:50:29Z

Snakemake can easily parallelize the workflow across 4 workers, but it has no native logic to assign different GPUs to different workers.
I have created a small gpu_picker module that keeps track of which GPUs are allocated and never assigns the same GPU to two jobs.
I'll first try running on a single worker with GPU allocation via gpu_picker enabled; if that works, I'll plug in the 4-GPU runner.
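
For readers unfamiliar with the idea, here is a minimal sketch of how such a picker could work. It is not the actual gpu_picker code from this PR: the function name `pick_gpu`, its parameters, and the lock-file layout are all assumptions made for illustration. It relies on one advisory lock file per GPU (via `fcntl.flock`, so Unix only), which lets independent worker processes coordinate without sharing memory.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def pick_gpu(n_gpus, lock_dir):
    """Yield the index of a GPU no other job holds, or None if all are busy.

    Hypothetical helper: each GPU is represented by one lock file, and a job
    owns the GPU for as long as it holds an exclusive lock on that file.
    """
    os.makedirs(lock_dir, exist_ok=True)
    handle, index = None, None
    for candidate in range(n_gpus):
        lock_file = open(os.path.join(lock_dir, f"gpu{candidate}.lock"), "w")
        try:
            # Non-blocking: fail immediately if another job holds this GPU.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            lock_file.close()
            continue
        handle, index = lock_file, candidate
        break
    try:
        yield index
    finally:
        # Release the GPU when the job finishes, even if it raised.
        if handle is not None:
            fcntl.flock(handle, fcntl.LOCK_UN)
            handle.close()
```

A notebook could then wrap its training in `with pick_gpu(4, "/tmp/gpu_locks") as gpu:` and, when `gpu` is not None, set `os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)` before importing the deep-learning framework. The lock directory path here is again only a placeholder, and it must be shared by all workers on the node.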

github-actions · 2025-08-29T10:50:38Z

🤖 A new training is being planned.

Name: pp-2016-MU-Sim10b-gha_multi_gpu
Repository sub-dir: pidgan
Snakemake targets: cache_container validate_all
Selected runner: aiinfn-lamarrsim-gpu

At the end of the training, the models will be released and this PR will be notified again.

github-actions · 2025-08-29T12:05:58Z

🚀 Models for pp-2016-MU-Sim10b-gha_multi_gpu were released

You can review the models developed in this PR in Release pp-2016-MU-Sim10b-gha_multi_gpu-2025-08-29T12h05m47

github-actions · 2025-08-29T12:50:00Z

🤖 A new training is being planned.

Name: pp-2016-MU-Sim10b-gha_multi_gpu
Repository sub-dir: pidgan
Snakemake targets: cache_container validate_all
Selected runner: aiinfn-lamarrsim-4gpus

At the end of the training, the models will be released and this PR will be notified again.

github-actions · 2025-08-29T13:34:03Z

🚀 Models for pp-2016-MU-Sim10b-gha_multi_gpu were released

You can review the models developed in this PR in Release pp-2016-MU-Sim10b-gha_multi_gpu-2025-08-29T13h33m54

github-actions · 2025-08-29T15:57:56Z

🤖 A new training is being planned.

Name: pp-2016-MU-Sim10b-gha_multi_gpu
Repository sub-dir: pidgan
Snakemake targets: cache_container validate_all
Selected runner: aiinfn-lamarrsim-4gpus

At the end of the training, the models will be released and this PR will be notified again.

github-actions · 2025-09-01T07:23:48Z

🤖 A new training is being planned.

Name: pp-2016-MU-Sim10b-gha_multi_gpu
Repository sub-dir: pidgan
Snakemake targets: cache_container validate_all
Selected runner: aiinfn-lamarrsim-4gpus

At the end of the training, the models will be released and this PR will be notified again.

github-actions · 2025-09-01T11:28:29Z

🚀 Models for pp-2016-MU-Sim10b-gha_multi_gpu were released

You can review the models developed in this PR in Release pp-2016-MU-Sim10b-gha_multi_gpu-2025-09-01T11h28m17

Lucio Anderlini added 2 commits August 29, 2025 10:43

added gpu_picker to pick or not gpu in each notebook

df2effd

shortened the training for testing purpose

d596800

switching runner and modify profile

c529dc0

moving to a real training

d47c69d

limit concurrent jobs accessing the storage to 2

846554c
