Slide: https://docs.google.com/presentation/d/1y15fzHoBhjBFpmy44Q3VrW0gJMS7ePhCVsRgXNmqC-g/edit?usp=sharing
Allocate node:
- Single Node: salloc --nodes 1 --qos interactive --time 01:00:00 --constraint gpu --gpus-per-node 4 --account=<acc_name>_g
- Multinode: salloc --nodes 2 --qos interactive --time 00:30:00 --ntasks-per-node=4 --cpus-per-task=32 --constraint gpu --gpus-per-node 4 --account=<acc_name>_g
- Run (multinode scripts 9-11): ./multinode_setup_and_run.sh <filename>.py
Enable PyTorch: module load pytorch/2.6.0
Deep profiling:
- pip install -U tensorboard torch-tb-profiler
- **Add a profiling context in the code** (a minimal sketch follows this list)
- tensorboard --logdir=./log
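The profiling context is the torch.profiler API writing traces into the directory you later point tensorboard --logdir at. A minimal sketch, using a placeholder model and random data rather than the workshop's MNIST pipeline:

```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

# Placeholder model/optimizer; swap in your own training objects.
model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=tensorboard_trace_handler("./log"),  # same dir as --logdir above
    record_shapes=True,
    profile_memory=True,
) as prof:
    for step in range(8):
        x = torch.randn(32, 10, device="cuda")
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prof.step()  # tell the profiler one training step has finished
```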
Debug Tips:
- Check for slow imports: python -X importtime file.py
- When using print, add flush=True so output appears immediately, e.g. print('test', flush=True)
Example files:
- 0_torch_dist.py: Launches 4 processes and performs all_reduce
- 1_torch_dist_gpu.py: Launches 4 processes and performs all_reduce on GPU
- 2_mnist_training.py: A simple MNIST classification pipeline using a single GPU
- 3_mnist_distributed.py: 4 identical processes performing exactly the same task - the simple MNIST classification pipeline on GPU
- 4_mnist_manual_ddp.py: A manual implementation of DDP on the previous pipeline, with a distributed sampler
- 5_mnist_manual_ddp_profile.py: TensorBoard profiler on the previous code, with fewer epochs for smaller profile data. Output saved to ./log_ddp; inspect with tensorboard --logdir=./log_ddp
- 6_mnist_ddp_pt.py: PyTorch wrapper for DDP; no manual all_reduce needed
- 7_mnist_ddp_pt_timing.py: PyTorch DDP code with arg = number of GPUs. Prints the time taken for model training; arg = 4 should give a lower runtime
- 8_mnist_ddp_pt_lr.py: Adjustment of the learning rate so that the loss curve matches the single-GPU run
- test-profile.py: A smaller DDP ML pipeline with dummy data and TensorBoard profiling for easier inspection. Output saved to ./log; inspect with tensorboard --logdir=./log
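For reference, the core pattern these scripts build toward (process-group init, a DistributedSampler, and the DDP wrapper) looks roughly like the sketch below. The model and dataset here are dummy placeholders, not the MNIST pipeline, and the environment variables are assumed to be set by the launcher:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # Assumes the launcher (e.g. torchrun or the repo's launch script) sets
    # RANK, WORLD_SIZE, and LOCAL_RANK for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Dummy data in place of MNIST.
    dataset = TensorDataset(torch.randn(1024, 784), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)  # each rank sees a disjoint shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = DDP(torch.nn.Linear(784, 10).cuda(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # keep shuffling consistent across ranks
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # DDP averages gradients across ranks
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```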
Useful Commands:
- Multinode allocation: salloc --nodes 2 --qos interactive --time 00:30:00 --ntasks-per-node=4 --cpus-per-task=32 --constraint gpu --gpus-per-node 4 --account=<acc_name>_g
- module load pytorch/2.6.0
- Run: ./multinode_setup_and_run.sh <filename>.py
Multinode example files:
- 9_torch_dist_multi_node.py: all_reduce example for multinode
- 10_mnist_ddp_pt_multinode.py: MNIST training pipeline for multinode DDP
- 11_mnist_ddp_pt_multinode_streaming.py: MNIST training pipeline for multinode DDP with a streaming dataloader from disk
- create_mnist_h5.py: Creates a chunked MNIST dataset for streaming
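As a rough illustration of what a multinode all_reduce exercise boils down to (the environment variables named here are the standard torch.distributed ones, assumed to be exported by the setup script, not something defined in this repo):

```python
import os
import torch
import torch.distributed as dist

# Assumes MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE, and LOCAL_RANK are set
# for every process across all nodes.
dist.init_process_group(backend="nccl", init_method="env://")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Each rank contributes its rank id; after all_reduce every rank holds the sum.
t = torch.tensor([dist.get_rank()], dtype=torch.float32, device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()} sees {t.item()}", flush=True)

dist.destroy_process_group()
```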