@GeorgePearse
Summary

Integrated PyTorch Lightning Fabric to enable scalable multi-GPU training and simplify device management across the codebase.

Key Changes

  • Dependencies: Added lightning>=2.0.0 to pyproject.toml.
  • Refactoring:
    • Migrated examples/train.py, examples/train_video.py, examples/train_elic_cifar10.py, and tinify/cli/train.py to use Fabric.
    • Removed manual device placement (.to(device)) and CustomDataParallel wrapper.
    • Updated training loops to use fabric.backward() and fabric.clip_gradients().
  • CLI: Added --accelerator, --devices, --strategy, and --precision arguments to training scripts for flexible hardware configuration.

Benefits

  • Multi-GPU training (DDP, FSDP, etc.) by switching the --strategy flag, with no code changes.
  • Simplified codebase by removing boilerplate device management code.
  • Improved mixed-precision training support via Fabric.

Testing

Runs pytest against a Python 3.9–3.12 matrix on:

  • Pull requests to main
  • Pushes to main

Excludes slow tests (pretrained model downloads) to keep CI fast.

  • Added tests/test_train_fabric.py
  • Added tests/test_cli_train.py
  • Updated tests/test_train.py to use the CPU explicitly
  • Added VideoRateDistortionLoss to tinify/losses for correct video training support in the CLI
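The slow-test exclusion typically uses a pytest marker, with CI invoking `pytest -m "not slow"`. A sketch of the pattern, assuming a `slow` marker is registered; the test names here are illustrative, not the actual contents of the new test files:

```python
import pytest
import torch


@pytest.mark.slow
def test_pretrained_download():
    """Hypothetical slow test: would download pretrained weights."""
    ...


def test_train_one_step_cpu():
    """Fast test pinned to the CPU, as tests/test_train.py now does."""
    x = torch.randn(4, 3, device="cpu")
    assert x.device.type == "cpu"
```

Registering the marker (e.g. under `[tool.pytest.ini_options]` with `markers = ["slow: ..."]`) avoids pytest's unknown-marker warning.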
@GeorgePearse GeorgePearse merged commit fdeaa81 into main Nov 21, 2025
2 of 4 checks passed
