Best practices & guides on how to write distributed PyTorch training code
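To ground what such distributed PyTorch training code typically looks like, here is a minimal DistributedDataParallel (DDP) sketch, assuming a launch via `torchrun`; the toy model, synthetic dataset, and hyperparameters are placeholders, not taken from any repository listed here.

```python
# Minimal DDP training sketch (launch with `torchrun --nproc_per_node=N script.py`).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")      # torchrun provides RANK/WORLD_SIZE/MASTER_ADDR
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and synthetic dataset; a real guide would plug in its own.
    model = DDP(torch.nn.Linear(32, 2).cuda(local_rank), device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)        # shards the dataset across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()      # DDP all-reduces gradients during backward
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```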
Meta Llama 3 GenAI real-world use cases: an end-to-end implementation guide
🦾💻🌐 Distributed training & serverless inference at scale on RunPod
Fast and easy distributed model training examples.
A script for training ConvNeXt V2 on the CIFAR-10 dataset using FSDP for distributed training.
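As an illustration of that kind of script, a minimal FSDP sketch for a CIFAR-10 classifier follows; torchvision's `convnext_tiny` stands in for ConvNeXt V2, and the wrapping policy, transforms, and hyperparameters are assumptions rather than the repository's actual settings.

```python
# Sketch: FSDP-wrapped image classifier on CIFAR-10 (launch with `torchrun`).
import functools, os
import torch
import torch.distributed as dist
import torchvision
import torchvision.transforms as T
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from torch.utils.data import DataLoader, DistributedSampler

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# torchvision convnext_tiny is an assumed stand-in for ConvNeXt V2.
model = torchvision.models.convnext_tiny(num_classes=10)
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000),
    device_id=local_rank,                        # parameters are sharded across ranks
)

transform = T.Compose([T.ToTensor(), T.Normalize((0.5,) * 3, (0.5,) * 3)])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
sampler = DistributedSampler(train_set)
loader = DataLoader(train_set, batch_size=128, sampler=sampler, num_workers=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(1):
    sampler.set_epoch(epoch)
    for images, labels in loader:
        images, labels = images.cuda(local_rank), labels.cuda(local_rank)
        optimizer.zero_grad()
        loss_fn(model(images), labels).backward()  # gradients are reduce-scattered across shards
        optimizer.step()

dist.destroy_process_group()
```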
Minimal yet high-performance code for pretraining LLMs. Attempts to implement some SOTA features. Implements training through DeepSpeed, Megatron-LM, and FSDP. Work in progress.
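For the DeepSpeed path specifically, the sketch below shows a typical ZeRO setup; the config values and toy model are illustrative assumptions, and a real pretraining run would add a dataloader, checkpointing, and an actual LLM.

```python
# Sketch: wrapping a model with DeepSpeed ZeRO (launch with the `deepspeed` launcher).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)              # placeholder for a real LLM
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},           # shard optimizer states and gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(8, 1024).to(engine.device)
loss = engine(x).pow(2).mean()                   # dummy loss on the placeholder model
engine.backward(loss)                            # DeepSpeed handles scaling and reduction
engine.step()
```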
Dataloading for JAX
Framework, Model & Kernel Optimizations for Distributed Deep Learning - Data Hack Summit
A foundational repository for setting up distributed training jobs using Kubeflow and PyTorch FSDP.
Fully Sharded Data Parallel (FSDP) implementation of Transformer-XL
Comprehensive exploration of LLMs, including cutting-edge techniques and tools such as parameter-efficient fine-tuning (PEFT), quantization, zero redundancy optimizers (ZeRO), fully sharded data parallelism (FSDP), DeepSpeed, and Hugging Face Accelerate.
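To make one of those pieces concrete, here is a small PEFT/LoRA sketch on top of Hugging Face Transformers; the `gpt2` checkpoint and LoRA hyperparameters are placeholders, and quantization, ZeRO/FSDP, or DeepSpeed would be layered on via Accelerate or DeepSpeed configs rather than shown here.

```python
# Sketch: parameter-efficient fine-tuning (LoRA) of a causal LM.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder checkpoint

lora_config = LoraConfig(
    r=8,                      # rank of the low-rank update matrices
    lora_alpha=16,            # scaling factor applied to the update
    lora_dropout=0.05,
    task_type="CAUSAL_LM",    # PEFT selects default target modules for known architectures
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # only the adapter weights are trainable

# Training would then proceed with a standard Trainer or Accelerate loop,
# optionally combined with FSDP, ZeRO, or DeepSpeed for sharding.
```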