A GPipe implementation in PyTorch
Updated Jul 25, 2024 (Python)
An I/O benchmark for deep learning applications
Extending DOLFINx with checkpointing functionality
Keras wrapper that autosaves what ModelCheckpoint cannot.
A Python package for checkpointing, saving, and loading objects.
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
Robust distributed checkpointing and job management system for multi-GPU SLURM workloads
A face recognition manager for digital albums that isolates images of a specified person.
Currently exploring Generative AI to deepen my understanding and skills within web development. Focused on learning how to integrate GenAI into real-world applications and solve practical problems through intelligent automation.
Automatic checkpointing and job resubmission system for robust LLM training on Slurm-based HPC clusters. Collaboration with @vulus98