A GPipe implementation in PyTorch
Updated Jul 25, 2024 - Python
An I/O benchmark for deep learning applications
Very-Low Overhead Checkpointing System
Extending DOLFINx with checkpointing functionality
Keras wrapper that autosaves what ModelCheckpoint cannot.
A Python package for checkpointing, saving, and loading objects.
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
A lightweight checkpointing program written in C.
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
Hangman Game Word Predictor (Character-level attention)
A Flink project that consumes streams from one Azure Event Hub and produces to a different Event Hub, along with the config files for deploying it in Kubernetes
A shared library to help test your code with failure-injection
A standalone Flink producer used for testing the contents of the flink-consume-produce-ek repo
DMTCP scripts to get Python scripts working with SLURM.
Robust distributed checkpointing and job management system for multi-GPU SLURM workloads
A face recognition manager that isolates images of a specified person from a digital album.