PyMarlin was developed with the goal of simplifying the E2E Deep Learning experimentation lifecycle for data scientists using PyTorch. The library enables an agile way to quickly prototype a new AI scenario on dev box and seamlessly scale it training multi-node DDP GPU training with AzureML or other cloud services.
- Provides public and enterprise data pre-processing recipes, which provides out of the box vanilla and parallel processing. It requires no additional code to run for AzureML or other environments easily.
- Provides scalable model training with support for Single Process, VM, multi-GPU, multi-node, distributed Data Parallel, mixed-precision (AMP, Apex) training. ORT and DeepSpeed based training are going to be available soon!
- Provides out of the box Plugins that can be used for all typical NLP tasks like Sequence Classification, Named Entity Recognition and Seq2Seq text generation.
- Provides reusable modules for model checkpointing, stats collection, Tensorboard and compliant AML logging which can be customized based on your scenario.
- Provides custom arguments parser that allows for saving all the default values for arguments related to a scenario in an YAML config file, merging user provided arguments at runtime.
- All core modules are thoroughly linted,unit tested and even ran E2E (multi-node, GPU) in AzureML.
- PyMarlin is minimal and has a easy to understand codebase. PyMarlin was designed to make it easy for others to understand the entire codebase and customize according to their needs.
pip install pymarlin
Read the installation doc for more information.
Full website with guides and SDK reference.
Check out the CIFAR image classification example.
Explore how to use pymarlin to benchmark your language models on GLUE tasks.
Reach out to us with your feedback and suggestions.