The Oumi Platform enables end-to-end development of foundation and specialized models, including data curation, data synthesis, pretraining, tuning, and evaluation.
- Run Anywhere: Train and evaluate models seamlessly in local environments, Jupyter notebooks, the VS Code debugger, or on remote clusters.
- Any Training: Pretraining and comprehensive instruction fine-tuning capabilities, including full fine-tuning (FFT), LoRA, DPO, and more.
- Scalability: Built-in support for multi-node distributed training using PyTorch's DistributedDataParallel (DDP) or Fully Sharded Data Parallel (FSDP). Inference support for Llama 405B and beyond.
- Cloud Flexibility: Compatible with major cloud providers (GCP, AWS, Azure, ...) and specialized platforms like DOE ALCF Polaris.
- Reproducibility: Flexible configuration system using YAML files and command-line arguments.
- Unified Interface: Streamlined processes for data preprocessing, model training, and evaluation.
- Customizable: Easily extendable to incorporate new models, datasets, and evaluation metrics.
For an overview of Oumi's features and usage, check out the user guide and the hands-on tour of the repository.
- (Optional) Set up Git and Conda:

  For new developers, we highly recommend following the installation guide to set up Git and a local conda environment.

- Install Oumi:

  ```shell
  pip install 'oumi[all]'
  ```

- Set up your configuration file (example configs are provided in the configs directory).

- Run training locally:

  ```shell
  oumi-train -c path/to/your/config.yaml
  ```

  For more advanced training options, see the cloud training guide and distributed training.
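The structure of a training config is defined by the example files in the configs directory. As a rough sketch of what one might contain — all field names below are illustrative assumptions, not Oumi's actual schema, so consult the provided examples for the real format:

```yaml
# Hypothetical training config -- keys are illustrative only;
# see the example configs in the configs directory for Oumi's actual schema.
model:
  model_name: "meta-llama/Meta-Llama-3.1-8B-Instruct"

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"

training:
  trainer_type: "TRL_SFT"
  per_device_train_batch_size: 2
  learning_rate: 2.0e-5
  max_steps: 100
  output_dir: "output/llama8b.sft"
```

The same file can then be passed to the training command shown above, which keeps runs reproducible: the entire experiment is captured in one versionable YAML file.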
These configurations demonstrate how to set up and run full training for different model architectures using Oumi.
| Model | Type | Configuration | Cluster | Status |
|---|---|---|---|---|
| **Llama Instruction Finetuning** | | | | |
| Llama3.1 8b | LoRA | llama8b_lora.yaml | Polaris | ✨ Supported ✨ |
| Llama3.1 8b | SFT | llama8b_sft.yaml | Polaris | ✨ Supported ✨ |
| Llama3.1 70b | LoRA | llama70b_lora.yaml | Polaris | ✨ Supported ✨ |
| Llama3.1 70b | SFT | llama70b_sft.yaml | Polaris | ✨ Supported ✨ |
| **Example Models** | | | | |
| Aya | SFT | llama3.8b.aya.sft.yaml | GCP | ✨ Supported ✨ |
| Zephyr | QLoRA | zephyr.7b.qlora.yaml | GCP | ✨ Supported ✨ |
| ChatQA | SFT | chatqa.stage1.yaml | GCP | ✨ Supported ✨ |
| **Pre-training** | | | | |
| GPT-2 | Pre-training | gpt2.pt.mac.yaml | Mac (mps) | ✨ Supported ✨ |
| Llama2 2b | Pre-training | llama2b.pt.yaml | Polaris | ✨ Supported ✨ |
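The LoRA and QLoRA entries above differ from full SFT mainly in a small adapter section of the config. A hypothetical sketch of what such a section might look like — the parameter names below follow common PEFT conventions and are assumptions, not necessarily Oumi's exact config keys:

```yaml
# Hypothetical LoRA adapter settings -- names follow standard PEFT
# conventions and are assumptions, not necessarily Oumi's exact schema.
peft:
  lora_r: 16              # rank of the low-rank update matrices
  lora_alpha: 32          # scaling factor applied to the adapter update
  lora_dropout: 0.05      # dropout applied to adapter inputs
  lora_target_modules:    # attention projections to adapt
    - "q_proj"
    - "v_proj"
```

Because only the adapter weights are trained, a LoRA config like this typically fits much larger base models on the same hardware than the corresponding full-SFT config.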
We provide several Jupyter notebooks to help you get started with Oumi. Here's a list of available examples:
| Notebook | Description |
|---|---|
| A Tour | A comprehensive tour of the Oumi repository and its features |
| Finetuning Tutorial | Step-by-step guide on how to finetune models using Oumi |
| Tuning Llama | Detailed tutorial on tuning Llama models with Oumi |
| Multinode Inference on Polaris | Guides you through running inference with trained models |
| Datasets Tutorial | Explains how to work with datasets in Oumi |
| Deploying a Job | Instructions on how to deploy a training job using Oumi |
View our API documentation here.
Reach out to matthew@learning-machines.ai if you have problems with access.
Contributions are welcome! After all, this is a community-based effort. Please check the CONTRIBUTING.md file for guidelines on how to contribute to the project.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- Pre-commit hook errors with VS Code

  When committing changes, you may encounter an error from the pre-commit hooks about missing imports.

  To fix this, start your VS Code instance after activating your conda environment:

  ```shell
  conda activate oumi
  code .  # run from inside the Oumi directory
  ```