Open Universal Machine Intelligence (Oumi)


The Oumi Platform enables end-to-end development of foundation and specialized models, covering data curation, data synthesis, pretraining, tuning, and evaluation.

Features

  • Run Anywhere: Train and evaluate models seamlessly across local environments, Jupyter notebooks, the VS Code debugger, and remote clusters.
  • Any Training: Pretraining and comprehensive instruction fine-tuning capabilities, including full fine-tuning (FFT), LoRA, DPO, and more.
  • Scalability: Built-in support for multi-node distributed training using PyTorch's DistributedDataParallel (DDP) or Fully Sharded Data Parallel (FSDP), with inference support for Llama 405B and beyond.
  • Cloud Flexibility: Compatible with major cloud providers (GCP, AWS, Azure, ...) and specialized platforms like DOE ALCF Polaris.
  • Reproducibility: Flexible configuration system using YAML files and command-line arguments (a minimal sketch follows this list).
  • Unified Interface: Streamlined processes for data preprocessing, model training, and evaluation.
  • Customizable: Easily extendable to incorporate new models, datasets, and evaluation metrics.
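
For a taste of the configuration system, a run can be described in a single YAML file and launched with one command. The keys below are illustrative placeholders rather than Oumi's actual schema; see the configs directory for real, working examples.

    # train.yaml -- hypothetical keys, for illustration only
    model:
      name: meta-llama/Llama-3.1-8B
    training:
      learning_rate: 2.0e-5
      num_epochs: 3

You would then launch the run with oumi-train -c train.yaml, the same entry point used in the quickstart below.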

Getting Started

For an overview of Oumi's features and usage, check out the user guide and the hands-on tour of the repository.

Quickstart

  1. (Optional) Set up Git and Conda:

    For new developers, we highly recommend following the installation guide to set up Git and a local conda environment.

  2. Install Oumi:

    pip install 'oumi[all]'

  3. Set up your configuration file (example configs are provided in the configs directory).

  4. Run training locally:

    oumi-train -c path/to/your/config.yaml

    For more advanced options, see the cloud training guide and the distributed training documentation; a rough multi-GPU launch sketch follows this list.
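
For a sense of what a distributed launch can look like, here is a rough sketch using PyTorch's torchrun launcher. The torchrun flags are standard PyTorch options, but oumi.train is a module path assumed here purely for illustration; consult the distributed training guide for the supported invocation.

    # Hypothetical single-node, 8-GPU launch via PyTorch's torchrun.
    # "oumi.train" is an assumed module path, not a documented entry point;
    # see the distributed training guide for the actual launcher command.
    torchrun --standalone --nproc-per-node=8 \
      -m oumi.train -c path/to/your/config.yaml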

Configurations

These configurations demonstrate how to set up and run full training for different model architectures using Oumi.

| Model | Type | Configuration | Cluster | Status |
|-------|------|---------------|---------|--------|
| Llama Instruction Finetuning | | | | |
| Llama3.1 8b | LoRA | llama8b_lora.yaml | Polaris | ✨ Supported ✨ |
| Llama3.1 8b | SFT | llama8b_sft.yaml | Polaris | ✨ Supported ✨ |
| Llama3.1 70b | LoRA | llama70b_lora.yaml | Polaris | ✨ Supported ✨ |
| Llama3.1 70b | SFT | llama70b_sft.yaml | Polaris | ✨ Supported ✨ |
| Example Models | | | | |
| Aya | SFT | llama3.8b.aya.sft.yaml | GCP | ✨ Supported ✨ |
| Zephyr | QLoRA | zephyr.7b.qlora.yaml | GCP | ✨ Supported ✨ |
| ChatQA | SFT | chatqa.stage1.yaml | GCP | ✨ Supported ✨ |
| Pre-training | | | | |
| GPT-2 | Pre-training | gpt2.pt.mac.yaml | Mac (mps) | ✨ Supported ✨ |
| Llama2 2b | Pre-training | llama2b.pt.yaml | Polaris | ✨ Supported ✨ |
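
Any of these recipes can be run with the same oumi-train entry point shown in the quickstart. The path below is a placeholder for wherever llama8b_lora.yaml lives in your checkout of the configs directory; the Polaris and GCP entries are meant for cluster runs, so see the cloud training guide for remote launches.

    # Launch the Llama 3.1 8B LoRA recipe locally
    # (adjust the config path to match your checkout).
    oumi-train -c configs/llama8b_lora.yaml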

Tutorials

We provide several Jupyter notebooks to help you get started with Oumi. Here's a list of available examples:

| Notebook | Description |
|----------|-------------|
| A Tour | A comprehensive tour of the Oumi repository and its features |
| Finetuning Tutorial | Step-by-step guide on how to finetune models using Oumi |
| Tuning Llama | Detailed tutorial on tuning Llama models with Oumi |
| Multinode Inference on Polaris | Guides you through running inference with trained models |
| Datasets Tutorial | Explains how to work with datasets in Oumi |
| Deploying a Job | Instructions on how to deploy a training job using Oumi |

Documentation

View our API documentation here.

Reach out to matthew@learning-machines.ai if you have problems with access.

Contributing

Contributions are welcome! After all, this is a community-based effort. Please check the CONTRIBUTING.md file for guidelines on how to contribute to the project.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Troubleshooting

  1. Pre-commit hook errors with VS Code
    • When committing changes, you may encounter an error with pre-commit hooks related to missing imports.

    • To fix this, start your VS Code instance after activating your conda environment:

      conda activate oumi
      code .  # inside the Oumi directory