GitHub - ServiceNow/Fast-LLM at 151738fa0f0fb01ee0addcc3c6a1a648b67c9a6b

Accelerating your LLM training to full speed

Made with ❤️ by ServiceNow Research

Overview

Fast-LLM is a new open-source library for training large language models, built on PyTorch and Triton. It is extremely fast, scales to large clusters, supports a wide range of model architectures, and is easy to use. Unlike commercial frameworks like Megatron-LM, which are largely closed off and fragmented across forks, Fast-LLM is fully open-source and encourages community-driven development. Researchers can freely customize and optimize as needed, making it a flexible and hackable alternative that combines the speed of specialized tools with the openness of libraries like Hugging Face Transformers.

Note

Fast-LLM is not affiliated with Fast.AI, FastHTML, FastAPI, FastText, or other similarly named projects. Our library's name refers to its speed and efficiency in language model training.

Why Fast-LLM?

🚀 Fast-LLM is Blazingly Fast:

- ⚡️ Optimized kernel efficiency and reduced overheads.
- 🔋 Optimized memory usage for best performance.
- ⏳ Minimizes training time and cost.

📈 Fast-LLM is Highly Scalable:

- 📡 Distributed training across multiple GPUs and nodes using 3D parallelism (Data, Tensor, and Pipeline).
- 🔗 Supports sequence length parallelism to handle longer sequences effectively.
- 🧠 ZeRO-1, ZeRO-2, and ZeRO-3 implementations for improved memory efficiency.
- 🎛️ Mixed precision training support for better performance.
- 🏋️‍♂️ Large batch training and gradient accumulation support.
- 🔄 Reproducible training with deterministic behavior.

🎨 Fast-LLM is Incredibly Flexible:

- 🤖 Compatible with all common language model architectures in a unified class.
- ⚡ Efficient dropless Mixture-of-Experts (MoE) implementation with SoTA performance.
- 🧩 Customizable language model architectures, data loaders, loss functions, and optimizers (in progress).
- 🤗 Seamless integration with Hugging Face Transformers.

🎯 Fast-LLM is Super Easy to Use:

- 📦 Pre-built Docker images for quick deployment.
- 📝 Simple YAML configuration for hassle-free setup.
- 💻 Command-line interface for easy launches.
- 📊 Detailed logging and real-time monitoring features.
- 📚 Extensive documentation and practical tutorials (in progress).

🌐 Fast-LLM is Truly Open Source:

- ⚖️ Licensed under Apache 2.0 for maximum freedom to use Fast-LLM at work, in your projects, or for research.
- 💻 Fully developed on GitHub with a public roadmap and transparent issue tracking.
- 🤝 Contributions and collaboration are always welcome!

Usage

We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs. We'll show two example setups: one on a Slurm cluster and one on a Kubernetes cluster.

For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file examples/mistral-4-node-benchmark.yaml is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.

Note

Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.

Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of 9,800 tokens/s/H100 (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
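To put the quoted figure in context, here is a rough back-of-the-envelope sketch. It assumes that "8k" means 8,192 tokens and that the batch size of 32 counts sequences per optimizer step; neither is spelled out above. The function names are illustrative and not part of Fast-LLM.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope numbers for the benchmark quoted above.
# Assumptions (not stated in the README): "8k" = 8192 tokens, and the
# batch size of 32 counts sequences per optimizer step.

# Aggregate throughput in tokens/s for a given per-GPU rate and GPU count.
aggregate_tokens_per_second() {
  local per_gpu=$1 gpus=$2
  echo $(( per_gpu * gpus ))
}

# Tokens consumed by one optimizer step.
tokens_per_step() {
  local batch_size=$1 seq_len=$2
  echo $(( batch_size * seq_len ))
}

# Approximate step time in milliseconds (integer division, rounds down).
step_time_ms() {
  local tokens=$1 tokens_per_s=$2
  echo $(( tokens * 1000 / tokens_per_s ))
}

total=$(aggregate_tokens_per_second 9800 32)
step=$(tokens_per_step 32 8192)
echo "aggregate throughput: ${total} tokens/s"
echo "tokens per step:      ${step}"
echo "approx. step time:    $(step_time_ms "$step" "$total") ms"
```

Under these assumptions, 32 H100s process about 313,600 tokens/s in total, so each step takes a little under a second and the 100-step demo spends roughly 84 seconds in training steps, not counting startup time.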

Running Fast-LLM on a Slurm Cluster

Prerequisites

- A Slurm cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- CUDA 12.1 or higher.
- Dependencies: PyTorch, Triton, and Apex installed on all nodes.

Steps

Deploy the nvcr.io/nvidia/pytorch:24.07-py3 Docker image to all nodes (recommended). It contains all the necessary dependencies.

Install Fast-LLM on all nodes:

```
sbatch <<EOF
#!/bin/bash
#SBATCH --nodes=$(scontrol show node | grep -c NodeName)
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks=$(scontrol show node | grep -c NodeName)
#SBATCH --exclusive

srun bash -c 'pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]"'
EOF
```

Use the example Slurm job script examples/fast-llm.sbat to submit the job to the cluster:
```
sbatch examples/fast-llm.sbat
```
Monitor the job's progress:
- Logs: Follow job_output.log and job_error.log in your working directory.
- Status: Use squeue -u $USER to see the job status.
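The submit-and-monitor steps above can be scripted. The following is a minimal sketch, not part of Fast-LLM: the helper names `parse_job_id` and `wait_for_job` are hypothetical, and it relies only on `sbatch --parsable` (which prints the job ID, optionally followed by `;cluster`) and `squeue -h -j` (which lists the job without a header line).

```shell
#!/usr/bin/env bash
# Sketch: submit the example job and block until it leaves the Slurm queue.

# Extract the numeric job ID from `sbatch --parsable` output, which prints
# "jobid" or "jobid;cluster" on multi-cluster setups.
parse_job_id() {
  echo "${1%%;*}"
}

# Poll squeue until the job is no longer listed.
# $1 = job ID, $2 = polling interval in seconds (default 30).
wait_for_job() {
  local job_id=$1 interval=${2:-30}
  # -h drops the header line, so empty output means the job has left the queue.
  while [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; do
    sleep "$interval"
  done
}

# Usage (run from the repository root on a login node):
#   job_id=$(parse_job_id "$(sbatch --parsable examples/fast-llm.sbat)")
#   wait_for_job "$job_id"
#   tail -n 20 job_output.log
```

A job that has left the queue has not necessarily succeeded, so check job_error.log as well once the wait returns.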

Now, you can sit back and relax while Fast-LLM trains your model at full speed! ☕

Running Fast-LLM on a Kubernetes Cluster

Prerequisites

- A Kubernetes cluster with at least 4 DGX nodes with 8 A100-80GB or H100-80GB GPUs each.
- KubeFlow installed.
- Locked memory limit set to unlimited at the host level on all nodes. Ask your cluster admin to do this if needed.
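One way to check the locked memory prerequisite is to inspect `ulimit -l` in a shell on each node. The snippet below is a small sketch; the helper name `memlock_is_unlimited` is hypothetical, and it only reports the soft limit of the shell it runs in.

```shell
#!/usr/bin/env bash
# Sketch: report whether the locked-memory limit is unlimited.
# $1 = limit value to check; defaults to the current shell's soft limit.
memlock_is_unlimited() {
  local limit=${1:-$(ulimit -l)}
  [ "$limit" = "unlimited" ]
}

if memlock_is_unlimited; then
  echo "locked memory limit: unlimited"
else
  echo "locked memory limit: $(ulimit -l) KiB (ask your cluster admin to raise it)"
fi
```

Run it on the host and, if in doubt, inside a container on the same node, since the two can report different limits.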

Steps

Create a Kubernetes PersistentVolumeClaim (PVC) named fast-llm-home that will be mounted to /home/fast-llm in the container using examples/fast-llm-pvc.yaml:
```
kubectl apply -f examples/fast-llm-pvc.yaml
```
Create a PyTorchJob resource using the example configuration file examples/fast-llm.pytorchjob.yaml:
```
kubectl apply -f examples/fast-llm.pytorchjob.yaml
```
Monitor the job status:
- Use kubectl get pytorchjobs to see the job status.
- Use kubectl logs -f fast-llm-master-0 -c pytorch to follow the logs.
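The monitoring commands above can be wrapped in a small script that blocks until the job completes. This is a sketch under one assumption: the PyTorchJob is named `fast-llm`, which is inferred from the pod name `fast-llm-master-0` rather than confirmed from the example manifest. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait for the PyTorchJob to finish, then print the tail of the logs.
# Assumption: the job is named "fast-llm", inferred from the pod name
# fast-llm-master-0 used above; adjust if the example manifest differs.

# $1 = PyTorchJob name, $2 = timeout accepted by kubectl (default 2h).
wait_for_pytorchjob() {
  local job=${1:-fast-llm} timeout=${2:-2h}
  kubectl wait --for=condition=Succeeded "pytorchjob/${job}" --timeout="$timeout"
}

# Master pod name, following the <job>-master-0 pattern seen above.
master_pod_name() {
  echo "${1:-fast-llm}-master-0"
}

# Usage:
#   wait_for_pytorchjob fast-llm 2h && \
#     kubectl logs --tail=50 "$(master_pod_name fast-llm)" -c pytorch
```

Note that `kubectl wait` only watches for the Succeeded condition here, so a failed job keeps it blocked until the timeout expires. Check `kubectl get pytorchjobs` if the wait takes longer than expected.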

That's it! You're now up and running with Fast-LLM on Kubernetes. 🚀

Next Steps

📖 Want to learn more? Check out our documentation for more information on how to use Fast-LLM.

🔨 We welcome contributions to Fast-LLM! Have a look at our contribution guidelines.

🐞 Something doesn't work? Open an issue!

License

Fast-LLM is licensed by ServiceNow, Inc. under the Apache 2.0 License. See LICENSE for more information.

Vulnerability Reporting

For security issues, email disclosure@servicenow.com. See our security policy.
