Commit 034af99: Update README.md
shimmyshimmer authored Feb 27, 2025 · 1 parent 7feb2ff · 1 changed file (README.md) with 25 additions and 32 deletions

All notebooks are **beginner friendly**! Add your dataset, click "Run All", and …

| Type | Links |
| ------------------------------- | --------------------------------------- |
| 📚 **Documentation & Wiki** | [Read Our Docs](https://docs.unsloth.ai) |
| <img height="14" src="https://upload.wikimedia.org/wikipedia/commons/6/6f/Logo_of_Twitter.svg" />&nbsp; **Twitter (aka X)** | [Follow us on X](https://twitter.com/unslothai)|
| 💾 **Installation** | [Pip install](https://github.com/unslothai/unsloth/blob/main/README.md#-install-unsloth)|
| 🔮 **Our Models** | [Unsloth Releases](https://docs.unsloth.ai/get-started/all-our-models)|
| ✍️ **Blog** | [Read our Blogs](https://unsloth.ai/blog)|
| <img height="14" src="https://redditinc.com/hs-fs/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png" />&nbsp; **Reddit** | [Join our Reddit page](https://reddit.com/r/unsloth)|

- No hardware changes needed. Supports NVIDIA GPUs from 2018 onwards with a minimum CUDA Capability of 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40, etc.). [Check your GPU!](https://developer.nvidia.com/cuda-gpus) GTX 1070 and 1080 work, but are slow. A quick programmatic check is sketched after this list.
- Works on **Linux** and **Windows** via WSL.
- Supports 4bit and 16bit QLoRA / LoRA finetuning via [bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
- Open source trains 5x faster - see [Unsloth Pro](https://unsloth.ai/) for up to **30x faster training**!
- If you trained a model with 🦥Unsloth, you can use this cool sticker! &nbsp; <img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" height="50" align="center" />
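
To verify the CUDA Capability requirement above, here is a minimal sketch using PyTorch (assuming `torch` is already installed in your environment); this is a generic check, not part of the Unsloth API:

```python
import torch

if not torch.cuda.is_available():
    print("No CUDA-capable GPU detected.")
else:
    major, minor = torch.cuda.get_device_capability()
    # Unsloth requires CUDA Capability >= 7.0 (V100, T4, RTX 20-series and newer).
    supported = (major, minor) >= (7, 0)
    print(f"{torch.cuda.get_device_name()}: compute capability {major}.{minor} "
          f"({'supported' if supported else 'not supported'})")
```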

## 💾 Install Unsloth


- **Install with pip (recommended)** for Linux devices:
```
pip install unsloth
```
See below for Windows install instructions:
### Conda Installation (Optional)
`⚠️ Only use Conda if you already have it. If not, use pip.` Select `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1. We support `python=3.10`, `3.11`, and `3.12`.
```bash
# (conda environment setup commands truncated in this diff excerpt)
```
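
After installing via pip or Conda, the snippet below is a minimal, hedged sketch for confirming that Unsloth can load a 4-bit model; the model name and `max_seq_length` are illustrative assumptions, not recommendations from this README:

```python
from unsloth import FastLanguageModel

# Assumed example model; substitute any model from the Unsloth releases page.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = True,   # 4-bit loading via bitsandbytes (QLoRA-style)
)
print(type(model).__name__, type(tokenizer).__name__)
```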

<a name="DPO"></a>
## DPO + GRPO Support
DPO (Direct Preference Optimization), PPO, and reward modelling all appear to work, as per independent third-party testing from [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory). We have a preliminary Google Colab notebook for reproducing Zephyr on a Tesla T4 here: [notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing).

We're in 🤗Hugging Face's official docs! We're on the [SFT docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth) and the [DPO docs](https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth)!

<details>
<summary>Click for DPO code</summary>

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # Optional: set GPU device ID

# ... (model, reference model, tokenizer, and dataset setup truncated in this diff excerpt)

dpo_trainer = DPOTrainer(
    # ... (trainer arguments truncated in this diff excerpt)
)
dpo_trainer.train()
```
</details>
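
The heading above mentions GRPO, but only DPO code is shown in this excerpt. Below is a hedged sketch of how GRPO training might be wired up with TRL's `GRPOTrainer`; the model name, dataset, reward function, and hyperparameters are illustrative assumptions, and the exact fields accepted by `GRPOConfig` (plus any Unsloth-side patching required) depend on your `trl` and `unsloth` versions:

```python
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Assumed example model; substitute your own.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",
    max_seq_length = 1024,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)

# Assumed example dataset with a "prompt" column, as used in TRL's GRPO examples.
dataset = load_dataset("trl-lib/tldr", split = "train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 200 characters long.
    return [-abs(200 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = reward_len,
    args = GRPOConfig(
        output_dir = "grpo_outputs",
        per_device_train_batch_size = 8,
        num_generations = 8,   # completions sampled per prompt
        max_steps = 50,
    ),
    train_dataset = dataset,
)
trainer.train()
```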

## 🥇 Performance Benchmarking
- For our most detailed benchmarks, read our [Llama 3.3 Blog](https://unsloth.ai/blog/llama3-3).
- Benchmarking of Unsloth was also conducted by [🤗Hugging Face](https://huggingface.co/blog/unsloth-trl).

We tested using the Alpaca dataset with a batch size of 2, 4 gradient accumulation steps, rank = 32, and QLoRA applied to all linear layers (q, k, v, o, gate, up, down):

| Model | VRAM | 🦥 Unsloth speed | 🦥 VRAM reduction | 🦥 Longer context | 😊 Hugging Face + FA2 |
|----------------|-------|-----------------|----------------|----------------|--------------------|
| Llama 3.3 (70B)| 80GB | 2x | >75% | 13x longer | 1x |
| Llama 3.1 (8B) | 80GB | 2x | >70% | 12x longer | 1x |
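
For reference, the configuration described above the table (rank = 32 QLoRA on all linear layers, batch size 2, gradient accumulation 4) corresponds roughly to the setup below. This is a hedged sketch; the model name and remaining hyperparameters are assumptions, not the exact benchmark harness:

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",  # assumed example; the table covers Llama 3.1 (8B) and 3.3 (70B)
    max_seq_length = 2048,
    load_in_4bit = True,            # QLoRA: 4-bit base weights
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,                         # rank = 32, as stated above
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],  # all linear layers
    lora_alpha = 32,
)
args = TrainingArguments(
    per_device_train_batch_size = 2,   # batch size of 2
    gradient_accumulation_steps = 4,   # gradient accumulation steps of 4
    max_steps = 60,
    output_dir = "outputs",
)
# The Alpaca dataset and an SFT trainer (e.g. trl's SFTTrainer) would complete the run.
```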

## 🥇 Detailed Benchmarking Tables
### Context length benchmarks

#### Llama 3.1 (8B) max. context length
We tested Llama 3.1 (8B) Instruct with 4-bit QLoRA on all linear layers (Q, K, V, O, gate, up and down), rank = 32, and a batch size of 1. We padded all sequences to a fixed maximum sequence length to mimic long-context finetuning workloads.
| GPU VRAM | 🦥Unsloth context length | Hugging Face + FA2 |
