Commit

patrickvonplaten committed Jul 19, 2024
1 parent 5c6766c commit 04d2cc8
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -19,6 +19,14 @@ for multi-GPU-single-node training setups, but for smaller models, such as the 7
> For more generic approaches, you can check out some other great projects like
> [torchtune](https://pytorch.org/torchtune/stable/overview.html).

## News

- `mistral-finetune` is now compatible with Mistral Nemo!
  1. Download the new checkpoints [here](#model-download) and set `model_id_or_path` to the new checkpoint.
  2. Fine-tuning Mistral Nemo currently requires much more memory because its larger vocabulary spikes the peak memory requirement of the CE loss (an improved CE loss will be added soon). For now, set `seq_len` to 16384 or 8192, as in the config sketch after this list.
  3. It is recommended to use the same hyperparameters as for the 7B v3.
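
The YAML below is a minimal sketch of just the two fields mentioned above; the checkpoint path is a placeholder and all other fields stay as in your existing training config.

```yaml
# Placeholder path: point this at the directory of the extracted Mistral Nemo checkpoint
model_id_or_path: "/path/to/mistral-nemo-base-2407"

# Cap the sequence length so the CE-loss peak memory stays manageable with Nemo's larger vocabulary
seq_len: 16384  # drop to 8192 if you still run out of memory
```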

## Installation

To get started with Mistral LoRA fine-tuning, follow these steps:
@@ -46,6 +54,8 @@ We recommend fine-tuning one of the official Mistral models which you can download
| 8x7B Instruct V1 | [8x7B Instruct](https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar) | `8e2d3930145dc43d3084396f49d38a3f` |
| 8x22B Instruct V3 | [8x22B Instruct](https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar) | `471a02a6902706a2f1e44a693813855b` |
| 8x22B Base V3 | [8x22B Base](https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-v0.3.tar) | `a2fa75117174f87d1197e3a4eb50371a` |
| 12B Instruct | [Mistral-Nemo Instruct](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) | `296fbdf911cb88e6f0be74cd04827fe7` |
| 12B Base | [Mistral-Nemo Base](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) | `c5d079ac4b55fc1ae35f51f0a3c0eb83` |

**Important Notice**: For 8x7B Base V1 and 8x7B Instruct V1, it is necessary to use our v3 tokenizer and extend the vocabulary size to 32768 prior to fine-tuning. For detailed instructions on this process, please refer to the ["Model extension"](https://github.com/mistralai/mistral-finetune?tab=readme-ov-file#model-extension) section.

