Commit

patrickvonplaten committed Jul 19, 2024
1 parent 5c6766c commit 04d2cc8
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -19,6 +19,14 @@ for multi-GPU-single-node training setups, but for smaller models, such as the 7
> For more generic approaches, you can check out some other great projects like
> [torchtune](https://pytorch.org/torchtune/stable/overview.html).

## News

- `mistral-finetune` is now compatible with Mistral Nemo!
  1. Download the new checkpoints [here](#model-download) and set `model_id_or_path` to the new checkpoint.
  2. Fine-tuning Mistral Nemo currently requires much more memory because its larger vocabulary spikes the peak memory requirement of the CE loss (an improved CE loss will be added soon). For now, set `seq_len` to 16384 or 8192, as in the config sketch after this list.
  3. It is recommended to use the same hyperparameters as for the 7B v3.
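
The YAML below is a minimal sketch of just the two fields mentioned above; the checkpoint path is a placeholder and all other fields stay as in your existing training config.

```yaml
# Placeholder path: point this at the directory of the extracted Mistral Nemo checkpoint
model_id_or_path: "/path/to/mistral-nemo-base-2407"

# Cap the sequence length so the CE-loss peak memory stays manageable with Nemo's larger vocabulary
seq_len: 16384  # drop to 8192 if you still run out of memory
```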

## Installation

To get started with Mistral LoRA fine-tuning, follow these steps:
@@ -46,6 +54,8 @@ We recommend fine-tuning one of the official Mistral models which you can download
| 8x7B Instruct V1 | [8x7B Instruct](https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar) | `8e2d3930145dc43d3084396f49d38a3f` |
| 8x22B Instruct V3 | [8x22B Instruct](https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar) | `471a02a6902706a2f1e44a693813855b` |
| 8x22B Base V3 | [8x22B Base](https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-v0.3.tar) | `a2fa75117174f87d1197e3a4eb50371a` |
| 12B Instruct | [Mistral-Nemo Instruct](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar) | `296fbdf911cb88e6f0be74cd04827fe7` |
| 12B Base | [Mistral-Nemo Base](https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-base-2407.tar) | `c5d079ac4b55fc1ae35f51f0a3c0eb83` |

**Important Notice**: For 8x7B Base V1 and 8x7B Instruct V1, it is necessary to use our v3 tokenizer and extend the vocabulary size to 32768 prior to fine-tuning. For detailed instructions on this process, please refer to the ["Model extension"](https://github.com/mistralai/mistral-finetune?tab=readme-ov-file#model-extension) section.

