Commit e7ad89a — "Update README.md" (1 parent: 35e1611)

1 file changed: +11 additions, −0 deletions

1 file changed

+11
-0
lines changed

ML tips/NLP/README.md

(Context: https://arxiv.org/abs/2210.02969)

- distillation done via KL divergence loss on **unlabeled data** (e.g. pseudo-labeling)
- q: would it perform even better if the smaller model was fine-tuned on the labeled data afterwards? would be interesting to check

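A minimal pure-Python sketch of the KL-divergence distillation loss described above (the temperature value and function names are illustrative, not from the paper; real training code would use tensor ops):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on soft targets from an unlabeled example."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Identical logits give zero loss; the student is pushed to match the teacher's full distribution, which is why no labels are needed.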
## MPT-7B
https://www.mosaicml.com/blog/mpt-7b

- 7B-param model competitive with (or better than) LLaMA
  - largely because it was trained on 1T tokens, like LLaMA
- uses the GPT-NeoX-20B tokenizer (slightly better than the standard GPT-2 tokenizer)
- set vocab size from 50,257 -> 50,432 (a multiple of 128), which improved MFU by 4 percentage points
- uses a streaming dataset
- uses ALiBi instead of positional encodings (improves stability)
- uses the Lion optimizer instead of AdamW
  - more stable update magnitudes AND less optimizer state memory
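The vocab-padding trick above can be sketched as rounding up to a tile-friendly multiple (helper name is made up; note MPT chose 50,432 = 394 × 128, slightly above the nearest multiple of 128, which is 50,304):

```python
def pad_vocab(vocab_size, multiple=128):
    """Round the vocab up to a multiple of 128 so the embedding and
    output-projection GEMMs tile cleanly on the GPU (better MFU)."""
    return ((vocab_size + multiple - 1) // multiple) * multiple
```

The extra token slots are simply never produced by the tokenizer; they just pad the matrix shapes.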
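A rough pure-Python sketch of ALiBi's additive attention bias (function names are illustrative; real implementations build this as a tensor once and add it to the attention scores):

```python
def alibi_slopes(n_heads):
    """Per-head geometric slopes from the ALiBi paper: m_h = 2^(-8h/n)."""
    return [2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(seq_len, slope):
    """Additive bias per (query, key) pair: 0 on the diagonal,
    -slope * distance for past keys, -inf for future keys (causal mask)."""
    return [[-slope * (q - k) if k <= q else float("-inf")
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Because the penalty depends only on relative distance, there are no learned position embeddings and the model can extrapolate past its training context length.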
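A rough sketch of a single Lion update for one scalar parameter (names and hyperparameters are illustrative; real implementations vectorize over tensors). It shows the two properties noted above: every update has magnitude exactly `lr` (sign-based, hence stable), and only one momentum buffer is kept versus AdamW's two:

```python
def lion_step(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update for a scalar parameter. Returns (new_param, new_momentum)."""
    # Interpolate momentum and gradient, then take only the sign.
    update = beta1 * momentum + (1 - beta1) * grad
    sign = 1.0 if update > 0 else (-1.0 if update < 0 else 0.0)
    new_param = param - lr * (sign + wd * param)  # decoupled weight decay
    # Single EMA buffer carried to the next step (half AdamW's optimizer state).
    new_momentum = beta2 * momentum + (1 - beta2) * grad
    return new_param, new_momentum
```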
