Add OLMo model family #29890
Conversation
@ArthurZucker and @younesbelkada for general guidance and/or review, since this is a PyTorch text model.
Hey! Thanks for submitting the PR!
The code seems to be in a very research-oriented state, let's try to get it to transformers philosophy.
My main query would be: what are the main differences with MixtralMoe? (Llama + MoE)? Only include differences for trained checkpoints (I see both alibi and rotary; do all of them use both, or was only one kept?)
The fast tokenizer also looks very similar to the GptNeoX / Bloom one, any reason to have a new one?
Thanks for having a look! In a general sense, I tried to stay as faithful to OLMo as I could while following general transformers philosophy.

Regarding tokenization, OLMo adds an eos token to the end of inputs that don't have them. The GptNeoX and Bloom ones don't. Llama's tokenizer does, but it also has a slow tokenizer and I couldn't figure out how to convert the fast OLMo tokenizer to a slow one.

In terms of the model, OLMo doesn't have MoE whereas Mixtral does. The official OLMo models use rope, and at most one of rope and alibi is used in a run (i.e. no alibi), but we hope to release checkpoints from our experiments at some point. Some of those would use alibi. I think it would be nice if transformers could support those checkpoints too.

@dirkgr FYI
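To make the tokenization point concrete, the behaviour described above amounts to roughly the following helper (a minimal sketch, not code from this PR; encode_with_eos is a hypothetical name):

def encode_with_eos(tokenizer, text):
    # Append the eos token only when the encoded input does not already end with it,
    # mirroring the original OLMo tokenizer behaviour described above.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    if not ids or ids[-1] != tokenizer.eos_token_id:
        ids.append(tokenizer.eos_token_id)
    return ids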
Hey! Sorry I might have misunderstood (thought I saw some MoE things here).
Just want to clarify my thoughts before I try making changes, so I'm expressing my understanding of what I should do below. Please correct me where I am wrong.

Regarding the tokenizer, if possible I should update one of the existing tokenizers and then use that tokenizer directly.

The comments regarding modeling code suggest that I should aim to re-use existing code using the Copied from mechanism. Assuming it were possible, the ideal solution would be for me to use some existing model and convert OLMo checkpoints to work with that model.

Regarding copying existing code vs being faithful, what should I do when some operation (say, RoPE) exists in transformers but is performed in a different precision (e.g. float32 vs bfloat16) in OLMo compared to the transformers code? Do I use the transformers code and disregard the "small" error, or write separate code that matches the original OLMo behavior?
Yes! This way it's easier for all the community to use the model, and keeps the diff very clear and clean! For the tokenizer, if an existing one can support OLMo's behaviour, re-using it is preferred.
Exactly 💯 For ROPE, we should always compute it in float32.
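For reference, "compute it in float32" here means building the rotary tables in full precision and casting back to the model dtype only at the end. A minimal sketch (function name and signature are illustrative, not from the PR):

import torch

def rope_cos_sin(dim, seq_len, base=10000.0, dtype=torch.bfloat16, device="cpu"):
    # Build the inverse frequencies and angles in float32, regardless of the model dtype.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32, device=device) / dim))
    t = torch.arange(seq_len, dtype=torch.float32, device=device)
    freqs = torch.outer(t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    # Only the final cos/sin tables are cast back to the lower-precision dtype.
    return emb.cos().to(dtype), emb.sin().to(dtype)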
initializer_range=0.02,
use_cache=True,
pad_token_id=1,
bos_token_id=None,
Is it intentional that the default config does not set the bos_token_id?
OLMo's config class (in the original GitHub repo) doesn't have a bos token, and no OLMo checkpoints use bos tokens to my knowledge. Thus setting no bos_token_id is intentional.
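A quick way to see this once the converted checkpoints are available (assuming the allenai/OLMo-7B-hf repo mentioned later in this PR):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("allenai/OLMo-7B-hf")
print(config.bos_token_id)  # expected: None, as discussed above
print(config.pad_token_id)  # expected: 1, per the defaults shown above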
I think the PR is in a good state. Thanks for adding this! Pinging @ArthurZucker for a final review :)
@molbap @ArthurZucker All checks are passing! Now it's just a matter of reviews.
Good state!
- camel casing needs to be used on all classes!
- the main diff is clamping; it should not be an if/else, because otherwise it's a Llama model 🤗
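For context, the clamping being referred to is the optional qkv clipping that the commit list adds via clip_qkv. A rough sketch of how it sits in the attention forward (illustrative, not a verbatim excerpt from the PR):

query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)
if self.config.clip_qkv is not None:
    # Clip the projected activations to [-clip_qkv, clip_qkv]; this is the main
    # modeling difference from Llama.
    query_states = query_states.clamp(-self.config.clip_qkv, self.config.clip_qkv)
    key_states = key_states.clamp(-self.config.clip_qkv, self.config.clip_qkv)
    value_states = value_states.clamp(-self.config.clip_qkv, self.config.clip_qkv)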
@@ -174,6 +174,7 @@
        ("nllb-moe", "NllbMoeConfig"),
        ("nougat", "VisionEncoderDecoderConfig"),
        ("nystromformer", "NystromformerConfig"),
+       ("olmo", "OLMoConfig"),
("olmo", "OLMoConfig"), | |
("olmo", "OlmoConfig"), |
we need camel casing on all classes
Let's not modify an unrelated file as important as Llama! Remove the copied from, and for the forward only, use # Ignore copy
pretraining_tp (`int`, *optional*, defaults to 1):
    Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
    document](https://huggingface.co/docs/transformers/main/perf_train_gpu_many#tensor-parallelism) to understand more about it. This value is
    necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
    issue](https://github.com/pytorch/pytorch/issues/76232).
Let's get rid of that, no?
if self.config.pretraining_tp > 1:
    key_value_slicing = (self.num_key_value_heads * self.head_dim) // self.config.pretraining_tp
    query_slices = self.q_proj.weight.split(
        (self.num_heads * self.head_dim) // self.config.pretraining_tp, dim=0
    )
    key_slices = self.k_proj.weight.split(key_value_slicing, dim=0)
    value_slices = self.v_proj.weight.split(key_value_slicing, dim=0)

    query_states = [F.linear(hidden_states, query_slices[i]) for i in range(self.config.pretraining_tp)]
    query_states = torch.cat(query_states, dim=-1)

    key_states = [F.linear(hidden_states, key_slices[i]) for i in range(self.config.pretraining_tp)]
    key_states = torch.cat(key_states, dim=-1)

    value_states = [F.linear(hidden_states, value_slices[i]) for i in range(self.config.pretraining_tp)]
    value_states = torch.cat(value_states, dim=-1)

else:
pretraining TP should be removed. Was mostly unused for Llama
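Once pretraining_tp is dropped, only the plain path remains; the projections collapse to something like this (a sketch of the remaining non-TP branch):

query_states = self.q_proj(hidden_states)
key_states = self.k_proj(hidden_states)
value_states = self.v_proj(hidden_states)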
if self.config.pretraining_tp > 1:
    attn_output = attn_output.split(self.hidden_size // self.config.pretraining_tp, dim=2)
    o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.config.pretraining_tp, dim=1)
    attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.config.pretraining_tp)])
else:
same comment here
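Likewise, with the pretraining_tp branch removed, the output projection reduces to a single call (sketch of what the else-branch keeps):

attn_output = self.o_proj(attn_output)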
# Register a causal mask to separate causal and padding mask creation. Merging happens in the attention class.
# NOTE: This is not friendly with TorchScript, ONNX, ExportedProgram serialization for very large `max_position_embeddings`.
causal_mask = torch.full(
    (config.max_position_embeddings, config.max_position_embeddings), fill_value=True, dtype=torch.bool
)
self.register_buffer("causal_mask", torch.triu(causal_mask, diagonal=1), persistent=False)
To remove as it is not used anymore
There is no new tokenizer, so this should be done in the modeling imo, and just minimal tests to make sure bos and eos are added.
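A minimal test along those lines might look like this (a sketch assuming the add_eos_token flag that the commit list later moves into GPTNeoXTokenizerFast, and the converted allenai/OLMo-7B-hf checkpoint):

from transformers import GPTNeoXTokenizerFast

def test_eos_is_appended():
    tok = GPTNeoXTokenizerFast.from_pretrained("allenai/OLMo-7B-hf", add_eos_token=True)
    ids = tok("Hello OLMo").input_ids
    # The encoded input should end with the eos token when add_eos_token is set.
    assert ids[-1] == tok.eos_token_id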
Addressed all PR comments, and hopefully all checks will pass.
Thanks for your great work! 🔥
* Add OLMo using add-new-model-like with Llama
* Fix incorrect tokenizer for OLMo
* Copy-paste relevant OLMo methods and their imports
* Add OLMo config
* Modify OLMo config to follow HF conventions
* Remove unneeded Llama code from OLMo model
* Add ability for OLMo model to output attentions
* Add OLMoPreTrainedModel and OLMoModel
* Add OLMoForCausalLM
* Minor fixes to OLMo model for style and missing functions
* Implement OLMo tokenizer
* Implement OLMo to HF conversion script
* Add tests for OLMo model
* Add tests for OLMo fast tokenizer
* Add auto-generated dummy objects
* Remove unimplemented OLMo classes from auto and init classes and re-format
* Add README and associated auto-generated files
* Use OLMo names for common properties
* Run make fixup
* Remove `|` from OLMo typing
* Remove unneeded tokenization_olmo.py
* Revert model, config and converter to add-new-model-like Llama
* Move logic for adding bos/eos token into GPTNeoxTokenizerFast
* Change OLMoConfig defaults to match OLMo-7B
* Use GPTNeoXToknizerFast in OLMo tokenizer tests
* Modify auto-generated OLMoModelTests to work for OLMo
* Add non-parametric layer norm OLMoLayerNorm
* Update weight conversion script for OLMo
* Fix __init__ and auto structure for OLMo
* Fix errors from make fixup
* Remove OLMoTokenizerFast from documentation
* Add missing 'Copied from' for OLMoModel._update_causal_mask
* Run make fix-copies
* Rearrange string replacements in OLMoForCausalLM Copied from
* Move OLMo and Llama CausalLM.forward example into global constants
* Fix OLMO_GENERATION_EXAMPLE doc string typo
* Add option for qkv clipping to OLMo
* Rearrange OLMoConfig kwargs in convert_olmo_weights_to_hf
* Add clip_qkv to OLMoConfig in convert_olmo_weights_to_hf
* Fix OLMo tokenization bug using conversion script
* Keep model in full precision after conversion
* Do not add eos token automatically
* Update references to OLMo model in HF Hub
* Do not add eos token during encoding by default
* Fix Llama generation example
* Run make fixup
* OLMo 7B integration test fix
* Remove unneeded special case for OLMoConfig
* OLMo 7B Twin 2T integration test fix
* Fix test_model_7b_greedy_generation
* Remove test_compile_static_cache
* Fix OLMo and Llama generation example
* Run make fixup
* Revert "OLMo 7B integration test fix" This reverts commit 4df56a4.
* Revert "OLMo 7B Twin 2T integration test fix" This reverts commit 9ff65a4.
* Ungate 7B integration tests and fix greedy generation test
* Add retries for flaky test_eager_matches_sdpa_generate
* Fix output of doc example for OLMoForCausalLM.forward
* Downsize OLMo doc test for OLMoForCausalLM.forward to 1B model
* Try fix incorrect characters in OLMoForCausalLM.forward doct test
* Try fix incorrect characters in OLMoForCausalLM.forward doc test using end quotes
* Remove pretraining_tp from OLMo config and model
* Add missing 'Copied from' instances
* Remove unneeded causal_mask from OLMoModel
* Revert Llama changes
* Ignore copy for OLMoForCausalLM.forward
* Change 'OLMo' to 'Olmo' in classes
* Move minimal OLMo tokenization tests to model tests
* Add missed 'Copied from' for repeat_kv
What does this PR do?
This PR adds the OLMo model family to transformers. A base OLMoModel and a causal LM OLMoForCausalLM are implemented. The models are already present in HF Hub (e.g. allenai/OLMo-7B), and this implementation is compatible with the checkpoints in the Hub.

UPDATE: The current version of the PR is not compatible with the old OLMo models in HF Hub. New models have been uploaded to HF Hub (e.g. allenai/OLMo-7B-hf) to support this PR.
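Once merged, loading the converted checkpoint should follow the usual auto-class flow; a hedged usage sketch using the allenai/OLMo-7B-hf repo mentioned above:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf")
model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf")

inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))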
Fixes #29885
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.