Make sliced models HuggingFace compatible #139
base: main
Conversation
experiments/run_slicegpt.py (Outdated)

import torch
import wandb
from transformers.models.llama.modeling_llama import LlamaConfig
I think we might be missing one abstraction. We shouldn't need any model-specific imports here. hf_utils.get_model_and_tokenizer abstracts the model type away; we want the same for saving HF-compatible sliced models. This save abstraction will also avoid the Sliced<Model>ForCausalLM imports here.
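For illustration, a minimal sketch of what such a save abstraction might look like; the helper name save_sliced_model, its signature, and the model_adapter.model attribute are assumptions for this sketch, not code from the PR:

```python
# Hypothetical sketch of a model-agnostic save helper for hf_utils. The
# function name, signature, and model_adapter.model attribute are all
# assumed for illustration; only the idea comes from the review comment.
def save_sliced_model(model_adapter, save_dir: str) -> None:
    """Save a sliced model in HF format without model-specific imports at the call site."""
    model_adapter.model.save_pretrained(save_dir)
```

The call site in experiments/run_slicegpt.py would then stay model-agnostic, mirroring how hf_utils.get_model_and_tokenizer hides the model type on load.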
Since there is a single intermediate size, we can make this explicit by using the type ConstSlicingScheduler in the interface instead of SlicingScheduler. Otherwise, we can add support for all slicing schedulers and keep the base type SlicingScheduler. I don't mind how this is done, either in this PR or in a follow-up PR. We should update the PR title to reflect the decision and the contents of the PR.
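As an illustration of the first option (the class and method below are hypothetical stand-ins, not the actual interface in this PR; only the two scheduler type names come from the review):

```python
# Illustrative typing only: ConstSlicingScheduler and SlicingScheduler are the
# types named in the review; the class and method here are hypothetical.
from typing import Optional


class SlicedModelForCausalLM:
    @classmethod
    def from_pretrained(
        cls, pretrained_path: str, scheduler: Optional["ConstSlicingScheduler"] = None
    ) -> "SlicedModelForCausalLM":
        # Annotating Optional["SlicingScheduler"] instead would commit the
        # interface to supporting every slicing scheduler (the second option).
        ...
```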
Hmm :/ Tests are not working due to import issues.
…icrosoft/TransformerCompression into liana/make_model_HF_compatible
I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.
Any update on when the changes will be merged?
    parallel_blocks=True,
)

sliced_model = SlicedPhiForCausalLM.from_pretrained(
The sliced_model can be of type SlicedPhiForCausalLM or SlicedLlamaForCausalLM at this point. Debugging...
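A quick debugging check for which class was actually returned might look like this (the import paths below are assumptions, not taken from this PR):

```python
# Debugging sketch: confirm which sliced class from_pretrained returned.
# The module paths below are assumed for illustration.
from slicegpt.sliced_phi import SlicedPhiForCausalLM      # assumed path
from slicegpt.sliced_llama import SlicedLlamaForCausalLM  # assumed path

print(type(sliced_model).__name__)  # e.g. "SlicedPhiForCausalLM"
assert isinstance(sliced_model, (SlicedPhiForCausalLM, SlicedLlamaForCausalLM))
```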
model_adapter.use_cache = False

-tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, token=token, local_files_only=local_model)
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, token=token, local_files_only=local_model)
This change breaks loading of local models.
It should probably remain model_path.
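For context, a minimal sketch of the local-loading case (variable names follow the diff above; this describes standard transformers behavior, not code from the PR):

```python
# With local_files_only=True, from_pretrained resolves its first argument
# against the local filesystem/cache only, so a locally saved checkpoint
# needs model_path (a directory on disk), not model_name (a Hub identifier).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    model_path,                    # local checkpoint directory from the run
    use_fast=True,
    local_files_only=local_model,  # True when loading from disk only
)
```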
We will work on merging these changes by the end of this week.
Hi, please let me know when these changes will be merged.
Hi @pashminacameron and @LianaMikael, is there a timeline for merging these changes into main?
This PR adds implementations for sliced Phi and Llama models to make it easy to save and load sliced models. The models can be initialized with a given scheduler (or no scheduler for zero sparsity) and support save_pretrained and from_pretrained methods like standard HF models.
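For illustration, the workflow the description implies might look like the sketch below; the import paths and constructor signature are assumptions, and only the class names and the save_pretrained/from_pretrained support come from the PR:

```python
# Hedged usage sketch: import paths and constructor signature are assumed.
from transformers import LlamaConfig

from slicegpt import ConstSlicingScheduler, SlicedLlamaForCausalLM  # assumed paths

config = LlamaConfig()                  # placeholder config for illustration
scheduler = ConstSlicingScheduler(...)  # hypothetical args elided; None => zero sparsity

model = SlicedLlamaForCausalLM(config, scheduler=scheduler)
model.save_pretrained("sliced-llama")   # standard HF-style save
reloaded = SlicedLlamaForCausalLM.from_pretrained("sliced-llama")
```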