
Conversation

@LianaMikael (Contributor):
This PR adds implementations of sliced Phi and Llama models to make it easy to save and load sliced models.
The models can be initialized with a given slicing scheduler (or no scheduler, for zero sparsity) and support save_pretrained and from_pretrained methods like standard HF models.
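A minimal sketch of the described workflow; the import paths, constructor signature, and scheduler argument below are assumptions inferred from this discussion, not verified against the PR:

```python
# Sketch only: import paths, constructor signatures, and argument values are
# assumptions inferred from the PR description, not a verified API.
from transformers import PhiConfig

from slicegpt.sliced_models import SlicedPhiForCausalLM       # assumed import path
from slicegpt.slicing_scheduler import ConstSlicingScheduler  # assumed import path

config = PhiConfig.from_pretrained("microsoft/phi-2")
scheduler = ConstSlicingScheduler(2304)  # assumed: one target hidden size; pass None for zero sparsity

model = SlicedPhiForCausalLM(config, scheduler)  # assumed constructor
model.save_pretrained("sliced-phi-2")            # HF-style save, as described above
reloaded = SlicedPhiForCausalLM.from_pretrained("sliced-phi-2")  # HF-style load
```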


```python
import torch
import wandb
from transformers.models.llama.modeling_llama import LlamaConfig
```
Contributor (commenting on the imports above):

I think we might be missing one abstraction. We shouldn't need any model-specific imports here. hf_utils.get_model_and_tokenizer abstracts the model type away; we want the same for saving HF-compatible sliced models. This save abstraction will also avoid the Sliced<Model>ForCausalLM imports here.
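One hypothetical shape for that abstraction, mirroring the dispatch idea behind hf_utils.get_model_and_tokenizer (the registry, function name, and import paths below are illustrative, not the repo's actual API):

```python
# Hypothetical sketch: resolve the sliced class by model type so experiment
# scripts never import Sliced<Model>ForCausalLM directly. Names are illustrative.
from slicegpt.sliced_models import SlicedLlamaForCausalLM, SlicedPhiForCausalLM  # assumed paths

_SLICED_MODEL_CLASSES = {
    "llama": SlicedLlamaForCausalLM,
    "phi": SlicedPhiForCausalLM,
}

def get_sliced_model_class(model_type: str):
    """Return the sliced class for a model type, keeping call sites model-agnostic."""
    try:
        return _SLICED_MODEL_CLASSES[model_type]
    except KeyError:
        raise ValueError(f"No sliced implementation for model type: {model_type}")
```

Call sites could then write get_sliced_model_class("phi").from_pretrained(save_dir) without any model-specific import.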

@pashminacameron (Contributor) left a comment:

Since there is a single intermediate size, we can make this explicit by using the type ConstSlicingScheduler in the interface instead of SlicingScheduler. Otherwise, we can add support for all slicing schedulers and keep the base type SlicingScheduler. I don't mind how this is done, in this PR or a follow-up PR, but we should update the PR title to reflect the decision and the contents of the PR.
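For illustration, the two options as hypothetical signatures:

```python
from slicegpt.slicing_scheduler import ConstSlicingScheduler, SlicingScheduler  # assumed path

# Option A (illustrative): make the single-intermediate-size case explicit in the type.
def __init__(self, config, scheduler: ConstSlicingScheduler | None = None): ...

# Option B (illustrative): keep the base type and support all slicing schedulers.
def __init__(self, config, scheduler: SlicingScheduler | None = None): ...
```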

@pashminacameron (Contributor) left a comment:

Hmm :/ Tests are not working due to import issues.

@pashminacameron self-requested a review on April 25, 2024 at 21:57.
@pashminacameron (Contributor):
I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.

@canamika27:

> I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.

Any update on when the changes will be merged?

```python
    parallel_blocks=True,
)

sliced_model = SlicedPhiForCausalLM.from_pretrained(
```
Contributor (commenting on the snippet above):

The sliced_model can be of type SlicedPhiForCausalLM or SlicedLlamaForCausalLM at this point. Debugging...
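A quick check of the kind one might use while debugging this, using the names from the snippet above (illustrative only):

```python
# Illustrative: confirm which sliced class from_pretrained actually returned.
assert isinstance(sliced_model, (SlicedPhiForCausalLM, SlicedLlamaForCausalLM)), type(sliced_model)
print(type(sliced_model).__name__)
```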

```diff
 model_adapter.use_cache = False

-tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, token=token, local_files_only=local_model)
+tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, token=token, local_files_only=local_model)
```
Contributor (commenting on the diff above):

This change breaks loading of local models.

Contributor:

This should probably remain model_path.
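A sketch of the suggested revert; with local_files_only enabled, the identifier must resolve to the on-disk checkpoint, which the local model_path does and a hub-style model_name may not:

```python
# Suggested revert (sketch): keep the local checkpoint path so loading
# local models with local_files_only continues to work.
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True, token=token, local_files_only=local_model)
```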

@LianaMikael (Contributor, Author):

> > I would like to hold off merging this into main for a bit. I will work on this more (after the smaller fixes) and we can re-review.
>
> Any update on when the changes will be merged?

We will work on merging these changes by the end of this week.

@madhusrivatsav:

Hi,

Please let me know when these changes will be merged. Has anyone verified that the models are HF-compatible now?

@NamburiSrinath:

Hi @pashminacameron and @LianaMikael,

Is there a timeline for merging these changes into main?
