-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Phi #132
Conversation
@@ -325,6 +328,19 @@ def get_model( | |||
trust_remote_code=trust_remote_code, | |||
) | |||
raise NotImplementedError("Qwen model requires flash attention v2") | |||
|
|||
if model_type == "phi-msft" or model_type == "phi": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about using model_type in [] instead ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
super().__init__() | ||
self.num_heads = config.n_head | ||
self.hidden_size = config.n_embd | ||
self.head_size = self.hidden_size // self.num_heads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it intentional to use self.num_heads
like this here but then update the value of self.num_heads
by the number of process groups in line 110?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when creating the modules we need the num_heads
value pre-splitting. Then later on we use num_heads
post-split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment.
revision=revision, | ||
padding_side="left", | ||
truncation_side="left", | ||
trust_remote_code=trust_remote_code, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're adding trust_remote_code
, do we already install the einops
library to convert weights when using microsoft/phi-1.5
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trust_remote_code
is False
by default. einops
is installed already (for Falcon). This current implementation works with the microsoft version of the weights, rather than the changes HF made.
Example:
Prompt:
Response: