fix to jamba config, asserting attention and expert offset #33316
Conversation
I think it would be more appropriate to move this to a separate (private) function and, instead of an assertion error, raise a `ValueError`.
Ah, and maybe add a simple test to check that the expected errors are raised 👀
def _check_supported_offset(t_: str, period: int, offset: int):
    if offset >= period:
        raise ValueError(f"{t_} layer offset ({offset}) must be smaller than {t_} layer period ({period})")
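A minimal version of the test suggested above could look like this (a sketch: the keyword names `attn_layer_period`, `attn_layer_offset`, `expert_layer_period`, and `expert_layer_offset` come from the Jamba config, and the failing values are illustrative):

import unittest

from transformers import JambaConfig


class JambaConfigOffsetTest(unittest.TestCase):
    def test_attn_offset_must_be_smaller_than_period(self):
        # an attention offset equal to (or larger than) the period should be rejected
        with self.assertRaises(ValueError):
            JambaConfig(attn_layer_period=8, attn_layer_offset=8)

    def test_expert_offset_must_be_smaller_than_period(self):
        # same rule for the expert layer placement
        with self.assertRaises(ValueError):
            JambaConfig(expert_layer_period=2, expert_layer_offset=2)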
Just two small nits:
- Could you change `t_`? It's not really descriptive.
- Maybe move it under `JambaConfig`.
Jamba, cc @ArthurZucker
I changed `t_` to be more descriptive, but since the `_check_supported_offset` function is called inside `JambaConfig`'s `__init__`, it cannot be a method defined inside the class.
Thank you! For reference, see transformers/src/transformers/models/olmo/configuration_olmo.py, lines 150 to 181 at f38590d.
At least that's what I had in mind.
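For context, the referenced OLMo lines validate values from a method defined on the config class itself, so the helper can live under the class and still be called from `__init__`. A rough sketch of that pattern (illustrative names, not the actual OLMo or Jamba code):

from transformers import PretrainedConfig


class ExampleConfig(PretrainedConfig):
    def __init__(self, attn_layer_period=8, attn_layer_offset=4, **kwargs):
        self.attn_layer_period = attn_layer_period
        self.attn_layer_offset = attn_layer_offset
        # the check runs during __init__, but the helper still lives on the class
        self._check_supported_offset("attention", self.attn_layer_period, self.attn_layer_offset)
        super().__init__(**kwargs)

    @staticmethod
    def _check_supported_offset(property_: str, period: int, offset: int):
        if offset >= period:
            raise ValueError(f"{property_} layer offset ({offset}) must be smaller than {property_} layer period ({period})")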
My bad, it should be working now.
You can run the code styler to fix the formatting.
Hey, all the tests passed. Is this PR good for merging?
It needs a review from a core maintainer, so we can just wait.
Maybe cc @amyeroberts
LGTM - thanks for adding!
…ce#33316)
* fix to jamba config, asserting attention and expert offset
* fix foramtting
* fix foramtting
* fix foramtting
* changed to error raise instead of assertion, added unittests
* fix
* changed t_ to property_
* changed t_ to property_
* quickfix
* ran code styler
What does this PR do?
Ensures instantiated models are supported by adding checks on the attention offset and expert offset values in the config.
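Concretely, an unsupported offset now fails fast when the config is built rather than slipping through silently. A small illustration (argument names from the Jamba config; values chosen to trigger the check):

from transformers import JambaConfig

# valid: the offset is smaller than the period
config = JambaConfig(attn_layer_period=8, attn_layer_offset=4)

# invalid: offset >= period is rejected at construction time
try:
    JambaConfig(attn_layer_period=8, attn_layer_offset=8)
except ValueError as err:
    print(err)  # e.g. "attention layer offset (8) must be smaller than attention layer period (8)"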