
fix to jamba config, asserting attention and expert offset #33316

Merged. 10 commits merged into huggingface:main on Sep 17, 2024.

Conversation

@ErezSC42 (Contributor) commented on Sep 5, 2024:

What does this PR do?

Ensures instantiated models have a valid configuration by validating the attention layer offset and expert layer offset values against their respective periods.
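For example (a sketch, not part of the diff; it assumes transformers' JambaConfig field names attn_layer_period and attn_layer_offset), an inconsistent offset should now be rejected at construction time:

```python
from transformers import JambaConfig

# An attention offset that is >= the attention period can never correspond to a real layer,
# so the config should reject it up front instead of producing a silently broken model.
config = JambaConfig(attn_layer_period=8, attn_layer_offset=10)
# Expected after this PR: ValueError stating that the attention layer offset
# must be smaller than the attention layer period.
```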

@vasqu (Contributor) commented on Sep 5, 2024:

I think it would be more appropriate to move this to a separate (private) function and, instead of an assertion error, raise a ValueError.

@vasqu (Contributor) commented on Sep 5, 2024:

Ah, and maybe add a simple test to check that the expected errors are raised 👀
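A minimal sketch of such a test, assuming unittest and the attn/expert period/offset field names of JambaConfig; the test actually added in this PR may look different:

```python
import unittest

from transformers import JambaConfig


class JambaConfigOffsetTest(unittest.TestCase):
    def test_invalid_attn_offset_raises(self):
        # An attention offset equal to or larger than the attention period should be rejected.
        with self.assertRaises(ValueError):
            JambaConfig(attn_layer_period=8, attn_layer_offset=8)

    def test_invalid_expert_offset_raises(self):
        # An expert offset equal to or larger than the expert period should be rejected.
        with self.assertRaises(ValueError):
            JambaConfig(expert_layer_period=2, expert_layer_offset=5)


if __name__ == "__main__":
    unittest.main()
```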

Comment on lines 26 to 28
def _check_supported_offset(t_: str, period: int, offset: int):
    if offset >= period:
        raise ValueError(f"{t_} layer offset ({offset}) must be smaller than {t_} layer period ({period})")
Review comment from a Contributor:

Just two small nits:

  • Could you change t_? It's not really descriptive.
  • Maybe move it under JambaConfig.

@vasqu (Contributor) commented on Sep 10, 2024:

Jamba, cc @ArthurZucker

@ErezSC42 (Contributor, Author) commented:
I changed t_ to be more descriptive, but since the _check_supported_offset function is called inside JambaConfig's __init__, it cannot be a method defined inside the class.

@vasqu (Contributor) commented on Sep 10, 2024:

Thank you!

For _check_supported_offset it should be possible; see this snippet for reference:

    self._rope_scaling_validation()
    self.attention_bias = attention_bias
    self.attention_dropout = attention_dropout
    self.clip_qkv = clip_qkv

    super().__init__(
        pad_token_id=pad_token_id,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        tie_word_embeddings=tie_word_embeddings,
        **kwargs,
    )

def _rope_scaling_validation(self):
    """
    Validate the `rope_scaling` configuration.
    """
    if self.rope_scaling is None:
        return

    if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
        raise ValueError(
            "`rope_scaling` must be a dictionary with two fields, `type` and `factor`, " f"got {self.rope_scaling}"
        )
    rope_scaling_type = self.rope_scaling.get("type", None)
    rope_scaling_factor = self.rope_scaling.get("factor", None)
    if rope_scaling_type is None or rope_scaling_type not in ["linear", "dynamic"]:
        raise ValueError(
            f"`rope_scaling`'s type field must be one of ['linear', 'dynamic'], got {rope_scaling_type}"
        )
    if rope_scaling_factor is None or not isinstance(rope_scaling_factor, float) or rope_scaling_factor <= 1.0:
        raise ValueError(f"`rope_scaling`'s factor field must be a float > 1, got {rope_scaling_factor}")

At least that's what I had in mind.
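For concreteness, here is a rough sketch of that pattern applied to JambaConfig. It is only an illustration: the field names (attn_layer_period, attn_layer_offset, expert_layer_period, expert_layer_offset) and the property_ parameter name follow this thread and the commit messages below, and the exact merged code may differ.

```python
from transformers.configuration_utils import PretrainedConfig


class JambaConfig(PretrainedConfig):
    model_type = "jamba"

    def __init__(
        self,
        attn_layer_period: int = 8,
        attn_layer_offset: int = 4,
        expert_layer_period: int = 2,
        expert_layer_offset: int = 1,
        **kwargs,
    ):
        # Validate before storing, so an inconsistent config fails fast at instantiation.
        self._check_supported_offset("attention", attn_layer_period, attn_layer_offset)
        self._check_supported_offset("expert", expert_layer_period, expert_layer_offset)

        self.attn_layer_period = attn_layer_period
        self.attn_layer_offset = attn_layer_offset
        self.expert_layer_period = expert_layer_period
        self.expert_layer_offset = expert_layer_offset
        super().__init__(**kwargs)

    def _check_supported_offset(self, property_: str, period: int, offset: int):
        # Raise a ValueError instead of using assert so the check also runs with `python -O`.
        if offset >= period:
            raise ValueError(
                f"{property_} layer offset ({offset}) must be smaller than {property_} layer period ({period})"
            )
```

Defining the check as a private method called from __init__ keeps the validation next to the fields it guards and avoids a module-level helper.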

@ErezSC42 (Contributor, Author) commented:
My bad, it should be working now.

@vasqu (Contributor) commented on Sep 10, 2024:

You can run make style, which should fix the remaining issues. Thanks for bearing with me :)

@ErezSC42 (Contributor, Author) commented:
Hey, all the tests passed. Is this PR good for merging?

@vasqu (Contributor) commented on Sep 17, 2024:

It needs a review from a core maintainer, so we can just wait.

@vasqu (Contributor) commented on Sep 17, 2024:

Maybe cc @amyeroberts

@amyeroberts (Collaborator) left a review comment:

LGTM - thanks for adding!

@amyeroberts merged commit 46c2757 into huggingface:main on Sep 17, 2024. 13 checks passed.
itazap pushed a commit to NielsRogge/transformers that referenced this pull request on Sep 20, 2024:

fix to jamba config, asserting attention and expert offset (huggingface#33316)

* fix to jamba config, asserting attention and expert offset
* fix formatting
* fix formatting
* fix formatting
* changed to error raise instead of assertion, added unittests
* fix
* changed t_ to property_
* changed t_ to property_
* quickfix
* ran code styler
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request on Oct 2, 2024, with the same commit message.