
fix to jamba config, asserting attention and expert offset #33316

Merged. 10 commits merged into huggingface:main on Sep 17, 2024.

Conversation

@ErezSC42 (Contributor) commented on Sep 5, 2024:

What does this PR do?

Ensures instantiated models have a valid configuration by validating the attention layer offset and expert layer offset values against their respective periods.
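For example (a sketch, not part of the diff; it assumes transformers' JambaConfig field names attn_layer_period and attn_layer_offset), an inconsistent offset should now be rejected at construction time:

```python
from transformers import JambaConfig

# An attention offset that is >= the attention period can never correspond to a real layer,
# so the config should reject it up front instead of producing a silently broken model.
config = JambaConfig(attn_layer_period=8, attn_layer_offset=10)
# Expected after this PR: ValueError stating that the attention layer offset
# must be smaller than the attention layer period.
```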

@vasqu (Contributor) commented on Sep 5, 2024:

I think it would be more appropriate to move this to a separate (private) function and, instead of an assertion error, raise a ValueError.

@vasqu (Contributor) commented on Sep 5, 2024:

Ah, and maybe add a simple test to check that the expected errors are raised 👀
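A minimal sketch of such a test, assuming unittest and the attn/expert period/offset field names of JambaConfig; the test actually added in this PR may look different:

```python
import unittest

from transformers import JambaConfig


class JambaConfigOffsetTest(unittest.TestCase):
    def test_invalid_attn_offset_raises(self):
        # An attention offset equal to or larger than the attention period should be rejected.
        with self.assertRaises(ValueError):
            JambaConfig(attn_layer_period=8, attn_layer_offset=8)

    def test_invalid_expert_offset_raises(self):
        # An expert offset equal to or larger than the expert period should be rejected.
        with self.assertRaises(ValueError):
            JambaConfig(expert_layer_period=2, expert_layer_offset=5)


if __name__ == "__main__":
    unittest.main()
```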

Comment on lines 26 to 28
def _check_supported_offset(t_: str, period: int, offset: int):
    if offset >= period:
        raise ValueError(f"{t_} layer offset ({offset}) must be smaller than {t_} layer period ({period})")
Review comment from a Contributor:

Just two small nits:

  • Could you change t_? It's not really descriptive.
  • Maybe move it under JambaConfig.

@vasqu (Contributor) commented on Sep 10, 2024:

Jamba, cc @ArthurZucker

@ErezSC42 (Contributor, Author) commented:
I changed t_ to be more descriptive, but since the _check_supported_offset function is called inside JambaConfig's __init__, it cannot be a method defined inside the class.

@vasqu (Contributor) commented on Sep 10, 2024:

Thank you!

For _check_supported_offset it should be possible; see this snippet for reference:

    self._rope_scaling_validation()
    self.attention_bias = attention_bias
    self.attention_dropout = attention_dropout
    self.clip_qkv = clip_qkv

    super().__init__(
        pad_token_id=pad_token_id,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        tie_word_embeddings=tie_word_embeddings,
        **kwargs,
    )

def _rope_scaling_validation(self):
    """
    Validate the `rope_scaling` configuration.
    """
    if self.rope_scaling is None:
        return

    if not isinstance(self.rope_scaling, dict) or len(self.rope_scaling) != 2:
        raise ValueError(
            "`rope_scaling` must be a dictionary with two fields, `type` and `factor`, " f"got {self.rope_scaling}"
        )
    rope_scaling_type = self.rope_scaling.get("type", None)
    rope_scaling_factor = self.rope_scaling.get("factor", None)
    if rope_scaling_type is None or rope_scaling_type not in ["linear", "dynamic"]:
        raise ValueError(
            f"`rope_scaling`'s type field must be one of ['linear', 'dynamic'], got {rope_scaling_type}"
        )
    if rope_scaling_factor is None or not isinstance(rope_scaling_factor, float) or rope_scaling_factor <= 1.0:
        raise ValueError(f"`rope_scaling`'s factor field must be a float > 1, got {rope_scaling_factor}")

At least that's what I had in mind.
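For concreteness, here is a rough sketch of that pattern applied to JambaConfig. It is only an illustration: the field names (attn_layer_period, attn_layer_offset, expert_layer_period, expert_layer_offset) and the property_ parameter name follow this thread and the commit messages below, and the exact merged code may differ.

```python
from transformers.configuration_utils import PretrainedConfig


class JambaConfig(PretrainedConfig):
    model_type = "jamba"

    def __init__(
        self,
        attn_layer_period: int = 8,
        attn_layer_offset: int = 4,
        expert_layer_period: int = 2,
        expert_layer_offset: int = 1,
        **kwargs,
    ):
        # Validate before storing, so an inconsistent config fails fast at instantiation.
        self._check_supported_offset("attention", attn_layer_period, attn_layer_offset)
        self._check_supported_offset("expert", expert_layer_period, expert_layer_offset)

        self.attn_layer_period = attn_layer_period
        self.attn_layer_offset = attn_layer_offset
        self.expert_layer_period = expert_layer_period
        self.expert_layer_offset = expert_layer_offset
        super().__init__(**kwargs)

    def _check_supported_offset(self, property_: str, period: int, offset: int):
        # Raise a ValueError instead of using assert so the check also runs with `python -O`.
        if offset >= period:
            raise ValueError(
                f"{property_} layer offset ({offset}) must be smaller than {property_} layer period ({period})"
            )
```

Defining the check as a private method called from __init__ keeps the validation next to the fields it guards and avoids a module-level helper.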

@ErezSC42 (Contributor, Author) commented:
My bad, it should be working now.

@vasqu (Contributor) commented on Sep 10, 2024:

You can run make style, which should fix the remaining issues. Thanks for bearing with me :)

@ErezSC42 (Contributor, Author) commented:
Hey, all the tests passed. Is this PR good for merging?

@vasqu (Contributor) commented on Sep 17, 2024:

It needs a review from a core maintainer, so we can just wait.

@vasqu (Contributor) commented on Sep 17, 2024:

Maybe cc @amyeroberts

@amyeroberts (Collaborator) left a review comment:

LGTM - thanks for adding!

@amyeroberts merged commit 46c2757 into huggingface:main on Sep 17, 2024. 13 checks passed.
itazap pushed a commit to NielsRogge/transformers that referenced this pull request on Sep 20, 2024:

fix to jamba config, asserting attention and expert offset (huggingface#33316)

* fix to jamba config, asserting attention and expert offset
* fix formatting
* fix formatting
* fix formatting
* changed to error raise instead of assertion, added unittests
* fix
* changed t_ to property_
* changed t_ to property_
* quickfix
* ran code styler
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request on Oct 2, 2024, with the same commit message.