Gemma3 implementation needs to be updated.

### 🚀 The feature, motivation and pitch

The Transformers implementation of Gemma3 has changed substantially, but the implementation in this repository has not kept up with those changes and does not load Gemma3 properly.  
For example, Transformers no longer uses `_update_causal_mask`, while this repository still calls it.

https://github.com/huggingface/transformers/blob/e6a8063ef1af16df964b644b07e1d17e96555d23/src/transformers/models/gemma3/modular_gemma3.py#L748-L749
```
def _update_causal_mask(self, **super_kwargs):
    raise AttributeError("We don't want to inherit it")
```
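
To make the consequence concrete, here is a minimal sketch using stand-in classes. It is not the real Transformers or Liger-Kernel code: `LegacyModel`, `UpdatedModel`, `patched_forward`, and `call_is_supported` are hypothetical names chosen for illustration. It assumes a caller written against the old interface and shows that the call fails once the method is stubbed out as in the snippet above.

```python
class LegacyModel:
    """Stand-in for a model that still provides _update_causal_mask."""

    def _update_causal_mask(self, attention_mask):
        # Placeholder: real code would build a causal mask here.
        return attention_mask


class UpdatedModel:
    """Stand-in that mirrors the stub quoted above."""

    def _update_causal_mask(self, **super_kwargs):
        raise AttributeError("We don't want to inherit it")


def patched_forward(model, attention_mask):
    """Hypothetical caller that still relies on the old method."""
    return model._update_causal_mask(attention_mask=attention_mask)


def call_is_supported(model, attention_mask):
    """Return True if the old-style call succeeds, False if it raises."""
    try:
        patched_forward(model, attention_mask)
        return True
    except AttributeError:
        return False


print(call_is_supported(LegacyModel(), [1, 1, 0]))   # old interface works
print(call_is_supported(UpdatedModel(), [1, 1, 0]))  # stubbed method raises
```

Under these assumptions, any forward pass that still calls `_update_causal_mask` would need to be rewritten against whatever mask construction the current Transformers code uses, rather than guarded with a fallback.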

https://github.com/linkedin/Liger-Kernel/blob/ecdf6defae16cae3a3615bc29377c25d9f1d3dc2/src/liger_kernel/transformers/model/gemma3.py#L260-L262
```
causal_mask = self._update_causal_mask(
    attention_mask, token_type_ids, past_key_values, cache_position, inputs_embeds, is_training
)
```

### Alternatives

_No response_

### Additional context

_No response_
