Granite speech - minor fixes to support training with the HF trainer #38833

avihu111 · 2025-06-15T14:00:37Z

What does this PR do?

Minor updates to granite_speech to enable finetuning it with HF trainers.

- Avoid a crash when trainers pass padding=True to the processor.
- Ensure all trainable parameters receive gradients (bugfix): remove .data from a forward call.
- Rename melspec to mel_filters, a name that the serialization code in feature_extraction_utils.py handles specially, which avoids a crash on save_pretrained.

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

CC: @ArthurZucker @eustlb @alex-jw-brooks @avishaiElmakies @gsaon

alex-jw-brooks · 2025-06-16T14:12:31Z

src/transformers/models/granite_speech/feature_extraction_granite_speech.py

@@ -50,15 +50,16 @@ def __init__(
        **kwargs,
    ):
        super().__init__(**kwargs)
+        self.sampling_rate = sampling_rate


I think this isn't used currently

Right, it's not used. I added it to stay consistent with other audio feature extractors that have this property.

avihu111

Added some comments on each change, giving relevant context

avihu111 · 2025-06-16T14:23:42Z

src/transformers/models/granite_speech/processing_granite_speech.py

@@ -88,7 +88,9 @@ def __call__(
        else:
            audio_inputs = {}

-        text_inputs = self.tokenizer(prompt_strings, padding=True, **kwargs)
+        if "padding" not in kwargs:


avoids a crash when trainers pass padding=True to the processor
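To make the failure mode concrete, here is a minimal sketch of why hardcoding `padding=True` while also forwarding `**kwargs` crashes when a caller supplies `padding` itself. `fake_tokenizer`, `call_before`, and `call_after` are hypothetical stand-ins, not the real processor or tokenizer. The diff above shows only the `if "padding" not in kwargs:` line, so the body of that branch is an assumption here.

```python
def fake_tokenizer(texts, padding=False, **kwargs):
    """Hypothetical stand-in for a tokenizer call; it only echoes its arguments."""
    return {"texts": list(texts), "padding": padding}


def call_before(prompt_strings, **kwargs):
    # Old pattern: padding is hardcoded and may also arrive through kwargs,
    # so Python sees the same keyword argument twice.
    return fake_tokenizer(prompt_strings, padding=True, **kwargs)


def call_after(prompt_strings, **kwargs):
    # New pattern (branch body assumed): only set a default when the caller
    # did not already choose a padding strategy.
    if "padding" not in kwargs:
        kwargs["padding"] = True
    return fake_tokenizer(prompt_strings, **kwargs)


def crashes(fn, *args, **kwargs):
    """Return True if calling fn raises TypeError (duplicate keyword argument)."""
    try:
        fn(*args, **kwargs)
    except TypeError:
        return True
    return False
```

With this sketch, `call_before(["hi"], padding=True)` raises a `TypeError` about multiple values for `padding`, while `call_after` accepts both an explicit `padding` and no `padding` at all.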

avihu111 · 2025-06-16T14:25:35Z

src/transformers/models/granite_speech/modeling_granite_speech.py

@@ -92,7 +92,7 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        hidden_states = hidden_states.view(batch_size * nblocks, self.window_size, dim)

        query_output = self.qformer(
-            query_embeds=self.query.data,
+            query_embeds=self.query,


Bugfix. When using .data this trainable parameter did not receive gradients.
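The effect of `.data` can be reproduced in isolation. The sketch below assumes PyTorch is installed and uses a hypothetical `TinyProjector` module, not the actual Granite Speech projector. It shows that reading a parameter through `.data` detaches it from the autograd graph, so its `.grad` stays `None` after `backward()`.

```python
import torch
from torch import nn


class TinyProjector(nn.Module):
    """Hypothetical toy module with a learnable query, for illustration only."""

    def __init__(self, use_data: bool):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 3, 4))
        self.proj = nn.Linear(4, 4)
        self.use_data = use_data

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # .data returns a tensor that shares storage but is cut off from autograd.
        query = self.query.data if self.use_data else self.query
        return self.proj(query + hidden_states)


def query_grad_after_backward(use_data: bool):
    """Run one forward/backward pass and return the gradient stored on the query."""
    torch.manual_seed(0)
    model = TinyProjector(use_data)
    model(torch.randn(2, 3, 4)).sum().backward()
    return model.query.grad
```

In this sketch `query_grad_after_backward(True)` returns `None`, while `query_grad_after_backward(False)` returns a gradient tensor with the same shape as the query. A parameter that never receives a gradient is also the kind of unused parameter that DDP complains about by default.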

avihu111 · 2025-06-16T14:26:31Z

src/transformers/models/granite_speech/feature_extraction_granite_speech.py

-        # Currently lazily initialized
-        self.melspec = None
+        requires_backends(self, ["torchaudio"])
+        self.mel_filters = torchaudio.transforms.MelSpectrogram(**self.melspec_kwargs)


removed the lazy init, and renamed it to mel_filters. This specific name avoids a crash when serializing the processor.
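A rough sketch of why the attribute name matters for serialization follows. It assumes, based on the line linked from this PR, that the feature extractor's dictionary export drops an attribute named `mel_filters` before the result is written as JSON. `to_dict_sketch`, `NotJsonSerializable`, and both extractor classes are hypothetical simplifications, not the library's code, and the `sampling_rate` value is only illustrative.

```python
import copy
import json


class NotJsonSerializable:
    """Hypothetical stand-in for a transform object such as a mel spectrogram module."""


def to_dict_sketch(obj):
    # Simplified model of the export step: copy the attributes and drop the
    # one name that is assumed to be special-cased.
    output = copy.deepcopy(obj.__dict__)
    output.pop("mel_filters", None)
    return output


class ExtractorOldName:
    def __init__(self):
        self.sampling_rate = 16000
        self.melspec = NotJsonSerializable()  # old attribute name, not dropped


class ExtractorNewName:
    def __init__(self):
        self.sampling_rate = 16000
        self.mel_filters = NotJsonSerializable()  # new attribute name, dropped on export


def can_serialize(extractor) -> bool:
    """Return True if the exported dict can be written as JSON."""
    try:
        json.dumps(to_dict_sketch(extractor))
        return True
    except TypeError:
        return False
```

Under these assumptions `can_serialize(ExtractorNewName())` is `True` and `can_serialize(ExtractorOldName())` is `False`, which mirrors the crash on save that the rename is meant to avoid.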

avihu111 added 2 commits June 15, 2025 16:50

ensure the query is updated during training

db4a4af

avoid unused parameters that DDP does not like

avoid a crash when kwargs contain padding=True

8ee3429

trainers often pass this argument automatically

avihu111 marked this pull request as draft June 15, 2025 14:19

avihu111 added 3 commits June 15, 2025 14:21

minor

8dec2ba

Remove mel_spec lazy init, and rename to mel_filters.

4db4c99

this ensures save_pretrained will not crash when saving the processor during training https://github.com/huggingface/transformers/blob/d5d007a1a0f0c11a726a54c8f00bd71825f84d02/src/transformers/feature_extraction_utils.py#L595

minor - most feature extractors has a sampling_rate property

98844ec

alex-jw-brooks reviewed Jun 16, 2025

Merge branch 'main' into granite_speech_updates

6e68d8c

avihu111 marked this pull request as ready for review June 16, 2025 14:22

avihu111 commented Jun 16, 2025
