Fix MPS tensor dtype in audio generation pipeline #21

Merged: lmangani merged 2 commits into main from copilot/fix-decoding-audio-error-again on Jan 30, 2026

Copilot AI commented Jan 30, 2026

HeartMuLa's generate_frame() returns float16 tensors on MPS devices, but codec.detokenize() requires long (int64) tensors for token indexing. A previous fix attempted the conversion at detokenization (line 850), but by then the float dtype was already baked into the stacked frames.
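To make the failure mode concrete, here is a minimal sketch that runs on CPU and does not use HeartMuLa itself. The `codebook` tensor and `lookup` helper are hypothetical stand-ins for whatever indexing codec.detokenize() performs internally, which is an assumption; the sketch only shows that PyTorch rejects float tensors as indices and accepts them once converted with `.long()`.

```python
import torch

# Hypothetical stand-in for a codec lookup table: 8 codes, 4 values each.
codebook = torch.arange(32, dtype=torch.float32).reshape(8, 4)

# Token IDs that arrive as float16, as described for MPS above.
float_tokens = torch.tensor([1.0, 5.0, 2.0], dtype=torch.float16)


def lookup(table: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Index rows of `table` by token ID (illustrative helper, not the real codec)."""
    return table[tokens]


# Indexing with a float tensor is rejected by PyTorch.
try:
    lookup(codebook, float_tokens)
    float_index_error = None
except (IndexError, TypeError, RuntimeError) as exc:
    float_index_error = str(exc)

# Converting to long (int64) makes the same lookup valid.
long_tokens = float_tokens.long()
embeddings = lookup(codebook, long_tokens)
```

The exact error text can differ between devices and PyTorch versions, which may be why the issue reports two different messages for what appears to be the same underlying dtype problem.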

Changes

Convert tokens to long immediately after generation:

  • Line 761: After initial token generation
  • Line 797: After each generation loop iteration

with get_autocast_context(pipeline.mula_device.type, pipeline.mula_dtype):
    curr_token = pipeline.mula.generate_frame(...)
# Convert to long immediately after generation (tokens must be integers)
curr_token = curr_token.long()
frames.append(curr_token[0:1,])

This ensures torch.stack(frames) operates on long tensors from the start, preserving dtype through subsequent operations (permute, squeeze, cpu, device transfer).
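The dtype-preservation claim can be checked with a small CPU-only sketch. `fake_generate_frame` is a hypothetical stub standing in for pipeline.mula.generate_frame(), and the tensor shapes and permute order are assumptions chosen for illustration rather than the pipeline's real layout.

```python
import torch


def fake_generate_frame(step: int) -> torch.Tensor:
    """Hypothetical stub: returns float16 token IDs with shape (2, 8)."""
    return torch.full((2, 8), float(step), dtype=torch.float16)


frames = []
for step in range(4):
    curr_token = fake_generate_frame(step)
    # Convert to long immediately after generation (tokens must be integers).
    curr_token = curr_token.long()
    frames.append(curr_token[0:1,])  # each frame has shape (1, 8)

# Stacking long tensors yields a long tensor of shape (4, 1, 8).
stacked = torch.stack(frames)

# Shape-only operations and device transfers keep the dtype unchanged.
codes = stacked.permute(1, 2, 0).squeeze(0).cpu()  # shape (8, 4)
```

Because permute, squeeze, and cpu only rearrange or move data, `codes` stays int64 without any further casts downstream.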

Original prompt

This section details the original issue you should resolve

<issue_title>Decoding Audio Failure</issue_title>
<issue_description>After PR #14, generation succeeds at a reasonable speed, but once it completes it fails with:

Deconding Audio:
Generation Failed: invalid type: 'torch.mps.FloatTensor'

Let's fix this last bug to make sure our generations are correctly decoded to audio.</issue_description>

<agent_instructions>The fix was not effective. At the end of the generation the app still fails with:

invalid type: 'torch.mps.FloatTensor'
</agent_instructions>

Comments on the Issue (you are @copilot in this section)

<comment_new>@lmangani
This change was ineffective and caused a new issue:

Generation Failed: tensors used as indices must be long, int, byte or bool tensors</comment_new>
<comment_new>@lmangani
Generation still fails in the last conversion phase:

Generation Failed: invalid type: 'torch.mps.FloatTensor'</comment_new>
<comment_new>@lmangani
The fix was not effective. At the end of the generation the app still fails with:

invalid type: 'torch.mps.FloatTensor'</comment_new>

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix decoding audio failure in generation process" to "Fix MPS tensor dtype in audio generation pipeline" on Jan 30, 2026
Copilot AI requested a review from lmangani on January 30, 2026 13:09
lmangani marked this pull request as ready for review on January 30, 2026 13:09
lmangani merged commit c176603 into main on Jan 30, 2026