Fix audio decoding: preserve long dtype for token indices #19

Merged
lmangani merged 2 commits into main from copilot/fix-decoding-audio-failure
Jan 30, 2026
Copilot AI commented Jan 30, 2026

Audio decoding failed with "tensors used as indices must be long, int, byte or bool tensors" because the frames tensor was converted to float dtype before detokenization.

Changes

  • Removed dtype conversion in codec detokenization: Frames contain token IDs used as indices for embedding lookups and must remain as torch.long
  • Preserved device placement: Frames are still moved to the codec device, just without dtype conversion
# Before (incorrect - converts indices to float)
frames_for_codec = frames.to(device=pipeline.codec_device, dtype=codec_dtype)

# After (correct - preserves long dtype for indexing)
frames_for_codec = frames.to(device=pipeline.codec_device)

The codec_dtype parameter is still used for loading the codec model itself, but not applied to the token indices passed to detokenize().
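
As a minimal sketch of why the dtype matters, the snippet below indexes a small random tensor with integer and with float token IDs. The `codebook` tensor, the helper `move_frames_to_codec`, and the `"cpu"` device are hypothetical stand-ins for illustration only; they are not the project's actual codec or `detokenize()` implementation.

```python
import torch


def move_frames_to_codec(frames: torch.Tensor, device: str) -> torch.Tensor:
    """Hypothetical helper: move token-ID frames to a device, leaving dtype untouched."""
    return frames.to(device=device)


# Hypothetical codebook: 16 entries, each a 4-dimensional embedding.
codebook = torch.randn(16, 4)

# Token IDs as a generation step might produce them (integer indices).
frames = torch.tensor([[1, 5, 3], [0, 2, 7]], dtype=torch.long)

# Indexing with long indices works: one embedding row per token ID.
moved = move_frames_to_codec(frames, "cpu")
embedded = codebook[moved]

# Casting the same indices to float makes the lookup fail.
try:
    codebook[frames.to(dtype=torch.float32)]
    float_index_failed = False
except (IndexError, RuntimeError, TypeError):
    float_index_failed = True
```

Calling `.to(device=...)` without a `dtype` argument only changes where the tensor lives, which is why the fix keeps device placement while dropping the cast.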

Original prompt

This section details the original issue you should resolve

<issue_title>Decoding Audio Failure</issue_title>
<issue_description>After PR #14, generation succeeds at a reasonable speed, but once it completes, it fails with:

Deconding Audio:
Generation Failed: invalid type: 'torch.mps.FloatTensor'

Let's fix this last bug to make sure our generations are correctly decoded to audio.</issue_description>

<agent_instructions>Issue reopened with error:
Generation Failed: tensors used as indices must be long, int, byte or bool tensors</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@lmangani This change was ineffective and caused a new issue:

Generation Failed: tensors used as indices must be long, int, byte or bool tensors</comment_new>


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

…dexing

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix decoding audio failure in generation process" to "Fix audio decoding: preserve long dtype for token indices" Jan 30, 2026
Copilot AI requested a review from lmangani January 30, 2026 09:16
lmangani marked this pull request as ready for review January 30, 2026 09:34
lmangani merged commit dbb643a into main Jan 30, 2026
Development

Successfully merging this pull request may close these issues.

Decoding Audio Failure

2 participants