Fix audio decoding: preserve long dtype for token indices by Copilot · Pull Request #19 · audiohacking/CTFN-Studio

Copilot · 2026-01-30T09:11:51Z

Audio decoding failed with "tensors used as indices must be long, int, byte or bool tensors" because the frames tensor was converted to float dtype before detokenization.

Changes

Removed dtype conversion in codec detokenization: frames contain token IDs that are used as indices for embedding lookups, so they must remain torch.long.
Preserved device placement: frames are still moved to the codec device, just without a dtype conversion.

# Before (incorrect - converts indices to float)
frames_for_codec = frames.to(device=pipeline.codec_device, dtype=codec_dtype)

# After (correct - preserves long dtype for indexing)
frames_for_codec = frames.to(device=pipeline.codec_device)

The codec_dtype parameter is still used for loading the codec model itself, but it is no longer applied to the token indices passed to detokenize().
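The failure mode can be reproduced outside the project with a few lines of PyTorch. The sketch below is illustrative only: `codebook` is a hypothetical stand-in for the codec's embedding table and the shapes are made up, so it does not reflect the actual codec used by the pipeline. It shows that `.to(device=...)` alone keeps the integer dtype, while adding a float `dtype` makes index-based lookup fail.

```python
import torch

# Hypothetical stand-in for a codec codebook: 8 tokens, 4-dim embeddings.
codebook = torch.randn(8, 4)

# Token IDs; integer tensors built from Python ints default to torch.long.
frames = torch.tensor([[1, 5, 2], [7, 0, 3]])

# Moving to a device without a dtype argument preserves torch.long.
frames_ok = frames.to(device="cpu")
embedded = codebook[frames_ok]  # one embedding row per token ID

# Casting the indices to float breaks the lookup.
frames_bad = frames.to(device="cpu", dtype=torch.float32)
try:
    codebook[frames_bad]
    lookup_failed = False
except (IndexError, RuntimeError) as err:
    # PyTorch rejects floating-point tensors used as indices.
    lookup_failed = True
    print(f"lookup failed: {err}")
```

If a float dtype is needed elsewhere (for example, for the model weights), it can be applied to the model or to the decoded audio rather than to the token IDs.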

Original prompt

This section details the original issue you should resolve

<issue_title>Decoding Audio Failure</issue_title>
<issue_description>After PR #14 generation succeeds at a reasonable speed, but once completed it fails with:
Deconding Audio:
Generation Failed: invalid type: 'torch.mps.FloatTensor'
Let's fix this last bug to make sure our generations are correctly decoded to audio.</issue_description>

<agent_instructions>Issue reopened with error:
Generation Failed: tensors used as indices must be long, int, byte or bool tensors</agent_instructions>

Comments on the Issue (you are @copilot in this section)
@lmangani This change was ineffective and caused a new issue:
Generation Failed: tensors used as indices must be long, int, byte or bool tensors

Fixes Decoding Audio Failure #15

Initial plan

684b997

Copilot AI assigned Copilot and lmangani Jan 30, 2026

Copilot started work on behalf of lmangani January 30, 2026 09:12 View session

Fix tensor dtype issue in audio decoding - keep frames as long for in…

c187b41

…dexing Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix decoding audio failure in generation process~~ Fix audio decoding: preserve long dtype for token indices Jan 30, 2026

Copilot AI requested a review from lmangani January 30, 2026 09:16

Copilot finished work on behalf of lmangani January 30, 2026 09:16

lmangani marked this pull request as ready for review January 30, 2026 09:34

lmangani merged commit dbb643a into main Jan 30, 2026

lmangani merged 2 commits into main from copilot/fix-decoding-audio-failure