bug: utils.load_utils.extract_fbank fails on stereo (two-channel) audio files #2793

@fumin

Description

Notice: In order to resolve issues more efficiently, please raise issue following the template.

🐛 Bug

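The extract_fbank function at https://github.com/modelscope/FunASR/blob/main/funasr/utils/load_utils.py#L198 fails when it is given a stereo (two-channel) audio file. In that case data.shape = [2, 11520], but data_len = [11520]. data_len should have one entry per item in the batch, so it should be data_len = [11520, 11520].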
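The mismatch described above can be sketched in isolation. This is a simplified stand-in, not FunASR code: `frontend_lengths` and `per_row_lengths` are hypothetical helpers that only imitate the `input_lengths[i]` lookup seen in the traceback of this report, under the assumption that the frontend treats the first dimension of `data` as the batch dimension.

```python
import numpy as np


def frontend_lengths(data, data_len):
    """Simplified stand-in for the per-item loop in the frontend (not FunASR code)."""
    batch_size = data.shape[0]
    picked = []
    for i in range(batch_size):
        # Imitates `waveform_length = input_lengths[i]` from the traceback.
        picked.append(data_len[i])
    return picked


def per_row_lengths(data):
    """Hypothetical helper: one length per row for 2-D input, one length for 1-D input."""
    if data.ndim == 1:
        return [data.shape[0]]
    return [data.shape[1]] * data.shape[0]


# Same shape as in the report: two channels of 11520 samples each.
stereo = np.zeros((2, 11520), dtype=np.float32)

# One length for two rows: the second lookup goes out of range.
try:
    frontend_lengths(stereo, [11520])
    mismatch_error = None
except IndexError as exc:
    mismatch_error = exc

# One length per row, as proposed in this report.
fixed = frontend_lengths(stereo, per_row_lengths(stereo))
```

With one length per row the loop completes, but each channel would then be processed as a separate batch item. Whether that is the desired behavior for stereo input is a design decision for the maintainers.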

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Download this stereo audio file: shuiqian1004_90.mp3
  2. Modify this script so that it runs on the audio file above: https://github.com/FunAudioLLM/Fun-ASR/blob/main/demo2.py
  3. Observe the error:
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 609, in inference
    return self.inference_llm(
           ~~~~~~~~~~~~~~~~~~^
        data_in,
        ^^^^^^^^
    ...<4 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 627, in inference_llm
    inputs_embeds, contents, batch, source_ids, meta_data = self.inference_prepare(
                                                            ~~~~~~~~~~~~~~~~~~~~~~^
        data_in, data_lengths, key, tokenizer, frontend, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 483, in inference_prepare
    output = self.data_load_speech(
        contents, tokenizer, frontend, meta_data=meta_data, **kwargs
    )
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 391, in data_load_speech
    speech, speech_lengths = extract_fbank(
                             ~~~~~~~~~~~~~^
        data_src,
        ^^^^^^^^^
    ...<2 lines>...
        is_final=True,
        ^^^^^^^^^^^^^^
    )  # speech: [b, T, d]
    ^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/funasr/utils/load_utils.py", line 218, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
                     ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.13/site-packages/funasr/frontends/wav_frontend.py", line 128, in forward
    waveform_length = input_lengths[i]
                      ~~~~~~~~~~~~~^^^
IndexError: list index out of range

Code sample

import numpy as np
import soundfile as sf
import torch

from model import FunASRNano
from tools.utils import load_audio


def main():    
    wav_path = "shuiqian1004_90.mp3"

    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
    device = (
        "cuda:0"
        if torch.cuda.is_available()
        else "mps"
        if torch.backends.mps.is_available()
        else "cpu"
    )
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device=device)
    tokenizer = kwargs.get("tokenizer", None)
    m.eval()

    chunk_size = 0.72
    duration = sf.info(wav_path).duration
    cum_durations = np.arange(chunk_size, duration + chunk_size, chunk_size)
    prev_text = ""
    for idx, cum_duration in enumerate(cum_durations):
        audio, rate = load_audio(wav_path, 16000, duration=round(cum_duration, 3))
        prev_text = m.inference([torch.tensor(audio)], prev_text=prev_text, **kwargs)[0][0]["text"]
        if idx != len(cum_durations) - 1:
            prev_text = tokenizer.decode(tokenizer.encode(prev_text)[:-5]).replace("�", "")
        if prev_text:
            print(prev_text)


if __name__ == "__main__":
    main()
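Until the library handles this case, one possible workaround is to downmix the audio to mono before it reaches `m.inference`. The sketch below is an assumption-laden illustration: `to_mono` is a hypothetical helper, and it assumes the loaded audio is laid out as [channels, samples], which matches the data.shape = [2, 11520] reported above but has not been verified against `load_audio`.

```python
import numpy as np


def to_mono(audio):
    """Average the channels of a [channels, samples] array; pass 1-D audio through."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    if audio.ndim == 2:
        # Assumption: axis 0 is the channel axis.
        return audio.mean(axis=0)
    raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")


# Synthetic stereo clip: left channel all ones, right channel all zeros.
stereo = np.stack([np.ones(11520), np.zeros(11520)]).astype(np.float32)
mono = to_mono(stereo)
```

In the script above, this would mean passing `torch.tensor(to_mono(audio))` instead of `torch.tensor(audio)`. Averaging discards any difference between the channels, so it is only appropriate when both channels carry the same speech.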

Expected behavior

Transcription completes successfully.

Environment

  • OS (e.g., Linux): Linux funasr 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 x86_64 GNU/Linux
  • FunASR Version (e.g., 1.0.0): 1.3.0
  • ModelScope Version (e.g., 1.11.0): 1.34.0
  • PyTorch Version (e.g., 2.0.0): 2.10.0+cu128
  • How you installed funasr (pip, source): pip
  • Python version: 3.13.7
  • GPU (e.g., V100M32): None, running on CPU
  • CUDA/cuDNN version (e.g., cuda11.7): None, running on CPU
  • Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): None, running on Ubuntu
  • Any other relevant information:

Additional context
