Open
Labels: bug (Something isn't working)
Description
Notice: To help resolve issues more efficiently, please follow the template when raising an issue and fill in the details.
🐛 Bug
The extract_fbank function at https://github.com/modelscope/FunASR/blob/main/funasr/utils/load_utils.py#L198 fails on two-channel (stereo) audio files. In that case data.shape=[2, 11520] but data_len=[11520]. data_len should have one entry per batch item, so it should be data_len=[11520, 11520].
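The mismatch described above can be illustrated with a small sketch. It follows this issue's reading that each row of a 2-D input counts as one batch item. `lengths_per_row` is a hypothetical helper written for illustration, not a function in FunASR, and NumPy stands in for the tensors used by the real code.

```python
import numpy as np


def lengths_per_row(data: np.ndarray) -> list[int]:
    """Return one length per batch row (hypothetical helper, not FunASR API).

    A 1-D waveform is treated as a batch of one; a 2-D array is treated
    as [batch, samples], so every row gets its own length entry.
    """
    if data.ndim == 1:
        data = data[None, :]  # promote [samples] to [1, samples]
    return [data.shape[1]] * data.shape[0]


stereo = np.zeros((2, 11520), dtype=np.float32)  # same shape as in the report
mono = np.zeros(11520, dtype=np.float32)

stereo_lengths = lengths_per_row(stereo)
mono_lengths = lengths_per_row(mono)
```

With this convention the stereo input yields two length entries and the mono input yields one, so the number of lengths always equals the number of rows.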
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
- Download this two-channel (stereo) audio file
- Modify this script so that it runs on the audio file above: https://github.com/FunAudioLLM/Fun-ASR/blob/main/demo2.py
- See the error:
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 609, in inference
    return self.inference_llm(
           ~~~~~~~~~~~~~~~~~~^
        data_in,
        ^^^^^^^^
        ...<4 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 627, in inference_llm
    inputs_embeds, contents, batch, source_ids, meta_data = self.inference_prepare(
                                                            ~~~~~~~~~~~~~~~~~~~~~~^
        data_in, data_lengths, key, tokenizer, frontend, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 483, in inference_prepare
    output = self.data_load_speech(
        contents, tokenizer, frontend, meta_data=meta_data, **kwargs
    )
  File "/home/ubuntu/funasr/Fun-ASR/model.py", line 391, in data_load_speech
    speech, speech_lengths = extract_fbank(
                             ~~~~~~~~~~~~~^
        data_src,
        ^^^^^^^^^
        ...<2 lines>...
        is_final=True,
        ^^^^^^^^^^^^^^
    ) # speech: [b, T, d]
    ^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/funasr/utils/load_utils.py", line 218, in extract_fbank
    data, data_len = frontend(data, data_len, **kwargs)
                     ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.local/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.13/site-packages/funasr/frontends/wav_frontend.py", line 128, in forward
    waveform_length = input_lengths[i]
                      ~~~~~~~~~~~~~^^^
IndexError: list index out of range
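The final frame suggests the frontend looks up one length per row of the input. The sketch below is a simplified stand-in for that pattern, not the actual wav_frontend.py code: `frontend_lengths` is a hypothetical function, and the only detail taken from the traceback is the `input_lengths[i]` lookup. It shows how a two-row input with a single length entry produces the same IndexError.

```python
import numpy as np


def frontend_lengths(waveforms: np.ndarray, input_lengths: list[int]) -> list[int]:
    """Simplified stand-in for a per-row frontend loop (hypothetical)."""
    out = []
    for i in range(waveforms.shape[0]):  # one iteration per row (2 for stereo)
        waveform_length = input_lengths[i]  # fails if there are fewer lengths than rows
        out.append(waveforms[i][:waveform_length].shape[0])
    return out


stereo = np.zeros((2, 11520), dtype=np.float32)

# One length for two rows: the second iteration indexes past the end of the list.
try:
    frontend_lengths(stereo, [11520])
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# One length per row: the loop completes.
ok_lengths = frontend_lengths(stereo, [11520, 11520])
```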
Code sample
import numpy as np
import soundfile as sf
import torch

from model import FunASRNano
from tools.utils import load_audio


def main():
    wav_path = "shuiqian1004_90.mp3"  # the stereo file that triggers the error
    model_dir = "FunAudioLLM/Fun-ASR-Nano-2512"
    device = (
        "cuda:0"
        if torch.cuda.is_available()
        else "mps"
        if torch.backends.mps.is_available()
        else "cpu"
    )
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device=device)
    tokenizer = kwargs.get("tokenizer", None)
    m.eval()

    # Decode growing prefixes of the audio, 0.72 s at a time.
    chunk_size = 0.72
    duration = sf.info(wav_path).duration
    cum_durations = np.arange(chunk_size, duration + chunk_size, chunk_size)
    prev_text = ""
    for idx, cum_duration in enumerate(cum_durations):
        audio, rate = load_audio(wav_path, 16000, duration=round(cum_duration, 3))
        prev_text = m.inference([torch.tensor(audio)], prev_text=prev_text, **kwargs)[0][0]["text"]
        if idx != len(cum_durations) - 1:
            # Drop the last 5 tokens of every non-final result before reusing it as context.
            prev_text = tokenizer.decode(tokenizer.encode(prev_text)[:-5]).replace("�", "")
        if prev_text:
            print(prev_text)


if __name__ == "__main__":
    main()
Expected behavior
Transcription runs successfully.
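Until the length handling is fixed, one possible workaround is to downmix the audio to mono before passing it to m.inference. This is a minimal sketch, assuming load_audio returns a NumPy array shaped [channels, samples] for stereo files (consistent with the data.shape=[2, 11520] reported above, but not verified here). `to_mono` is a hypothetical helper, not part of FunASR or Fun-ASR.

```python
import numpy as np


def to_mono(audio, channel_axis: int = 0) -> np.ndarray:
    """Average all channels into one (hypothetical helper).

    Assumes multi-channel input is laid out as [channels, samples];
    1-D input is returned unchanged apart from the dtype conversion.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=channel_axis)


# Synthetic stereo clip: left channel all 1.0, right channel all 3.0.
stereo = np.stack([np.full(8, 1.0), np.full(8, 3.0)]).astype(np.float32)
mono = to_mono(stereo)
```

In the reproduction script this would wrap the loaded audio, for example `torch.tensor(to_mono(audio))`, so that the model receives a single 1-D waveform.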
Environment
- OS (e.g., Linux): Linux funasr 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 x86_64 GNU/Linux
- FunASR Version (e.g., 1.0.0): 1.3.0
- ModelScope Version (e.g., 1.11.0): 1.34.0
- PyTorch Version (e.g., 2.0.0): 2.10.0+cu128
- How you installed funasr (pip, source): pip
- Python version: 3.13.7
- GPU (e.g., V100M32): None (running on CPU)
- CUDA/cuDNN version (e.g., cuda11.7): None (running on CPU)
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1): None (running on Ubuntu)
- Any other relevant information:
Additional context