Notice: To resolve issues more efficiently, please raise issues following the template and fill in the details.
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
I ran into an unknown error when launching training.
Code
What have you tried?
Changing the parameters in the .sh script did not help, and running on CUDA did not help either.
What's your environment?
- OS (e.g., Linux):
- FunASR Version (e.g., 1.2.6):
- ModelScope Version (e.g., 1.13.3):
- PyTorch Version (e.g., 2.3.1):
- How you installed funasr (pip, source): yes
- Python version: 3.10
- GPU (e.g., V100M32): A800 x 2
- CUDA/cuDNN version (e.g., cuda_11.0):
- Docker version (e.g., funasr-runtime-sdk-cpu-0.4.1)
- Any other relevant information:
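Several version fields above are blank. A minimal sketch for collecting them, assuming the packages are installed under their usual distribution names (`funasr`, `modelscope`, `torch`). The helper name `collect_versions` is hypothetical and is not part of FunASR:

```python
import platform
import sys
from importlib import metadata


def collect_versions(packages=("funasr", "modelscope", "torch")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed in the current environment are
    reported as 'not installed' instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print the fields the issue template asks for.
    print("OS:", platform.platform())
    print("Python:", sys.version.split()[0])
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Run it with the same interpreter that `torchrun` uses (here `/data/anaconda/envs/cosyvoice/bin/python3.10`) so the reported versions match the failing environment. If PyTorch imports cleanly, `torch.version.cuda` additionally reports the CUDA version it was built against.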
The error output is as follows:
Type: torch.float32
[2025-07-15 10:31:33,167][root][INFO] - Build optim
[2025-07-15 10:31:33,170][root][INFO] - Build scheduler
[2025-07-15 10:31:33,171][root][INFO] - Build dataloader
[2025-07-15 10:31:33,171][root][INFO] - Build dataloader
[2025-07-15 10:31:33,181][root][INFO] - Build optim
[2025-07-15 10:31:33,184][root][INFO] - Build scheduler
[2025-07-15 10:31:33,184][root][INFO] - Build dataloader
[2025-07-15 10:31:33,184][root][INFO] - Build dataloader
[2025-07-15 10:31:34,835][root][INFO] - total_num of samplers: 226156, /data/ASR/SenseVoice/data/train_example.jsonl
[2025-07-15 10:31:34,835][root][INFO] - total_num of samplers: 6, /data/ASR/SenseVoice/data/val_example.jsonl
[2025-07-15 10:31:34,845][root][INFO] - total_num of samplers: 226156, /data/ASR/SenseVoice/data/train_example.jsonl
[2025-07-15 10:31:34,845][root][INFO] - total_num of samplers: 6, /data/ASR/SenseVoice/data/val_example.jsonl
[2025-07-15 10:31:35,394][root][INFO] - rank: 0, dataloader start from step: 0, batch_num: 21665, after: 21665
[2025-07-15 10:31:35,414][root][INFO] - rank: 1, dataloader start from step: 0, batch_num: 21665, after: 21665
W0715 10:31:36.284000 139943396536896 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 32993 closing signal SIGTERM
E0715 10:31:48.072000 139943396536896 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: -11) local_rank: 1 (pid: 32994) of binary: /data/anaconda/envs/cosyvoice/bin/python3.10
Traceback (most recent call last):
File "/data/anaconda/envs/cosyvoice/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/data/anaconda/envs/cosyvoice/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/data/anaconda/envs/cosyvoice/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/data/anaconda/envs/cosyvoice/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/data/anaconda/envs/cosyvoice/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/anaconda/envs/cosyvoice/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
/data/FunASR/funasr/bin/train_ds.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2025-07-15_10:31:36
host : localhost.localdomain
rank : 1 (local_rank: 1)
exitcode : -11 (pid: 32994)
error_file: <N/A>
traceback : Signal 11 (SIGSEGV) received by PID 32994
Is this a version issue?
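The exit code -11 in the log means rank 1 was killed by signal 11 (SIGSEGV), so the traceback shown comes from the `torchrun` launcher and does not say where the worker crashed. A minimal sketch, assuming a Linux host, that decodes such an exit code and enables the standard-library `faulthandler` so a crashing worker dumps its Python stack. The helpers `describe_exitcode` and `enable_crash_dump` are hypothetical names, not FunASR or PyTorch APIs:

```python
import faulthandler
import signal
import sys


def describe_exitcode(exitcode):
    """Translate a process exit code into a readable description.

    A negative value means the process was terminated by the signal
    whose number is the absolute value of the code.
    """
    if exitcode < 0:
        return signal.Signals(-exitcode).name
    return f"exit status {exitcode}"


def enable_crash_dump():
    """Dump the Python stack of all threads to stderr on a fatal signal
    (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL)."""
    faulthandler.enable(file=sys.stderr, all_threads=True)


if __name__ == "__main__":
    enable_crash_dump()
    print(describe_exitcode(-11))
```

To get the same dump without editing the training script, the environment variable `PYTHONFAULTHANDLER=1` can be exported before calling `torchrun`; the worker processes inherit it. The resulting stack should show which import or call in rank 1 triggers the segfault, which helps tell a version mismatch apart from other causes.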