Summary
Xinference supports various LLM/embedding/image/audio models. Would you consider adding FunASR models (SenseVoice, Paraformer, Fun-ASR-Nano) as audio/speech model backends?
Why FunASR?
FunASR is a widely used open-source toolkit for Chinese and multilingual speech recognition. Its main models are:
- SenseVoice (234M params): Non-autoregressive, ~25x faster than Whisper-large, 50+ languages, emotion + audio event detection
- Paraformer (220M params): ~170x realtime on GPU for Chinese, built-in VAD + punctuation
- Fun-ASR-Nano (800M params): LLM-based ASR (SenseVoice encoder + Qwen3-0.6B decoder), 31 languages
- cam++: Speaker diarization model (7.2M params)
Integration
FunASR already provides an OpenAI-compatible API server:
pip install funasr vllm
funasr-server --device cuda
# http://localhost:8000/v1/audio/transcriptions
This server could serve as a reference for integrating FunASR into Xinference's model serving framework.
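For reference, a client call against that endpoint might look like the sketch below. Only the URL comes from the text above. The multipart fields (`model`, `file`) and the `text` key in the response follow the OpenAI transcription API and are assumed to be mirrored by the FunASR server. The model name `SenseVoiceSmall` and the helper names are placeholders for illustration.

```python
import json
import os
import urllib.request
import uuid

# URL as given in this issue; the request shape below is an assumption
# based on the OpenAI audio transcription API.
ENDPOINT = "http://localhost:8000/v1/audio/transcriptions"


def build_transcription_request(audio_bytes, filename, model, url=ENDPOINT):
    """Build a multipart/form-data POST carrying a model name and an audio file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path, model="SenseVoiceSmall"):
    """Send a local audio file to the server and return the transcript text."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path), model)
    with urllib.request.urlopen(req) as resp:
        # Assumes an OpenAI-style JSON body: {"text": "..."}
        return json.load(resp)["text"]
```

With a server running, `transcribe("sample.wav")` would return the recognized text. An Xinference backend could expose the same request shape through its existing audio endpoint.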
Model ecosystem
| Model | Task | Params | Speed |
| --- | --- | --- | --- |
| SenseVoice-Small | ASR + emotion | 234M | ~25x vs Whisper |
| Paraformer-large | Chinese ASR | 220M | ~170x realtime |
| Fun-ASR-Nano | Multilingual ASR | 800M | LLM-based |
| FSMN-VAD | Voice Activity Detection | 0.4M | - |
| CT-Punc | Punctuation | - | - |
| cam++ | Speaker Diarization | 7.2M | - |
All models are available on ModelScope and Hugging Face (FunAudioLLM org).
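To illustrate how these models compose, here is a rough sketch of what a backend might wrap, using FunASR's `AutoModel` entry point. The short model aliases (`paraformer-zh`, `fsmn-vad`, `ct-punc`, `cam++`) are the ones I understand FunASR to accept, so treat them as assumptions. `pipeline_kwargs` and `transcribe_file` are hypothetical helper names, not part of FunASR or Xinference.

```python
def pipeline_kwargs(asr="paraformer-zh", vad="fsmn-vad", punc="ct-punc",
                    spk=None, device="cuda"):
    """Collect AutoModel keyword arguments, skipping stages set to None."""
    stages = {
        "model": asr,        # main ASR model
        "vad_model": vad,    # voice activity detection (FSMN-VAD)
        "punc_model": punc,  # punctuation restoration (CT-Punc)
        "spk_model": spk,    # speaker model, e.g. "cam++"
    }
    kwargs = {name: value for name, value in stages.items() if value is not None}
    kwargs["device"] = device
    return kwargs


def transcribe_file(path, **overrides):
    """Run the composed pipeline on one audio file and return its text."""
    # Imported lazily so pipeline_kwargs works without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(**pipeline_kwargs(**overrides))
    result = model.generate(input=path)
    # Assumes generate() returns a list of dicts with a "text" field.
    return result[0]["text"]
```

Keeping the stage selection in a plain dict means a serving layer could map its own launch options onto FunASR without importing it until a model is actually loaded.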