Feature request
Xinference already supports various model types (LLM, embedding, image, audio). Adding FunASR/SenseVoice as a speech recognition backend would complement the existing audio capabilities.
Why FunASR for Xinference
- 5x faster than Whisper (SenseVoice is non-autoregressive, 234M params)
- Full pipeline: ASR + VAD + punctuation + speaker diarization in one toolkit
- 50+ languages with automatic language detection
- OpenAI-compatible API: funasr-server serves /v1/audio/transcriptions
- Multiple models: SenseVoice (fast), Paraformer (Chinese-optimized), Fun-ASR-Nano (LLM-based, 31 languages)
- 1M+ monthly pip installs — widely adopted
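Because the endpoint follows the OpenAI transcription route, an existing client should mostly need a different base URL. A minimal sketch using only the standard library, assuming funasr-server accepts the same multipart form fields as OpenAI's endpoint (`file` and `model`). The helper name `build_transcription_request`, the host, and the port are placeholders for illustration, not part of FunASR:

```python
import urllib.request
import uuid


def build_transcription_request(base_url, audio_bytes, filename, model):
    """Build a multipart POST for an OpenAI-style /v1/audio/transcriptions route.

    Assumption: the server reads the `model` and `file` form fields, as
    OpenAI's transcription endpoint does.
    """
    boundary = uuid.uuid4().hex
    # Text part (model name) followed by the header of the file part.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary terminates the multipart body.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; an OpenAI-compatible server is expected to reply with a JSON object containing a `text` field.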
Integration approach
FunASR models could be registered in Xinference under the existing audio model type, with funasr as a new model family:
# Model registration
{
  "model_name": "SenseVoice-Small",
  "model_type": "audio",
  "model_family": "funasr",
  "model_id": "iic/SenseVoiceSmall"
}
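One way to keep such entries consistent is to validate them when they are loaded. A minimal sketch that uses only the four fields shown in the entry above; the class `FunASRModelSpec` is hypothetical and does not reflect Xinference's actual registry schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunASRModelSpec:
    """Hypothetical typed view of a FunASR registration entry."""

    model_name: str
    model_type: str
    model_family: str
    model_id: str

    @classmethod
    def from_dict(cls, raw):
        required = ("model_name", "model_type", "model_family", "model_id")
        # Reject entries that omit any of the fields used in this proposal.
        missing = [key for key in required if key not in raw]
        if missing:
            raise ValueError(f"missing fields: {', '.join(missing)}")
        spec = cls(**{key: raw[key] for key in required})
        # The proposal registers speech models under the audio type.
        if spec.model_type != "audio":
            raise ValueError(f"expected model_type 'audio', got {spec.model_type!r}")
        return spec


SENSEVOICE_SMALL = FunASRModelSpec.from_dict(
    {
        "model_name": "SenseVoice-Small",
        "model_type": "audio",
        "model_family": "funasr",
        "model_id": "iic/SenseVoiceSmall",
    }
)
```

Failing early on a malformed entry keeps configuration mistakes out of the model-loading path, where they would be harder to diagnose.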
The inference backend would wrap FunASR's Python API:
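from funasr import AutoModel

# Load SenseVoice together with the FSMN VAD model, which segments long audio
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_data is the input to transcribe, for example a path to an audio file
result = model.generate(input=audio_data)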
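The wrapper itself could be a thin adapter that loads the model on first use and maps FunASR's output to the `{"text": ...}` shape that OpenAI-style transcription responses use. A minimal sketch, assuming `generate()` returns a list of dicts that each carry a `text` key; the class `FunASRTranscriber` and its `model_factory` parameter are hypothetical names introduced here, not Xinference or FunASR APIs:

```python
class FunASRTranscriber:
    """Hypothetical adapter from FunASR's AutoModel to an OpenAI-style result."""

    def __init__(self, model_id, vad_model="fsmn-vad", model_factory=None):
        self._model_id = model_id
        self._vad_model = vad_model
        # model_factory allows a stand-in model to be injected for tests.
        self._model_factory = model_factory
        self._model = None

    def _load(self):
        if self._model is None:
            factory = self._model_factory
            if factory is None:
                # Import lazily so funasr is only required once a model is used.
                from funasr import AutoModel as factory
            self._model = factory(model=self._model_id, vad_model=self._vad_model)
        return self._model

    def transcribe(self, audio):
        # Assumption: generate() returns a list of dicts with a "text" key.
        results = self._load().generate(input=audio)
        text = " ".join(item.get("text", "") for item in results).strip()
        return {"text": text}
```

Injecting the factory keeps the adapter testable without downloading model weights, and the lazy import avoids paying FunASR's import cost in processes that never serve an audio model.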
References
pip install funasr