feat: Add FunASR/SenseVoice as speech recognition model backend #4976

@LauraGPT

Feature request

Xinference already supports various model types (LLM, embedding, image, audio). Adding FunASR/SenseVoice as a speech recognition backend would complement the existing audio capabilities.

Why FunASR for Xinference

  • 5x faster than Whisper (SenseVoice is non-autoregressive, 234M params)
  • Full pipeline: ASR + VAD + punctuation + speaker diarization in one toolkit
  • 50+ languages with automatic language detection
  • OpenAI-compatible API: funasr-server at /v1/audio/transcriptions
  • Multiple models: SenseVoice (fast), Paraformer (Chinese-optimized), Fun-ASR-Nano (LLM-based, 31 languages)
  • 1M+ monthly pip installs — widely adopted

Integration approach

FunASR models could be registered as a new model family under Xinference's existing audio model type:

# Model registration
{
    "model_name": "SenseVoice-Small",
    "model_type": "audio",
    "model_family": "funasr",
    "model_id": "iic/SenseVoiceSmall",
}

The inference backend would wrap FunASR's Python API:

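from funasr import AutoModel

# Load SenseVoice-Small together with the FSMN VAD model
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_data is the input audio, e.g. a path to an audio file
result = model.generate(input=audio_data)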
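
To make the wrapping idea more concrete, here is a rough sketch of what such a backend could look like. The class name `FunASRBackend`, its `load()` / `transcriptions()` methods, and the injectable `loader` argument are all hypothetical and are not taken from Xinference's codebase. It also assumes that `generate()` returns a list of dicts with a `"text"` key, and that the uploaded bytes can be written to a `.wav` temp file. The `{"text": ...}` return value mirrors the OpenAI transcription response shape.

```python
import os
import tempfile
from typing import Any, Callable, Optional


def _default_loader(model_id: str) -> Any:
    # Imported lazily so this module can be imported without funasr installed.
    from funasr import AutoModel

    return AutoModel(model=model_id, vad_model="fsmn-vad")


class FunASRBackend:
    """Hypothetical wrapper; names are illustrative, not Xinference's API."""

    def __init__(
        self,
        model_id: str,
        loader: Callable[[str], Any] = _default_loader,
    ) -> None:
        self._model_id = model_id
        self._loader = loader
        self._model: Optional[Any] = None

    def load(self) -> None:
        # Build the underlying model once, when the backend is started.
        self._model = self._loader(self._model_id)

    def transcriptions(self, audio: bytes) -> dict:
        if self._model is None:
            raise RuntimeError("call load() before transcriptions()")
        # Write the uploaded bytes to a temp file so the model gets a file path.
        # The .wav suffix is an assumption about the upload format.
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            f.write(audio)
            path = f.name
        try:
            result = self._model.generate(input=path)
        finally:
            os.remove(path)
        # Assumption: result is a list of dicts, each with a "text" key.
        text = " ".join(item.get("text", "") for item in result)
        return {"text": text}
```

Passing the loader in as an argument keeps the FunASR dependency optional and lets the wrapper be unit-tested with a stub model.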
