Feature Request
Add FunASR models (SenseVoice, Paraformer) as available ASR models in SpeechBrain.
Why
SpeechBrain is a widely used open-source speech processing toolkit, and the FunASR models below would be a valuable addition:
| Model | Params | Architecture | Languages | Speed |
|---|---|---|---|---|
| SenseVoice-Small | 234M | Non-autoregressive | 50+ | 25x RT (CPU) |
| Paraformer-large | 220M | Non-autoregressive | zh/en/ja/ko/yue | 170x RT (GPU) |
| Fun-ASR-Nano | 800M | LLM-based (encoder+decoder) | 31 | GPU via vLLM |
| cam++ | 7.2M | Speaker diarization | - | - |
| FSMN-VAD | 5.2M | Voice activity detection | - | - |
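For readers comparing the Speed column, here is a minimal sketch of how such a figure could be computed. It assumes "25x RT" means that 25 seconds of audio are transcribed per second of compute; the issue does not spell this out, and both function names are hypothetical helpers, not part of FunASR or SpeechBrain.

```python
def speedup_over_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute (assumed meaning of 'Nx RT')."""
    if audio_seconds < 0 or processing_seconds <= 0:
        raise ValueError("audio_seconds must be >= 0 and processing_seconds must be > 0")
    return audio_seconds / processing_seconds


def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Classic RTF: compute time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0 or processing_seconds < 0:
        raise ValueError("audio_seconds must be > 0 and processing_seconds must be >= 0")
    return processing_seconds / audio_seconds


# Hypothetical measurement: a 50 s clip transcribed in 2 s of wall-clock time.
print(speedup_over_real_time(50.0, 2.0))  # 25.0
```

Under this reading, a 25x speedup corresponds to an RTF of 0.04. Any benchmark added with the integration should state which of the two conventions it reports.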
Key advantages
- Non-autoregressive: Paraformer and SenseVoice predict the whole output in parallel, which makes them less prone to the repetition and hallucination failures seen in autoregressive decoders
- Industrial-grade: Deployed at scale (1M+ pip installs/month)
- Complete pipeline: VAD + ASR + punctuation + diarization in one package
- HuggingFace models: Available on HF Hub (`funasr/paraformer-large`, `funasr/sensevoice-small`)
- Transformers integration in progress (PR #46180)
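
To illustrate the "complete pipeline" point, here is a rough sketch of how a SpeechBrain-side wrapper might assemble a FunASR pipeline. It assumes FunASR's `AutoModel` entry point and the registry names `paraformer-zh`, `fsmn-vad`, `ct-punc` and `cam++`, none of which are stated in this issue. `build_funasr_kwargs` and `transcribe` are hypothetical helpers, not an existing SpeechBrain or FunASR API.

```python
from typing import Any, Dict, List


def build_funasr_kwargs(
    asr_model: str = "paraformer-zh",  # assumed FunASR registry name
    use_vad: bool = True,
    use_punc: bool = True,
    use_diarization: bool = False,
) -> Dict[str, str]:
    """Collect the keyword arguments for the optional pipeline stages."""
    kwargs: Dict[str, str] = {"model": asr_model}
    if use_vad:
        kwargs["vad_model"] = "fsmn-vad"  # voice activity detection (assumed name)
    if use_punc:
        kwargs["punc_model"] = "ct-punc"  # punctuation restoration (assumed name)
    if use_diarization:
        kwargs["spk_model"] = "cam++"  # speaker model (assumed name)
    return kwargs


def transcribe(audio_path: str, **options: Any) -> List[Dict[str, Any]]:
    """Run the assembled pipeline on one audio file (requires `pip install funasr`)."""
    # Imported lazily so FunASR stays an optional dependency of the wrapper.
    from funasr import AutoModel

    model = AutoModel(**build_funasr_kwargs(**options))
    return model.generate(input=audio_path)
```

With these helpers, a call such as `transcribe("meeting.wav", use_diarization=True)` would run VAD, ASR, punctuation and speaker labelling in one pass (`meeting.wav` is a placeholder path). Whether SpeechBrain should wrap FunASR this way or load the checkpoints natively is a design question for the maintainers.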