Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Updated Jun 10, 2026 - Python
Easy-to-use speech toolkit including a self-supervised learning model, SOTA/streaming ASR with punctuation, streaming TTS with a text frontend, a speaker verification system, end-to-end speech translation, and keyword spotting. Won the NAACL 2022 Best Demo Award.
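End-to-end Chinese speech recognition built on PaddlePaddle, from getting started to real-world practice: very simple introductory examples and highly practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer.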
On-device VAD / streaming STT / TTS / diarization in C++17 (ONNX + LiteRT) with a voice-agent pipeline. Linux, Windows, Android.
Streaming on-device speech recognition for Android — NEON-accelerated, encrypted FastConformer (32M params), ~150 ms latency, no cloud. Powered by the VoxRT runtime.
Streaming on-device speech recognition for iOS — NEON-accelerated, encrypted FastConformer (32M params), RTF 0.08–0.10 on iPhone 13 Pro Max. Built on the VoxRT custom Rust inference runtime. SwiftPM distribution.
A 1300-hour English speech and text corpus of parliamentary debates for streaming ASR training and benchmarking, speech data filtering and speech data verbatimization.
Pre-compiled ASR model weights for the VoxRT on-device runtime. Encrypted .vxrt v2 format. streaming-medium-pc: FastConformer 32M, CTC + RNN-T, CC-BY-4.0 (NVIDIA NeMo).
Faster-Whisper Transcription Server & API: a production-ready speech-to-text microservice stack that wraps faster-whisper with a streaming FastAPI server, a Celery/Redis background queue, and optional Docker deployment, delivering real-time or batch audio transcription with minimal latency and simple webhook integration.
OpenAI-compatible proxy bridging the Doubao/Volcengine ASR 2.0 (Seed-ASR) WebSocket protocol to /v1/audio/transcriptions; works with Spokenly and other OpenAI-compatible clients.
Production-ready REST API for Russian speech recognition using the T-one model. FastAPI-based service with offline and streaming transcription support.
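Doubao voice input tool for the Windows desktop: record with a global hotkey → Volcengine streaming ASR → auto-paste at the cursor. Natively supports Doubao platform hotword list IDs.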
PhD Thesis: "Automatic speech recognition and machine translation with deep neural networks for open educational resources, parliamentary contents and broadcast media" (2024)
Lightweight Windows voice input tool with offline streaming ASR, hotwords, and AI text correction
Low-latency voice AI agent platform with streaming ASR/TTS, FSM-based dialog management, and microservices architecture. Built with FastAPI, LangGraph, vLLM, and F5-TTS.
Injecting semantics into streaming automatic speech recognition models
Local CPU WebSocket and browser demo for icefall streaming ASR ONNX models
Perform on-device streaming speech recognition on iOS using the high-performance VoxRT inference runtime with custom NEON-accelerated kernels.
Stream on-device speech recognition on Android using the custom VoxRT inference runtime with NeMo FastConformer support.