Pure managed C# port of WebRTC's voice activity detection (VAD) algorithm.
No native dependencies - works on any platform that supports .NET 8+.
Install the package:

    dotnet add package WebRtcVad.NET

Basic usage:

    using WebRtcVad.NET;

    // Create a VAD instance
    using var vad = new WebRtcVad
    {
        SampleRate = SampleRate.Rate16kHz,
        FrameLength = FrameLength.Length20ms,
        OperatingMode = OperatingMode.Aggressive
    };

    // Check if audio frame contains speech
    // (GetAudioFrame and GetAudioSamples stand in for your own audio source)
    byte[] audioFrame = GetAudioFrame(); // 16-bit PCM, little-endian
    bool hasSpeech = vad.HasSpeech(audioFrame);

    // Or use Span<short> for better performance
    Span<short> samples = GetAudioSamples();
    bool hasSpeechInSamples = vad.HasSpeech(samples);

Supported sample rates:

- `SampleRate.Rate8kHz`
- `SampleRate.Rate16kHz`
- `SampleRate.Rate32kHz`
- `SampleRate.Rate48kHz`

Supported frame lengths:

- `FrameLength.Length10ms`
- `FrameLength.Length20ms`
- `FrameLength.Length30ms`

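
The sample rate and frame length together fix how many samples a single frame holds. The following is a minimal sketch of that arithmetic, plus the little-endian byte-to-sample conversion implied by the usage example. It assumes mono audio, and `PcmFrames` and its methods are hypothetical helpers written for illustration; they are not part of this library.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical helper, not part of WebRtcVad.NET.
public static class PcmFrames
{
    // Number of 16-bit samples in one frame of mono audio.
    public static int SamplesPerFrame(int sampleRateHz, int frameMs)
        => sampleRateHz / 1000 * frameMs;

    // Size of the same frame in bytes (2 bytes per sample).
    public static int BytesPerFrame(int sampleRateHz, int frameMs)
        => SamplesPerFrame(sampleRateHz, frameMs) * sizeof(short);

    // Decodes little-endian 16-bit PCM bytes into samples.
    // A trailing odd byte, if any, is ignored.
    public static short[] ToSamples(ReadOnlySpan<byte> pcmLittleEndian)
    {
        var samples = new short[pcmLittleEndian.Length / 2];
        for (int i = 0; i < samples.Length; i++)
            samples[i] = BinaryPrimitives.ReadInt16LittleEndian(pcmLittleEndian.Slice(i * 2, 2));
        return samples;
    }

    // Yields the start offset (in samples) of every complete frame.
    // A trailing partial frame is skipped.
    public static IEnumerable<int> FrameOffsets(int totalSamples, int samplesPerFrame)
    {
        for (int start = 0; start + samplesPerFrame <= totalSamples; start += samplesPerFrame)
            yield return start;
    }
}
```

For example, 16 kHz audio with 20 ms frames gives 320 samples (640 bytes) per frame. Each offset from `FrameOffsets` can be used to slice one frame out of a longer buffer before it is handed to the VAD.
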

| Mode | Description |
|---|---|
| `Quality` | Least aggressive - best for clean audio |
| `LowBitrate` | Slightly more aggressive |
| `Aggressive` | For moderately noisy audio |
| `VeryAggressive` | Most aggressive - for very noisy audio |

Higher aggressiveness yields fewer false positives (noise classified as speech) at the cost of more false negatives (missed speech).
This library only supports raw 16-bit linear PCM audio. It will not work with WAV files or other container formats directly - you need to extract the raw PCM data first.
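
One way to get at the raw PCM data is to walk the RIFF chunks of a WAV file and return the contents of its `data` chunk. The sketch below assumes a plain RIFF/WAVE layout with format tag 1 (linear PCM) and 16 bits per sample, and rejects anything else. `WavPcm.ExtractPcm16` is a hypothetical helper written for illustration; it is not part of this library.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of WebRtcVad.NET.
public static class WavPcm
{
    // Returns the bytes of the "data" chunk of a RIFF/WAVE stream.
    // Throws if the stream is not 16-bit linear PCM.
    public static byte[] ExtractPcm16(Stream wav)
    {
        using var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true);

        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadUInt32(); // overall RIFF size, unused here
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        bool formatChecked = false;
        while (true)
        {
            // At end of stream ReadUInt32 throws EndOfStreamException,
            // which ends the loop when no "data" chunk exists.
            string chunkId = ReadTag(reader);
            uint chunkSize = reader.ReadUInt32();
            uint padding = chunkSize & 1; // chunks are padded to an even size

            if (chunkId == "fmt ")
            {
                if (chunkSize < 16) throw new InvalidDataException("Truncated fmt chunk.");
                ushort formatTag = reader.ReadUInt16(); // 1 = linear PCM
                reader.ReadUInt16();                    // channel count
                reader.ReadUInt32();                    // sample rate
                reader.ReadUInt32();                    // byte rate
                reader.ReadUInt16();                    // block align
                ushort bitsPerSample = reader.ReadUInt16();
                if (formatTag != 1 || bitsPerSample != 16)
                    throw new InvalidDataException("Only 16-bit linear PCM is supported.");
                reader.ReadBytes((int)(chunkSize - 16 + padding)); // skip any extension
                formatChecked = true;
            }
            else if (chunkId == "data")
            {
                if (!formatChecked) throw new InvalidDataException("Missing fmt chunk.");
                return reader.ReadBytes((int)chunkSize);
            }
            else
            {
                reader.ReadBytes((int)(chunkSize + padding)); // skip unrelated chunk
            }
        }
    }

    // Reads a four-character chunk identifier.
    private static string ReadTag(BinaryReader reader)
        => Encoding.ASCII.GetString(reader.ReadBytes(4));
}
```

The sketch validates only the sample format. The caller still has to check that the file's sample rate is one the VAD supports and that the audio is single-channel, which is an assumption made here rather than something the text above states.
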
This is a bit-exact port of the WebRTC VAD algorithm, which uses a Gaussian Mixture Model (GMM) to classify audio frames as speech or non-speech. The algorithm:
- Splits audio into 6 frequency bands
- Calculates energy features for each band
- Uses GMM to compute speech/noise probabilities
- Applies adaptive thresholds based on operating mode
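
The decision step in the list above can be illustrated with a simplified floating-point sketch: each band feature is scored against a speech mixture and a noise mixture, and the summed log-likelihood ratio is compared with a threshold. This is not the bit-exact fixed-point code of the port. `GmmSketch`, its model parameters, and the single fixed threshold are all assumptions made for illustration.

```csharp
using System;

// Illustrative only: a floating-point GMM likelihood-ratio test,
// not the fixed-point WebRTC implementation.
public static class GmmSketch
{
    // One weighted Gaussian component of a mixture.
    public readonly record struct Component(double Weight, double Mean, double Std);

    // Density of a one-dimensional Gaussian at x.
    private static double Gaussian(double x, double mean, double std)
    {
        double z = (x - mean) / std;
        return Math.Exp(-0.5 * z * z) / (std * Math.Sqrt(2.0 * Math.PI));
    }

    // Likelihood of x under a weighted mixture of Gaussians.
    private static double Mixture(double x, Component[] components)
    {
        double sum = 0.0;
        foreach (var c in components)
            sum += c.Weight * Gaussian(x, c.Mean, c.Std);
        return sum;
    }

    // log(p(x | speech) / p(x | noise)) for a single band feature.
    public static double LogLikelihoodRatio(double feature, Component[] speech, Component[] noise)
        => Math.Log(Mixture(feature, speech)) - Math.Log(Mixture(feature, noise));

    // Classifies a frame as speech when the summed per-band ratio exceeds the threshold.
    // speechModels[i] and noiseModels[i] are the mixtures for band i.
    public static bool IsSpeech(double[] bandFeatures, Component[][] speechModels,
                                Component[][] noiseModels, double threshold)
    {
        double total = 0.0;
        for (int band = 0; band < bandFeatures.Length; band++)
            total += LogLikelihoodRatio(bandFeatures[band], speechModels[band], noiseModels[band]);
        return total > threshold;
    }
}
```

In this sketch a higher threshold plays the role of a more aggressive operating mode: more evidence is needed before a frame counts as speech.
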
MIT License - see LICENSE
The algorithm is ported from WebRTC, which is licensed under the WebRTC License.