Describe the bug
When processing long audio files with SpeechBrain's VAD on GPU, the model fails with a misleading cuDNN error when the RNN processes sequences longer than ~50,000 timesteps:
"RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."
The error message is misleading - the tensors are actually contiguous.
The error occurs in VAD.get_speech_prob_chunk() when the internal GRU layer receives sequences longer than cuDNN can handle; a minimal sketch isolating this is shown after the list. Specifically:
- Short sequences (<50k timesteps) process without error
- Long sequences (>50k timesteps) trigger the cuDNN error
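The following is a bare-GRU sketch (not SpeechBrain code) to show that the failure comes from cuDNN's sequence-length limit rather than from non-contiguous tensors. The 60,000-timestep length and the 128-dimensional features are assumptions chosen only to exceed the ~50k threshold reported above; it is expected to raise on affected CUDA/cuDNN setups.

```python
import torch

# Plain GRU on GPU; input is explicitly contiguous, so the "non-contiguous
# input" hint in the error message does not apply here.
gru = torch.nn.GRU(input_size=128, hidden_size=128, batch_first=True).cuda()
x = torch.randn(1, 60_000, 128, device="cuda").contiguous()

# On affected setups this raises:
# RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear
# if you passed in a non-contiguous input.
out, h = gru(x)
```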
Proposed solution:
The workaround is to chunk the sequences so that nothing longer than the ~50k-timestep threshold reaches the RNN, as sketched below.
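Below is a minimal user-side sketch of that chunking idea, not an existing SpeechBrain option: the waveform is split into fixed-length chunks before calling get_speech_prob_chunk, so the internal GRU never sees a sequence near the limit. The torchaudio loading step and the 60-second chunk length are assumptions for illustration.

```python
import torch
import torchaudio
from speechbrain.inference import VAD

vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    run_opts={"device": "cuda"},
)

# Assumed mono audio file; shape [1, num_samples]
waveform, sample_rate = torchaudio.load("long_audio.wav")
waveform = waveform.to("cuda")

# Assumed-safe 60 s chunks, chosen so the downstream GRU stays well
# under the ~50k-timestep limit reported in this issue.
chunk_samples = 60 * sample_rate

probs = []
with torch.no_grad():
    for start in range(0, waveform.shape[1], chunk_samples):
        chunk = waveform[:, start:start + chunk_samples]
        probs.append(vad.get_speech_prob_chunk(chunk))

# Frame-level speech probabilities for the whole file
speech_probs = torch.cat(probs, dim=1)
```

A proper fix would apply the same chunking inside the library (e.g. where double_check_speech_segments calls get_speech_prob_chunk) so callers do not have to do this manually.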
Expected behaviour
I expected to be able to run VAD on GPU, but instead hit a runtime error:
"RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."
To Reproduce
```python
from speechbrain.inference import VAD

# Load VAD model on GPU
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    run_opts={"device": "cuda"}
)

# Process a long audio file (>30 minutes)
# This will fail during double_check_speech_segments when
# get_speech_prob_chunk processes segments >50k timesteps
boundaries = vad.get_speech_segments("long_audio.wav")
```
Environment Details
No response
Relevant Log Output
Additional Context
No response