VAD fails with cuDNN error when using GPU #2970

@peterATIn2Dialog

Description

Describe the bug

When processing long audio files with SpeechBrain's VAD on GPU, the model fails with a misleading cuDNN error once the RNN receives sequences longer than roughly 50,000 timesteps:

"RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."

The error message is misleading: the tensors are in fact contiguous. The real problem is the sequence length.

The error occurs in VAD.get_speech_prob_chunk() when the internal GRU layer receives sequences longer than cuDNN can handle. Specifically:

  • Short sequences (<50k timesteps) process fine
  • Long sequences (>50k timesteps) trigger the cuDNN error

Proposed solution

Chunk sequences that exceed the ~50k-timestep threshold before they reach the GRU, run each chunk through the model, and concatenate the outputs.
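A minimal sketch of that chunking workaround. The helper name `chunked_forward`, the `model_fn` callable (standing in for the internal forward that wraps the GRU), and the 50k constant are all illustrative assumptions, not SpeechBrain API:

```python
import torch

# Empirical cuDNN RNN sequence-length limit observed in this issue (assumed)
MAX_TIMESTEPS = 50_000

def chunked_forward(model_fn, features, max_len=MAX_TIMESTEPS):
    """Split a (batch, time, feat) tensor into sub-sequences of at most
    max_len timesteps, apply model_fn to each, and concatenate the
    results back along the time axis."""
    outputs = [
        model_fn(features[:, start:start + max_len])
        for start in range(0, features.shape[1], max_len)
    ]
    return torch.cat(outputs, dim=1)
```

Note that splitting this way resets the GRU hidden state at every chunk boundary, so probabilities near the boundaries can differ slightly from a single long pass; overlapping chunks and trimming the overlap would mitigate that if it matters.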


Expected behaviour

I expected to be able to run VAD on the GPU, but instead hit a runtime error:

"RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input."

To Reproduce

  from speechbrain.inference import VAD

  # Load VAD model on GPU
  vad = VAD.from_hparams(
      source="speechbrain/vad-crdnn-libriparty",
      run_opts={"device": "cuda"}
  )

  # Process a long audio file (>30 minutes)
  # This will fail during double_check_speech_segments when 
  # get_speech_prob_chunk processes segments >50k timesteps
  boundaries = vad.get_speech_segments("long_audio.wav")

Environment Details

No response

Relevant Log Output

Additional Context

No response
