HyperConformer and Conformer Achieve Higher WER on test-other When Trained on 100h LibriSpeech #3041

@zzm196

Describe the bug

I am using this recipe: https://github.com/speechbrain/speechbrain/tree/develop/recipes/LibriSpeech/ASR/transformer. I changed the training set in hparams/hyperconformer_8M to the 100h LibriSpeech subset, and changed attention_type to RelPosMHAXL to obtain conformer_8M. Across multiple experiments, the WERs were all higher than those reported for HyperConformer and Conformer in Table 2 of the paper "HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition".
The results for conformer_8M are as follows, given as WER (batch_size, n_gpus, grad_accumulation_factor, lr_adam):
15.57 (16, 1, 1, 0.001), the project's default configuration
15.53 (16, 2, 2, 0.004)
19.98 (32, 2, 2, 0.001)
19.71 (32, 1, 2, 0.001)
I am a beginner. So far, I only know that I need to increase the global batch size, but I don't know how to adjust the individual parameters effectively. I would appreciate help on how to tune them to reach the results in Table 2 of the paper. All of these experiments were trained on RTX 3090 (24 GB) GPUs.
The command I executed is: CUDA_VISIBLE_DEVICES=6,7 torchrun --nproc_per_node=2 train.py hparams/conformer_8M
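To make the comparison between the four runs easier, here is a minimal sketch that computes the global batch size of each one. It assumes that batch_size is the per-GPU batch size and that the global batch is batch_size × n_gpus × grad_accumulation_factor. That is how data-parallel training usually behaves, but it is an assumption here and has not been checked against the recipe. The Run class and its field names are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One conformer_8M experiment from the list above (hypothetical helper)."""

    batch_size: int  # assumed to be the per-GPU batch size
    n_gpus: int
    grad_accumulation_factor: int
    lr_adam: float
    wer: float  # WER exactly as reported in the list above

    @property
    def global_batch(self) -> int:
        # Assumption: utterances contributing to one optimizer update.
        return self.batch_size * self.n_gpus * self.grad_accumulation_factor


RUNS = [
    Run(16, 1, 1, 0.001, 15.57),
    Run(16, 2, 2, 0.004, 15.53),
    Run(32, 2, 2, 0.001, 19.98),
    Run(32, 1, 2, 0.001, 19.71),
]

for run in RUNS:
    print(f"global batch {run.global_batch:>3}  lr_adam {run.lr_adam}  WER {run.wer}")
```

Under that assumption the four runs use global batch sizes of 16, 64, 128 and 64. The two runs that kept lr_adam at 0.001 while using a global batch of 64 or more are the ones with the worst WER.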

Expected behaviour

The WER of conformer_8M should be around 8.29 (according to Table 2 of HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition).

To Reproduce

No response

Environment Details

No response

Relevant Log Output

batch_size, n_gpus, grad_accumulation_factor, lr_adam: 16, 1, 1, 0.001
conformer_8M:
epoch: 10, lr: 2.58e-04, steps: 6450, optimizer: Adam - train loss: 2.51e+02 - valid loss: 1.96e+02, valid ACC: 2.02e-01, valid WER: 1.06e+02
epoch: 20, lr: 5.16e-04, steps: 12900, optimizer: Adam - train loss: 1.23e+02 - valid loss: 79.15, valid ACC: 6.35e-01, valid WER: 41.40
epoch: 30, lr: 7.74e-04, steps: 19350, optimizer: Adam - train loss: 66.70 - valid loss: 48.60, valid ACC: 7.97e-01, valid WER: 21.52
epoch: 40, lr: 9.84e-04, steps: 25800, optimizer: Adam - train loss: 53.02 - valid loss: 39.94, valid ACC: 8.34e-01, valid WER: 17.25
epoch: 50, lr: 8.80e-04, steps: 32250, optimizer: Adam - train loss: 41.96 - valid loss: 36.77, valid ACC: 8.55e-01, valid WER: 14.83
epoch: 60, lr: 8.04e-04, steps: 38700, optimizer: Adam - train loss: 36.05 - valid loss: 33.44, valid ACC: 8.65e-01, valid WER: 13.59
epoch: 70, lr: 7.44e-04, steps: 45150, optimizer: Adam - train loss: 32.15 - valid loss: 32.92, valid ACC: 8.68e-01, valid WER: 12.74
epoch: 80, lr: 6.96e-04, steps: 51600, optimizer: Adam - train loss: 29.23 - valid loss: 33.27, valid ACC: 8.71e-01, valid WER: 12.19
epoch: 90, lr: 6.56e-04, steps: 58050, optimizer: Adam - train loss: 27.22 - valid loss: 35.81, valid ACC: 8.71e-01, valid WER: 11.96
epoch: 100, lr: 6.23e-04, steps: 64500, optimizer: Adam - train loss: 25.85 - valid loss: 33.84, valid ACC: 8.74e-01, valid WER: 11.65
epoch: 110, lr: 5.94e-04, steps: 70950, optimizer: Adam - train loss: 24.51 - valid loss: 33.23, valid ACC: 8.75e-01, valid WER: 11.38
Epoch loaded: 110 - test loss: 17.27, test ACC: 8.88e-01, test WER: 6.20 (test-clean)
Epoch loaded: 110 - test loss: 10.42, test ACC: 7.96e-01, test WER: 15.57 (test-other)
hyperconformer_8M:
epoch: 10, lr: 2.58e-04, steps: 6450, optimizer: Adam - train loss: 2.53e+02 - valid loss: 1.99e+02, valid ACC: 1.98e-01, valid WER: 97.59
epoch: 20, lr: 5.16e-04, steps: 12900, optimizer: Adam - train loss: 1.57e+02 - valid loss: 1.04e+02, valid ACC: 5.27e-01, valid WER: 59.12
epoch: 30, lr: 7.74e-04, steps: 19350, optimizer: Adam - train loss: 70.45 - valid loss: 48.92, valid ACC: 7.98e-01, valid WER: 22.46
epoch: 40, lr: 9.84e-04, steps: 25800, optimizer: Adam - train loss: 54.82 - valid loss: 38.98, valid ACC: 8.36e-01, valid WER: 17.52
epoch: 50, lr: 8.80e-04, steps: 32250, optimizer: Adam - train loss: 44.58 - valid loss: 35.27, valid ACC: 8.57e-01, valid WER: 14.95
epoch: 60, lr: 8.04e-04, steps: 38700, optimizer: Adam - train loss: 38.89 - valid loss: 33.29, valid ACC: 8.65e-01, valid WER: 13.91
epoch: 70, lr: 7.44e-04, steps: 45150, optimizer: Adam - train loss: 35.58 - valid loss: 32.95, valid ACC: 8.69e-01, valid WER: 13.38
epoch: 80, lr: 6.96e-04, steps: 51600, optimizer: Adam - train loss: 32.93 - valid loss: 31.66, valid ACC: 8.74e-01, valid WER: 12.71
epoch: 90, lr: 6.56e-04, steps: 58050, optimizer: Adam - train loss: 30.80 - valid loss: 31.78, valid ACC: 8.75e-01, valid WER: 12.30
epoch: 100, lr: 6.23e-04, steps: 64500, optimizer: Adam - train loss: 29.47 - valid loss: 31.03, valid ACC: 8.78e-01, valid WER: 11.96
epoch: 110, lr: 5.94e-04, steps: 70950, optimizer: Adam - train loss: 28.24 - valid loss: 30.49, valid ACC: 8.79e-01, valid WER: 11.79
Epoch loaded: 110 - test loss: 8.58, test ACC: 8.90e-01, test WER: 6.29 (test-clean)
Epoch loaded: 110 - test loss: 10.44, test ACC: 7.94e-01, test WER: 16.47 (test-other)
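
The lr column in both logs appears to follow a Noam-style schedule: linear warmup followed by inverse-square-root decay. The sketch below checks this against the logged values. The peak of 0.001 is taken from lr_adam, but warmup_steps = 25000 is inferred from the numbers in the log and not read from the recipe's YAML, so treat it as an assumption. noam_lr and mismatches are hypothetical helper names, not SpeechBrain functions.

```python
def noam_lr(step: int, peak_lr: float = 1e-3, warmup_steps: int = 25_000) -> float:
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).

    peak_lr matches lr_adam of the default run; warmup_steps is inferred
    from the log above (assumption, not taken from the recipe).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return peak_lr * min(step / warmup_steps, (warmup_steps / step) ** 0.5)


# (steps -> lr) pairs copied from the log above, as printed.
LOGGED = {
    6450: "2.58e-04",
    12900: "5.16e-04",
    19350: "7.74e-04",
    25800: "9.84e-04",
    32250: "8.80e-04",
    38700: "8.04e-04",
    45150: "7.44e-04",
    51600: "6.96e-04",
    58050: "6.56e-04",
    64500: "6.23e-04",
    70950: "5.94e-04",
}


def mismatches(logged=LOGGED):
    """Return the steps whose logged lr differs from the sketch's prediction."""
    return [s for s, lr in logged.items() if f"{noam_lr(s):.2e}" != lr]


for step, lr in LOGGED.items():
    print(f"step {step:>6}: logged {lr}, predicted {noam_lr(step):.2e}")
print("mismatching steps:", mismatches())
```

The log shows 645 steps per epoch for the default run (6450 steps after 10 epochs), so a 25000-step warmup would end at roughly epoch 39. If the schedule counts optimizer steps, a run with a larger global batch takes fewer steps per epoch and spends more of its epochs in warmup. That interaction is only a guess and would need to be checked against the recipe.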

Additional Context

No response
