Draft
Collaborator
|
@asumagic, what stage are we at with this? It looks like some tests are failing. |
Collaborator
|
@asumagic, what is the status of this PR? |
Use modified Conformer warmup (from k2) in LibriSpeech RNN-T
277f653 to ca93780
22ac4d6 to aee71b4
Collaborator
Author
|
Putting this on hold. I may or may not pick it up again after other changes that warrant retraining the model. The approach is generic enough that it probably doesn't need to be implemented at the Conformer level (rather at the TransformerASR level). It should also maybe be renamed away from "scheduler", since this is not an LR scheduler? |
What does this PR do?
A very simple experiment: switch from Noam scheduling with warmup to a layer-wise skip mechanism for warmup, inspired by k2 (as explained in https://medium.com/@nadirapovey/next-gen-kaldi-reworked-conformer-model-8a3828f364af).
In theory, this might allow initial convergence to happen much earlier in training.
Will turn this into a proper PR if it works well.
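For readers unfamiliar with the two warmup styles, here is a rough, framework-agnostic sketch of the contrast. It is not the code in this PR and not k2's exact implementation: the names `noam_lr`, `bypass_scale`, `warmup_layer` and the defaults (`warmup_steps`, `min_scale=0.1`) are assumptions chosen for illustration. Noam warmup ramps the learning rate itself, while the layer-wise skip idea keeps the learning rate alone and instead blends each layer's output with its input, so that layers are mostly bypassed early in training.

```python
def noam_lr(step, d_model=512, warmup_steps=25000, factor=1.0):
    """Noam schedule: linear ramp for warmup_steps, then inverse-sqrt decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def bypass_scale(step, warmup_steps=3000, min_scale=0.1):
    """Fraction of a layer's output that is kept at a given training step.

    Hypothetical ramp: starts at min_scale and grows linearly to 1.0.
    """
    progress = min(step / warmup_steps, 1.0)
    return min_scale + (1.0 - min_scale) * progress


def warmup_layer(layer, x, scale):
    """Blend a layer's output with its input: scale * layer(x) + (1 - scale) * x."""
    y = layer(x)
    return [scale * yi + (1.0 - scale) * xi for yi, xi in zip(y, x)]


def double(x):
    """Toy stand-in for an encoder layer (plain lists instead of tensors)."""
    return [2.0 * v for v in x]


x = [1.0, -2.0]
early = warmup_layer(double, x, bypass_scale(0))     # mostly the input
late = warmup_layer(double, x, bypass_scale(3000))   # the full layer output
```

With these toy values, `early` stays close to the input `x` and `late` equals `double(x)`. In a real model the same blend would be applied to tensors inside each encoder layer's forward pass during training only.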