Use smaller initial LR #158
Conversation
Led to a new master net official-stockfish/Stockfish#3848. This is in combination with a restart from a good net (the previous master). Visually, the net starts a bit better after the restart, which seems logical since the initial perturbations are smaller.
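For context, the change boils down to starting the trainer's optimizer from a smaller initial learning rate. Below is a minimal sketch of the idea, assuming a PyTorch Lightning module roughly like the one in this repo; the placeholder layer, the optimizer choice, and the concrete values are illustrative only, not the exact numbers or code from this PR.

```python
import torch
import pytorch_lightning as pl

class NNUEModel(pl.LightningModule):
    """Stand-in for the real model; only the optimizer setup is sketched."""

    def __init__(self):
        super().__init__()
        # Placeholder layer; the real model has the feature transformer and dense layers.
        self.layer = torch.nn.Linear(512, 1)

    def configure_optimizers(self):
        # Illustrative numbers only: the point of the PR is to start from a
        # smaller initial LR, so the first updates perturb an already good
        # net less when restarting from a previous master.
        initial_lr = 4.375e-4  # hypothetical "smaller" value
        optimizer = torch.optim.Adadelta(self.parameters(), lr=initial_lr)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.992)
        return [optimizer], [scheduler]
```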
|
I think we should instead add an "lr" option to the train script. The right LR may be different for the retraining process compared to training from scratch. This change is a potential regression for one [fundamental] use-case. |
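A minimal sketch of what such an explicit option could look like; the `--lr` flag name and the default value are assumptions for illustration, not the train script's actual interface:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag: default None means "keep the trainer's built-in LR",
# so training from scratch is unaffected unless the user overrides it.
parser.add_argument('--lr', type=float, default=None,
                    help='override the initial learning rate, e.g. for retraining from a previous net')
args = parser.parse_args()

DEFAULT_LR = 8.75e-4  # illustrative built-in default
initial_lr = args.lr if args.lr is not None else DEFAULT_LR
print(f'using initial LR {initial_lr}')
```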
|
Yes, it would be good to have a few more things available as options; that would make testing easier as well. Nevertheless, good defaults are needed, and I propose that the settings which led to a master net are the safe bet. I'll see if I can update the PR later today/this week. |
|
Perhaps we could apply this LR change only when training from a model? I.e. when either |
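A sketch of the conditional default being suggested here; the parameter names are hypothetical and only illustrate the "smaller LR only when restarting from a net" logic, not existing train script options:

```python
# Hypothetical logic: pick the smaller LR only when the run restarts from an
# existing net, and keep the from-scratch default otherwise. Values are illustrative.
DEFAULT_LR_FROM_SCRATCH = 8.75e-4
DEFAULT_LR_RETRAIN = 4.375e-4

def pick_initial_lr(lr_override, resume_from_model):
    if lr_override is not None:        # an explicit --lr always wins
        return lr_override
    if resume_from_model is not None:  # e.g. the path of the net being restarted from
        return DEFAULT_LR_RETRAIN
    return DEFAULT_LR_FROM_SCRATCH
```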
|
I think too much 'smart logic' leads to unexpected behaviour; it's better to keep it explicit. |
|
I will insist on at least a comment specifying that |
|
Yes, I think we should document the best options for training from scratch vs. restarts... as said, we should establish the full procedure for training from scratch (which, as I understand it, includes retraining cycles). |
Trained with essentially the same data as provided and used by Farseer (mbabigian) for the previous master net.

T60T70wIsRightFarseerT60T74T75T76.binpack (99GB): ['T60T70wIsRightFarseer.binpack', 'farseerT74.binpack', 'farseerT75.binpack', 'farseerT76.binpack']

using the trainer branch tweakLR1PR (official-stockfish/nnue-pytorch#158) and `--gpus 1 --threads 4 --num-workers 4 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --random-fen-skipping 12 --features=HalfKAv2_hm^ --lambda=1.00` options

passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 108280 W: 28042 L: 27636 D: 52602
Ptnml(0-2): 328, 12382, 28401, 12614, 415
https://tests.stockfishchess.org/tests/view/61bcd8c257a0d0f327c34fbd

passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 259296 W: 66974 L: 66175 D: 126147
Ptnml(0-2): 146, 27096, 74452, 27721, 233
https://tests.stockfishchess.org/tests/view/61bda70957a0d0f327c37817

Bench: 4633875
|
Led to another master net official-stockfish/Stockfish#3870. However, some experiments with training from scratch indeed show that this is not optimal for that purpose, only for retraining. As such, it shouldn't be merged in this form. Keeping it open for the time being. |
Trained with essentially the same data as provided and used by Farseer (mbabigian) for the previous master net.

T60T70wIsRightFarseerT60T74T75T76.binpack (99GB): ['T60T70wIsRightFarseer.binpack', 'farseerT74.binpack', 'farseerT75.binpack', 'farseerT76.binpack']

using the trainer branch tweakLR1PR (official-stockfish/nnue-pytorch#158) and `--gpus 1 --threads 4 --num-workers 4 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --random-fen-skipping 12 --features=HalfKAv2_hm^ --lambda=1.00` options

passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 108280 W: 28042 L: 27636 D: 52602
Ptnml(0-2): 328, 12382, 28401, 12614, 415
https://tests.stockfishchess.org/tests/view/61bcd8c257a0d0f327c34fbd

passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 259296 W: 66974 L: 66175 D: 126147
Ptnml(0-2): 146, 27096, 74452, 27721, 233
https://tests.stockfishchess.org/tests/view/61bda70957a0d0f327c37817

closes #3870

Bench: 4633875
|
Superseded by the new arch trainer PR #164. |