Use smaller initial LR #158
Conversation
Led to a new master net official-stockfish/Stockfish#3848. This is in combination with a restart from a good net (the previous master). Visually, the net starts a bit better after the restart, which seems logical since the initial perturbations are smaller.
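For context, the change boils down to starting the trainer's optimizer from a smaller initial learning rate. Below is a minimal sketch of the idea, assuming a PyTorch Lightning module roughly like the one in this repo; the placeholder layer, the optimizer choice, and the concrete values are illustrative only, not the exact numbers or code from this PR.

```python
import torch
import pytorch_lightning as pl

class NNUEModel(pl.LightningModule):
    """Stand-in for the real model; only the optimizer setup is sketched."""

    def __init__(self):
        super().__init__()
        # Placeholder layer; the real model has the feature transformer and dense layers.
        self.layer = torch.nn.Linear(512, 1)

    def configure_optimizers(self):
        # Illustrative numbers only: the point of the PR is to start from a
        # smaller initial LR, so the first updates perturb an already good
        # net less when restarting from a previous master.
        initial_lr = 4.375e-4  # hypothetical "smaller" value
        optimizer = torch.optim.Adadelta(self.parameters(), lr=initial_lr)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.992)
        return [optimizer], [scheduler]
```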
|
I think we should instead add an "lr" option to the train script. The right LR may be different for the retraining process compared to training from scratch. This change is a potential regression for one [fundamental] use-case. |
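A minimal sketch of what such an explicit option could look like; the `--lr` flag name and the default value are assumptions for illustration, not the train script's actual interface:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag: default None means "keep the trainer's built-in LR",
# so training from scratch is unaffected unless the user overrides it.
parser.add_argument('--lr', type=float, default=None,
                    help='override the initial learning rate, e.g. for retraining from a previous net')
args = parser.parse_args()

DEFAULT_LR = 8.75e-4  # illustrative built-in default
initial_lr = args.lr if args.lr is not None else DEFAULT_LR
print(f'using initial LR {initial_lr}')
```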
|
Yes, it would be good to have a few more things available as options; that would make testing easier as well. Nevertheless, good defaults are needed, and I propose that the settings which led to a master net are the safe bet. I'll see if I can update the PR later today/this week. |
|
Perhaps we could apply this LR change only when training from a model? I.e. when either |
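A sketch of the conditional default being suggested here; the parameter names are hypothetical and only illustrate the "smaller LR only when restarting from a net" logic, not existing train script options:

```python
# Hypothetical logic: pick the smaller LR only when the run restarts from an
# existing net, and keep the from-scratch default otherwise. Values are illustrative.
DEFAULT_LR_FROM_SCRATCH = 8.75e-4
DEFAULT_LR_RETRAIN = 4.375e-4

def pick_initial_lr(lr_override, resume_from_model):
    if lr_override is not None:        # an explicit --lr always wins
        return lr_override
    if resume_from_model is not None:  # e.g. the path of the net being restarted from
        return DEFAULT_LR_RETRAIN
    return DEFAULT_LR_FROM_SCRATCH
```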
|
I think too much 'smart logic' leads to unexpected behaviour; it's better to keep it explicit. |
|
I will insist on at least a comment specifying that |
|
Yes, I think we should document the best options for training from scratch vs. restarts... as said, we should establish the full procedure for training from scratch (which, as I understand it, includes retraining cycles). |
Trained with essentially the same data as provided and used by Farseer (mbabigian) for the previous master net.

T60T70wIsRightFarseerT60T74T75T76.binpack (99GB): ['T60T70wIsRightFarseer.binpack', 'farseerT74.binpack', 'farseerT75.binpack', 'farseerT76.binpack']

using the trainer branch tweakLR1PR (official-stockfish/nnue-pytorch#158) and `--gpus 1 --threads 4 --num-workers 4 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --random-fen-skipping 12 --features=HalfKAv2_hm^ --lambda=1.00` options

passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 108280 W: 28042 L: 27636 D: 52602
Ptnml(0-2): 328, 12382, 28401, 12614, 415
https://tests.stockfishchess.org/tests/view/61bcd8c257a0d0f327c34fbd

passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 259296 W: 66974 L: 66175 D: 126147
Ptnml(0-2): 146, 27096, 74452, 27721, 233
https://tests.stockfishchess.org/tests/view/61bda70957a0d0f327c37817

Bench: 4633875
|
Led to another master net official-stockfish/Stockfish#3870. However, some experiments with training from scratch indeed show that this is not optimal for that purpose, only for retraining. As such, it shouldn't be merged in this form. Keeping it open for the time being. |
Trained with essentially the same data as provided and used by Farseer (mbabigian) for the previous master net.

T60T70wIsRightFarseerT60T74T75T76.binpack (99GB): ['T60T70wIsRightFarseer.binpack', 'farseerT74.binpack', 'farseerT75.binpack', 'farseerT76.binpack']

using the trainer branch tweakLR1PR (official-stockfish/nnue-pytorch#158) and `--gpus 1 --threads 4 --num-workers 4 --batch-size 16384 --progress_bar_refresh_rate 300 --smart-fen-skipping --random-fen-skipping 12 --features=HalfKAv2_hm^ --lambda=1.00` options

passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.50>
Total: 108280 W: 28042 L: 27636 D: 52602
Ptnml(0-2): 328, 12382, 28401, 12614, 415
https://tests.stockfishchess.org/tests/view/61bcd8c257a0d0f327c34fbd

passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,3.00>
Total: 259296 W: 66974 L: 66175 D: 126147
Ptnml(0-2): 146, 27096, 74452, 27721, 233
https://tests.stockfishchess.org/tests/view/61bda70957a0d0f327c37817

closes #3870

Bench: 4633875
|
Superseded by the new arch trainer PR #164. |