HalfKAv2_hm-1024x2-8-32-1. nn-exp135-run7-epoch519.nnue #3646
This PR makes some changes to the architecture of the network. The changes were squashed to a single commit.
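For reference, the layer sizes implied by the new network name break down as follows. This is only a sketch; the identifiers below are illustrative, not the exact ones used in the code:

```cpp
// HalfKAv2_hm-1024x2-8-32-1 decoded (illustrative names, not the real identifiers):
constexpr int FeatureTransformerHalfDims = 1024; // per perspective; x2 when both perspectives are concatenated
constexpr int HiddenLayer1Outputs        = 8;    // the new, smaller layer that motivated the AffineTransform changes
constexpr int HiddenLayer2Outputs        = 32;
constexpr int OutputDims                 = 1;    // final evaluation
```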
The summary of the changes:
The AffineTransform code did not work out of the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Additionally, zero padding is added to the output for some architectures that cannot process inputs in chunks of <= 8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However, GCC is particularly bad at optimizing here (which is perhaps why the current way the affine transform is done even passed SPRT; see here for details), and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.
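To make the padding idea concrete, here is a minimal scalar sketch. This is not the actual Stockfish code; the padding rule and the names are illustrative assumptions:

```cpp
// Sketch only, not the actual Stockfish code: an affine layer whose output is
// zero-padded up to a SIMD-friendly width, so the next layer can always read
// its inputs in fixed-size chunks even when OutputDimensions is as small as 8.
#include <algorithm>
#include <cstdint>

template<int InputDimensions, int OutputDimensions>
struct AffineTransformSketch {
    // Hypothetical padding rule: round the output count up to a multiple of 32
    // so SSE2/NEON-style kernels never encounter a partial chunk.
    static constexpr int PaddedOutputDimensions =
        (OutputDimensions + 31) / 32 * 32;

    std::int32_t biases[OutputDimensions];
    std::int8_t  weights[OutputDimensions][InputDimensions];

    void propagate(const std::uint8_t* input, std::int32_t* output) const {
        for (int i = 0; i < OutputDimensions; ++i) {
            std::int32_t sum = biases[i];
            for (int j = 0; j < InputDimensions; ++j)
                sum += weights[i][j] * input[j];
            output[i] = sum;
        }
        // Zero the padding so downstream SIMD code can consume whole chunks.
        std::fill(output + OutputDimensions, output + PaddedOutputDimensions, 0);
    }
};
```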
The network was trained with a slightly modified version of the pytorch trainer; the changes will get a separate PR there soon.

The training utilized two datasets. The training process was as follows: resume with --resume-from-model from the .pt file, train on dataset B for <600 epochs, and take the best net. Lambda=0.8, applied before the loss function (a sketch of this blending follows the training commands below).

The first training command:

The second training command:
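As a rough illustration of what "Lambda=0.8, applied before the loss function" means here, this is a minimal sketch under assumed semantics; the function names and the score-to-WDL scaling constant are hypothetical, not taken from the trainer:

```cpp
#include <cmath>

// Map a centipawn-style score to an expected game outcome in [0, 1].
// The sigmoid scaling constant 410 is a hypothetical choice, not from the PR.
double score_to_wdl(double score) {
    return 1.0 / (1.0 + std::exp(-score / 410.0));
}

// Training target fed to the loss: lambda = 1.0 trains purely on evaluations,
// lambda = 0.0 purely on game results; this PR's run used lambda = 0.8.
double training_target(double eval_score, double game_result, double lambda = 0.8) {
    return lambda * score_to_wdl(eval_score) + (1.0 - lambda) * game_result;
}
```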
STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4
LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea