Update default net to nn-8a08400ed089.nnue #3474
Conversation
…tiple of 4 but not a multiple of 32
…a567be.nnue. Bench: 3517854
This corrects the behavior for some nets, but it's unclear whether it fixes it entirely for all possible weights.
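For context, here is a hypothetical scalar sketch of the kind of inner loop involved; the actual Stockfish code is SIMD, and this illustration (names and signature are made up) only shows why an input dimension that is a multiple of 4, but not of 32, can be consumed directly:

```cpp
#include <cstdint>

// Hypothetical scalar sketch (the real code is SIMD) of an affine transform
// whose inner loop consumes inputs in groups of 4; this grouping is what
// makes input dimensions that are a multiple of 4, but not of 32, usable
// directly without padding.
void affine_sketch(const std::int8_t* input, const std::int8_t* weights,
                   const std::int32_t* biases, std::int32_t* output,
                   int in_dims,   // assumed to be a multiple of 4
                   int out_dims)
{
    for (int i = 0; i < out_dims; ++i) {
        const std::int8_t* row = weights + i * in_dims;
        std::int32_t sum = biases[i];
        for (int j = 0; j < in_dims; j += 4)  // 4 inputs at a time
            sum += row[j + 0] * input[j + 0] + row[j + 1] * input[j + 1]
                 + row[j + 2] * input[j + 2] + row[j + 3] * input[j + 3];
        output[i] = sum;
    }
}
```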
…f70.nnue. Bench: 4137649
appveyor fails because the appveyor script picks up the wrong reference number from the commit message. Just needs to put the Bench on a different line.
…089.nnue. Bench: 3806488
```cpp
featureTransformer->transform(pos, transformedFeatures);
const auto output = network->propagate(transformedFeatures, buffer);
const std::size_t bucket = (popcount(pos.pieces()) - 1) / 4;
```
Nit: `pos.count<ALL_PIECES>()` is probably faster on most machines than `popcount(pos.pieces())`.
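For reference, a standalone illustration (my sketch, not PR code) of the bucket computation itself. In Stockfish, `pos.count<ALL_PIECES>()` reads a counter the `Position` class maintains incrementally, while `popcount(pos.pieces())` recomputes a population count over the occupancy bitboard, which is why the cached counter is likely the cheaper choice:

```cpp
#include <cstddef>

// Standalone illustration (not the PR code) of the bucket selection: the
// 2..32 possible piece counts are binned into the 8 buckets used to pick
// both the PSQT value and the subnetwork.
constexpr std::size_t select_bucket(int piece_count) {
    return static_cast<std::size_t>(piece_count - 1) / 4;
}

static_assert(select_bucket(2)  == 0, "two bare kings -> first bucket");
static_assert(select_bucket(32) == 7, "full board -> last bucket");
```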
I'm fine with merging this pull request to change the net architecture. Congrats :-) If I later have a simpler/alternative scaling scheme for STC or LTC, we can just resume the normal testing procedure on fishtest. Concerning the merge, I think it would perhaps be best to keep two commits for the pull request.
Great job!
The easiest way to create such two commits is probably to use the diff .. trick to get the global diff from master to the PR version. Then in that diff, for the first commit, use all changes except those in evaluate.cpp and search.cpp.
OK |
Introduces a new NNUE network architecture and associated network parameters, as obtained by a new pytorch trainer. The network is already very strong at short TC, without regression at longer TC, and has potential for further improvements.

https://tests.stockfishchess.org/tests/view/60a159c65085663412d0921d
TC: 10s+0.1s, 1 thread
ELO: 21.74 +-3.4 (95%) LOS: 100.0%
Total: 10000 W: 1559 L: 934 D: 7507
Ptnml(0-2): 38, 701, 2972, 1176, 113

https://tests.stockfishchess.org/tests/view/60a187005085663412d0925b
TC: 60s+0.6s, 1 thread
ELO: 5.85 +-1.7 (95%) LOS: 100.0%
Total: 20000 W: 1381 L: 1044 D: 17575
Ptnml(0-2): 27, 885, 7864, 1172, 52

https://tests.stockfishchess.org/tests/view/60a2beede229097940a03806
TC: 20s+0.2s, 8 threads
LLR: 2.93 (-2.94,2.94) <0.50,3.50>
Total: 34272 W: 1610 L: 1452 D: 31210
Ptnml(0-2): 30, 1285, 14350, 1439, 32

https://tests.stockfishchess.org/tests/view/60a2d687e229097940a03c72
TC: 60s+0.6s, 8 threads
LLR: 2.94 (-2.94,2.94) <-2.50,0.50>
Total: 45544 W: 1262 L: 1214 D: 43068
Ptnml(0-2): 12, 1129, 20442, 1177, 12

The network has been trained (by vondele) using the https://github.com/glinscott/nnue-pytorch/ trainer (started by glinscott), specifically the branch https://github.com/Sopel97/nnue-pytorch/tree/experiment_56. The data used are 64 billion positions (193GB total) generated and scored with the current master net:

d8: https://drive.google.com/file/d/1hOOYSDKgOOp38ZmD0N4DV82TOLHzjUiF/view?usp=sharing
d9: https://drive.google.com/file/d/1VlhnHL8f-20AXhGkILujnNXHwy9T-MQw/view?usp=sharing
d10: https://drive.google.com/file/d/1ZC5upzBYMmMj1gMYCkt6rCxQG0GnO3Kk/view?usp=sharing
fishtest_d9: https://drive.google.com/file/d/1GQHt0oNgKaHazwJFTRbXhlCN3FbUedFq/view?usp=sharing

This network also contains a few architectural changes with respect to the current master:

- Size changed from 256x2-32-32-1 to 512x2-16-32-1
  - ~15-20% slower, ~2x larger
  - adds a special path for 16-valued ClippedReLU
  - fixes the affine transform code for 16 inputs/outputs by using InputDimensions instead of PaddedInputDimensions; this is safe now because the inputs are processed in groups of 4 in the current affine transform code
- The feature set changed from HalfKP to HalfKAv2
  - includes information about the kings, like HalfKA
  - packs king features better, resulting in an 8% size reduction compared to HalfKA
  - the board is flipped for black's perspective, instead of rotated as in the current master
- PSQT values for each feature
  - the feature transformer now outputs a part that is forwarded directly to the output, which allows learning piece values more directly than the previous network architecture; the effect is visible in high-imbalance positions, where the current master network outputs evaluations skewed towards zero
  - 8 PSQT values per feature, chosen based on (popcount(pos.pieces()) - 1) / 4, initialized to classical material values at the start of the training
- 8 subnetworks (512x2->16->32->1), chosen based on (popcount(pos.pieces()) - 1) / 4
  - only one subnetwork is evaluated for any position, with no or marginal speed loss

A diagram of the network is available: https://user-images.githubusercontent.com/8037982/118656988-553a1700-b7eb-11eb-82ef-56a11cbebbf2.png

closes official-stockfish#3474

Bench: 3806488
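As a side note on the flip-vs-rotate point in the list above, this standalone snippet (my illustration, not code from the PR; it assumes 0..63 square indices with A1 = 0, B1 = 1, ..., H8 = 63) shows the difference between the two perspective transforms:

```cpp
// Illustration (not from the PR) of the two perspective transforms on a
// 0..63 square index: HalfKP viewed the board from black's side by rotating
// it 180 degrees, while HalfKAv2 mirrors it vertically, preserving files.
constexpr int rotate180(int sq)     { return sq ^ 63; } // ranks and files reversed
constexpr int flip_vertical(int sq) { return sq ^ 56; } // ranks reversed, files kept

// E2 (index 12) seen from black's side:
static_assert(rotate180(12) == 51, "rotation maps E2 to D7");
static_assert(flip_vertical(12) == 52, "vertical flip maps E2 to E7");
```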
This network was trained by @vondele using this trainer (the trainer master branch will be updated in the near future), on a combination of data: d8, d9, d10, fishtest_d9.
This network also contains a few architectural changes with respect to the current master:
- Size changed from 256x2-32-32-1 to 512x2-16-32-1 (~15-20% slower, ~2x larger); adds a special path for 16-valued ClippedReLU and fixes the affine transform code for 16 inputs/outputs by using `InputDimensions` instead of `PaddedInputDimensions`, which is safe now because the inputs are processed in groups of 4
- The feature set changed from HalfKP to HalfKAv2: it includes information about the kings like HalfKA, packs king features better (8% size reduction compared to HalfKA), and flips the board for black's perspective instead of rotating it as in the current master
- PSQT values for each feature: the feature transformer now outputs a part that is forwarded directly to the output, allowing piece values to be learned more directly; 8 PSQT values per feature are chosen based on `(popcount(pos.pieces()) - 1) / 4`
- 8 subnetworks (512x2->16->32->1), chosen based on `(popcount(pos.pieces()) - 1) / 4`, with only one subnetwork evaluated for any position (see the sketch after this list)

Additionally, we observed a lot of large weights in most nets from the pytorch trainer, which caused the slow affine transform path to reduce the performance of the net significantly; it was therefore removed for the time of testing. A revert can be attempted for up to a 3% speedup, depending on how many large weights there are.
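To make the bucketing above concrete, here is a minimal standalone sketch of how the PSQT part and the selected subnetwork's output combine into the final evaluation. It is not the actual Stockfish code (that lives in evaluate_nnue); all names here are hypothetical:

```cpp
#include <array>
#include <cstdint>
#include <functional>

// Hypothetical sketch of the bucketed evaluation described above; names are
// made up, the real implementation is Stockfish's evaluate_nnue code.
constexpr int kBuckets = 8;

std::int32_t evaluate_sketch(
        int piece_count,                                         // 2..32
        const std::array<std::int32_t, kBuckets>& psqt,          // PSQT part per bucket
        const std::function<std::int32_t(int)>& run_subnetwork)  // propagates one subnet
{
    // Piece counts 1..32 map onto buckets 0..7.
    const int bucket = (piece_count - 1) / 4;
    // Only the selected subnetwork is evaluated, so having 8 of them costs
    // no more than one; the PSQT part is forwarded directly to the output,
    // which lets the net learn material values more directly.
    return psqt[bucket] + run_subnetwork(bucket);
}
```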
Diagram of the new architecture:
