HalfKAv2_hm-1024x2-8-32-1. nn-exp135-run7-epoch519.nnue #3646
This PR makes some changes to the architecture of the network. The changes were squashed to a single commit.
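For reference, the layer sizes implied by the new network name break down as follows. This is only a sketch; the identifiers below are illustrative, not the exact ones used in the code:

```cpp
// HalfKAv2_hm-1024x2-8-32-1 decoded (illustrative names, not the real identifiers):
constexpr int FeatureTransformerHalfDims = 1024; // per perspective; x2 when both perspectives are concatenated
constexpr int HiddenLayer1Outputs        = 8;    // the new, smaller layer that motivated the AffineTransform changes
constexpr int HiddenLayer2Outputs        = 32;
constexpr int OutputDims                 = 1;    // final evaluation
```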
The summary of the changes:
The AffineTransform code did not work out of the box with the smaller number of neurons after the second layer, so some temporary changes have been made to add a special case for InputDimensions == 8. Additionally, zero padding is added to the output for some architectures that cannot process inputs in chunks of <= 8 (SSE2, NEON). VNNI uses an implementation that can keep all outputs in the registers while reducing the number of loads by 3 for each 16 inputs, thanks to the reduced number of output neurons. However, GCC is particularly bad at optimizing here (which is perhaps why the current way the affine transform is done even passed SPRT; see here for details), and more work will be done on this in the following days. I expect the current VNNI implementation to be improved and extended to other architectures.
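To make the padding idea concrete, here is a minimal scalar sketch. This is not the actual Stockfish code; the padding rule and the names are illustrative assumptions:

```cpp
// Sketch only, not the actual Stockfish code: an affine layer whose output is
// zero-padded up to a SIMD-friendly width, so the next layer can always read
// its inputs in fixed-size chunks even when OutputDimensions is as small as 8.
#include <algorithm>
#include <cstdint>

template<int InputDimensions, int OutputDimensions>
struct AffineTransformSketch {
    // Hypothetical padding rule: round the output count up to a multiple of 32
    // so SSE2/NEON-style kernels never encounter a partial chunk.
    static constexpr int PaddedOutputDimensions =
        (OutputDimensions + 31) / 32 * 32;

    std::int32_t biases[OutputDimensions];
    std::int8_t  weights[OutputDimensions][InputDimensions];

    void propagate(const std::uint8_t* input, std::int32_t* output) const {
        for (int i = 0; i < OutputDimensions; ++i) {
            std::int32_t sum = biases[i];
            for (int j = 0; j < InputDimensions; ++j)
                sum += weights[i][j] * input[j];
            output[i] = sum;
        }
        // Zero the padding so downstream SIMD code can consume whole chunks.
        std::fill(output + OutputDimensions, output + PaddedOutputDimensions, 0);
    }
};
```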
The network was trained with a slightly modified version of the pytorch trainer; the changes will get a separate PR there soon.

The training utilized two datasets. The training process was as follows: resume with --resume-from-model from the .pt file, train on dataset B for <600 epochs, and take the best net. Lambda=0.8, applied before the loss function (a sketch of this blending follows the training commands below).

The first training command:

The second training command:
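As a rough illustration of what "Lambda=0.8, applied before the loss function" means here, this is a minimal sketch under assumed semantics; the function names and the score-to-WDL scaling constant are hypothetical, not taken from the trainer:

```cpp
#include <cmath>

// Map a centipawn-style score to an expected game outcome in [0, 1].
// The sigmoid scaling constant 410 is a hypothetical choice, not from the PR.
double score_to_wdl(double score) {
    return 1.0 / (1.0 + std::exp(-score / 410.0));
}

// Training target fed to the loss: lambda = 1.0 trains purely on evaluations,
// lambda = 0.0 purely on game results; this PR's run used lambda = 0.8.
double training_target(double eval_score, double game_result, double lambda = 0.8) {
    return lambda * score_to_wdl(eval_score) + (1.0 - lambda) * game_result;
}
```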
STC: https://tests.stockfishchess.org/tests/view/611120b32a8a49ac5be798c4
LTC: https://tests.stockfishchess.org/tests/view/611152b32a8a49ac5be798ea