Commit 95fe2b9

mstembera authored and vondele committed
Reduce SIMD register count from 32 to 16
in the case of avx512 and vnni512 archs. Up to 17% speedup, depending on the compiler, e.g.

```
AMD pro 7840u (zen4 phoenix apu 4nm)
bash bench_parallel.sh ./stockfish_avx512_gcc13 ./stockfish_avx512_pr_gcc13 20 10
sf_base = 1077737 +/- 8446 (95%)
sf_test = 1264268 +/- 8543 (95%)
diff = 186531 +/- 4280 (95%)
speedup = 17.308% +/- 0.397% (95%)
```

Prior to this patch, it appears gcc spills registers.

closes #4796

No functional change
1 parent fce4cc1 commit 95fe2b9

1 file changed, +1 -1 lines changed


src/nnue/nnue_feature_transformer.h

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ namespace Stockfish::Eval::NNUE {
 #define vec_add_psqt_32(a,b) _mm256_add_epi32(a,b)
 #define vec_sub_psqt_32(a,b) _mm256_sub_epi32(a,b)
 #define vec_zero_psqt() _mm256_setzero_si256()
-#define NumRegistersSIMD 32
+#define NumRegistersSIMD 16
 #define MaxChunkSize 64

 #elif USE_AVX2
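
A rough way to see why the smaller value can help: AVX-512 exposes 32 vector registers, so an unrolled loop that keeps 32 accumulators live leaves none for loaded operands or temporaries, and the compiler may spill to the stack. The sketch below assumes the constant caps the number of 64-byte accumulator registers per tile of 16-bit values; `tile_lanes` and the other names are hypothetical and are not taken from the Stockfish source.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of 16-bit lanes covered by one tile when
// `num_registers` vector registers of `vector_bytes` bytes each are used
// as accumulators.
constexpr std::size_t tile_lanes(std::size_t num_registers,
                                 std::size_t vector_bytes) {
    return num_registers * vector_bytes / sizeof(std::int16_t);
}

// Assumption: one AVX-512 register holds 64 bytes, and 32 such registers exist.
constexpr std::size_t kAvx512VectorBytes = 64;
constexpr std::size_t kAvx512RegisterFile = 32;

// Registers left over for operands and temporaries once the accumulators
// are allocated.
constexpr std::size_t spare_registers(std::size_t num_accumulators) {
    return kAvx512RegisterFile - num_accumulators;
}

static_assert(tile_lanes(32, kAvx512VectorBytes) == 1024, "old setting");
static_assert(tile_lanes(16, kAvx512VectorBytes) == 512, "new setting");
static_assert(spare_registers(32) == 0, "no headroom with 32 accumulators");
static_assert(spare_registers(16) == 16, "16 registers free with 16 accumulators");
```

Under these assumptions, halving the accumulator count halves the tile size but leaves 16 registers free, which is consistent with the commit's observation that gcc appeared to spill before the change.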
