Commit 12006e1

Add sve2 fast path nthSetBitIndex (official-stockfish#363)
Basically official-stockfish#282.

Godbolt comparison: https://godbolt.org/z/nYTEbbGEc

Master profile:

```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 29.84      3.08     3.08 167655038     0.00     0.00  chess::nthSetBitIndex(unsigned long, unsigned long)
 27.13      5.88     2.80  84438239     0.00     0.00  binpack::PackedMoveScoreListReader::nextMoveScore(chess::Position const&)
 17.25      7.66     1.78                              binpack::CompressedTrainingDataEntryParallelReader::CompressedTrainingDataEntryParallelReader(
  6.40      8.32     0.66  85282216     0.00     0.00  make_skip_predicate(DataloaderSkipConfig)::{lambda(binpack::TrainingDataEntry const&)#1}::operator()(binpack::TrainingDataEntry const&) const
  5.52      8.89     0.57 102779442     0.00     0.00  double std::generate_canonical
  2.71      9.17     0.28 159902274     0.00     0.00  std::thread::_State_impl<std::thread::_Invoker<std::tuple<FenBatchStream::FenBatchStream(
  2.42      9.42     0.25  15960812     0.00     0.00  chess::Board::isSquareAttacked(chess::Square, chess::Color) const
  1.65      9.59     0.17                              _init
```

Patch, with the nthSetBitIndex bottleneck gone:

```
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 48.05      4.07     4.07  87234281     0.00     0.00  binpack::PackedMoveScoreListReader::nextMoveScore(chess::Position const&)
 20.66      5.82     1.75                              binpack::CompressedTrainingDataEntryParallelReader::CompressedTrainingDataEntryParallelReader
  8.15      6.51     0.69  88106154     0.00     0.00  make_skip_predicate(DataloaderSkipConfig)::{lambda(binpack::TrainingDataEntry const&)#1}::operator()(binpack::TrainingDataEntry const&) const
  7.67      7.16     0.65 106181833     0.00     0.00  double std::generate_canonical
  2.01      7.70     0.17  16485835     0.00     0.00  chess::Board::isSquareAttacked(chess::Square, chess::Color) const
  1.53      7.83     0.13 106182037     0.00     0.00  rng::get_thread_local_rng()
  1.53      7.96     0.13  10305607     0.00     0.00  chess::Bitboard chess::bb::attacks<(chess::PieceType)4>(chess::Square, chess::Bitboard)
```
1 parent 782ab62 commit 12006e1

1 file changed: +11 -0 lines changed

lib/nnue_training_data_formats.h

Lines changed: 11 additions & 0 deletions
@@ -54,6 +54,8 @@ THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
 #ifdef HAS_BMI2
 #include <immintrin.h> // _pdep_u64
+#elif defined(__ARM_FEATURE_SVE2)
+#include <arm_sve.h>
 #endif
 
 #include "rng.h"
@@ -273,6 +275,15 @@ namespace chess
 {
 #ifdef HAS_BMI2
     return intrin::msb(_pdep_u64(1ULL << n, v));
+#elif defined(__ARM_FEATURE_SVE2)
+    uint64_t src = 1ULL << n;
+
+    svuint64_t vec_src = svdup_n_u64(src);
+    svuint64_t vec_mask = svdup_n_u64(v);
+
+    svuint64_t result = svbdep_u64(vec_src, vec_mask);
+
+    return intrin::lsb(svlastb_u64(svptrue_b64(), result));
 #endif
 
     std::uint64_t shift = 0;
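
Both fast paths in this diff rely on the same identity: depositing the single bit `1ULL << n` into the mask `v` leaves exactly one bit set, at the position of the n-th set bit of `v` (counting from 0). Because the result has only one bit set, `msb` and `lsb` return the same index. The following is a minimal portable sketch of that identity, not the code from this commit. The names `pdep_soft`, `bit_index`, `nth_set_bit_index_pdep` and `nth_set_bit_index_loop` are hypothetical, and `pdep_soft` is a plain software emulation standing in for `_pdep_u64` / `svbdep_u64`, so it runs without BMI2 or SVE2. It assumes `n` is smaller than the number of set bits in `v`.

```cpp
#include <cassert>
#include <cstdint>

// Software emulation of a 64-bit parallel bit deposit (assumption: same
// semantics as _pdep_u64 / svbdep_u64 on one 64-bit lane): the low bits of
// `src` are scattered, in order, onto the set-bit positions of `mask`.
inline std::uint64_t pdep_soft(std::uint64_t src, std::uint64_t mask)
{
    std::uint64_t result = 0;
    for (std::uint64_t bit = 1; mask != 0; bit <<= 1)
    {
        const std::uint64_t lowest = mask & (~mask + 1); // isolate lowest set bit of mask
        if (src & bit)
            result |= lowest;
        mask &= mask - 1; // clear that bit and move on to the next one
    }
    return result;
}

// Index of the highest set bit; for a single-bit value this is also the lowest.
inline int bit_index(std::uint64_t x)
{
    assert(x != 0); // precondition: at least one bit set
    int i = 0;
    while ((x >>= 1) != 0)
        ++i;
    return i;
}

// Hypothetical stand-in for the fast path: deposit bit n into the mask v.
inline int nth_set_bit_index_pdep(std::uint64_t v, std::uint64_t n)
{
    return bit_index(pdep_soft(1ULL << n, v));
}

// Reference version: clear the lowest set bit n times, then read the lowest one.
inline int nth_set_bit_index_loop(std::uint64_t v, std::uint64_t n)
{
    for (std::uint64_t i = 0; i < n; ++i)
        v &= v - 1;
    return bit_index(v & (~v + 1));
}
```

For example, with `v = 0x2C` (bits 2, 3 and 5 set), both functions return 2, 3 and 5 for `n = 0, 1, 2`. On hardware with BMI2 or SVE2, the loop inside `pdep_soft` collapses to a single deposit instruction, which is where the fast paths get their speedup.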
