Skip to content

Conversation

@vondele
Copy link
Member

@vondele vondele commented Feb 28, 2017

simplifies away all associated checks, leading to a ~0.5% speedup.
The code now explicitly checks if moves are OK, rather than using nullptr checks.

Verified for no regression:

LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32218 W: 5762 L: 5660 D: 20796

No functional change.

simplifies away all associated checks, leading to a ~0.5% speedup.
The code now explicitly checks if moves are OK, rather than using nullptr checks.

Verified for no regression:

LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 32218 W: 5762 L: 5660 D: 20796

No functional change.
@vondele
Copy link
Member Author

vondele commented Feb 28, 2017

The (better) alternative to #1019

@snicolet
Copy link
Member

snicolet commented Mar 1, 2017

Looks nice!

Not sure if the cm_ok, fm_okand fm2_ok variables are really necessary.

@snicolet
Copy link
Member

snicolet commented Mar 1, 2017

Here is a version without the cm_ok, fm_ok and fm2_ok variables:
master...snicolet:cmh_pr

I introduced an is_ok()method in the Stack structure to check if the current move is OK.

=====================
Speed test:

Total time (ms) : 149778
Nodes searched : 209935136
Nodes/second : 1401642
testedpatch

Total time (ms) : 150808
Nodes searched : 209935136
Nodes/second : 1392068
master

==> testedpatch is 0.7% faster than master

Speed test done using this script:

#!/bin/sh

# change directory to the path of the script
cd "${0%/*}"

# name of my testing directory (edit accordingly)
testdir=./fishtest-for-local-tests/worker/testing

md5 $testdir/testedpatch
md5 $testdir/master

# two long bench runs, in parallel
$testdir/testedpatch  bench 16 1 22 > /dev/null && echo "testedpatch"  &
$testdir/master       bench 16 1 22 > /dev/null && echo "master"

# end of the speed test
echo
echo "... done"
echo

@vondele
Copy link
Member Author

vondele commented Mar 1, 2017

@snicolet thanks! I had is_ok calls in the first implementation, but introduced the variables after a benchmark, suggesting that was faster... however, maybe that benchmark was not very accurate. Do you want me to merge in your changes ?

@snicolet
Copy link
Member

snicolet commented Mar 1, 2017

@vondele You do as you want, it is your code :-)

@vondele
Copy link
Member Author

vondele commented Mar 1, 2017

@snicolet but your changes ;-)

I'll have a look later, and a more careful benchmark

@vondele
Copy link
Member Author

vondele commented Mar 1, 2017

@snicolet, in my renewed testing, the is_ok() calls slow down by about 0.3-0.4% , which is consistent with what I had earlier.

I'll thus leave the pull request as is, unless there is consensus that having the is_ok() calls should be considered a useful simplification (opinions welcome).

For reference, I use the following script for benchmarking, which tries to use a robust estimate of performance:

#!/bin/bash

# test performance of stockfish.1 and stockfish.2 binaries. 

# number of tests:
ntests=100

printf "Running $ntests tests: "

# remove so we don't mix in data that could be stale
rm -f out.1 out.2

for i in `seq 1 $ntests`
do
  printf "%d " $i
  
  # run alternating but simultaneously on 2 cores
  for j in $((1+i%2)) $((2-i%2))
  do
      ./stockfish.$j bench 2>&1 | grep 'Nodes/second' | awk '{print $NF}' >> out.$j &
      # except on small machines we want one core idle
      wait
  done
  wait

done
printf ".\nDone.\n"

# analysis, run afterwards to not influence the benchmarking
for i in `seq $(((9+ntests)/10)) $(((9+ntests)/10)) $ntests`
do
  # use ranking to robustly estimate the speed, avoids outliers.
  # 1/3 Rank is near the maximum of the distribution, which is non-Gaussian.
  n33=$(((i+2)/3))
  v1=`head -n $i out.1 | sort -rn | head -n $n33 | tail -n1`
  v2=`head -n $i out.2 | sort -rn | head -n $n33 | tail -n1`
  echo "$v1 $v2 $i $ntests"
  echo "$v1 $v2 $i $ntests" | awk '{printf("%6d / %6d : current speedup estimate variant 2 (%d ms) vs. variant 1 (%d ms): %8.3f \n",$3,$4,$2,$1,200*($2-$1)/($1+$2))}'
done

another ~0.5% speedup.

No functional change.
@snicolet
Copy link
Member

snicolet commented Mar 2, 2017

@vondele, @locutus2

Stefan made the acute remark in the comments of the following test that when we use thisThread->counterMoveHistory[NO_PIECE][SQ_NONE] for the sentinel, we are in fact accessing memory in thisThread->counterMoveHistory[W_PAWN][SQ_A1] because SQ_NONE is 65.
snicolet/Stockfish@1810c4d...e96154a

So you should probably use thisThread->counterMoveHistory[NO_PIECE][0] instead for the sentinel (this applies in lines 338, 606, 749 of search.cpp). Speed is exactly the same.

Concerning code style, I would also add some spaces in the for loop, as usual:

  for (int i = -4 ; i < 0 ; ++i) 
     (ss+i)->counterMoves = &this->counterMoveHistory[NO_PIECE][0]; // use as sentinel

And maybe a blank line after line 749 ? :-)

as made by snicolet and locutus2

No functional change.
@vondele
Copy link
Member Author

vondele commented Mar 2, 2017

@snicolet thanks for the useful review. All included in the last commit.

@zamar
Copy link

zamar commented Mar 6, 2017

@vondele: I try to get this reviewed in the next couple of days...

@zamar zamar closed this in d490bb9 Mar 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants