@vondele
Member

@vondele vondele commented Jul 5, 2022

This refines the loss function to the form used for the new master net in official-stockfish/Stockfish#4100.

The new loss function uses the expected game score to learn,
making the learning more sensitive to scores in the transitions between loss and draw, and between draw and win.

The effect is most visible for smaller values of the scaling parameter, but the current ones have been optimized.

It also introduces param_index for simpler exploration of parameters, i.e. simple parameter scans.
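
As a rough illustration of learning on the expected game score, here is a minimal sketch that maps evals to a score in [0, 1] and compares prediction and target in that space. The names `expected_score` and `score_space_loss`, the offset of 270, the scaling of 50, and the plain squared error are all assumptions made for the example; the code merged in this PR may differ in every one of those choices.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_score(eval_value: float, offset: float, scaling: float) -> float:
    """Map an eval to an expected game score in [0, 1] (illustrative form)."""
    win = sigmoid((eval_value - offset) / scaling)
    loss = sigmoid((-eval_value - offset) / scaling)
    return 0.5 * (1.0 + win - loss)


def score_space_loss(predicted, targets, offset=270.0, scaling=50.0):
    """Mean squared error between expected scores (hypothetical loss, placeholder values)."""
    errors = [
        (expected_score(p, offset, scaling) - expected_score(t, offset, scaling)) ** 2
        for p, t in zip(predicted, targets)
    ]
    return sum(errors) / len(errors)


# The same 50-unit eval error costs more near the draw/win transition
# than in the flat region around 0.
near_transition = score_space_loss([320.0], [270.0])
near_zero = score_space_loss([50.0], [0.0])
print(near_transition > near_zero)
```

With a small scaling, an error of a given size in eval units is penalized much more when it straddles a transition than when it sits in the flat drawn region, which is the sensitivity described above.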

@Sopel97
Member

Sopel97 commented Jul 5, 2022

The --param_index is needlessly limiting in my opinion, more so considering it's not being used right now. If we want to keep it, it would be better to expose in/out scaling as CLI parameters directly. I'm fine with the rest.

@vondele
Member Author

vondele commented Jul 5, 2022

Ah, wait, I didn't commit the needed cleanup changes.

@vondele
Member Author

vondele commented Jul 5, 2022

The idea of the param_index is just to simplify exploration. I don't think we need to expose everything as parameters, but it is quite tedious to pass something down from the CLI to the core (ideally even to the C++ data reader).

Maybe there is a more elegant approach to doing so?
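
One way to picture a single index driving a simple parameter scan is the sketch below. It is purely illustrative: the function `params_from_index`, the choice of scanning in/out scaling, and the candidate values are hypothetical and not taken from this PR.

```python
from itertools import product

# Hypothetical candidate values, chosen only for the example.
IN_SCALINGS = [300.0, 340.0, 380.0]
OUT_SCALINGS = [320.0, 380.0, 440.0]


def params_from_index(param_index: int) -> dict:
    """Select one combination of internal parameters from a single integer."""
    grid = list(product(IN_SCALINGS, OUT_SCALINGS))
    if not 0 <= param_index < len(grid):
        raise ValueError(f"param_index must be in [0, {len(grid) - 1}]")
    in_scaling, out_scaling = grid[param_index]
    return {"in_scaling": in_scaling, "out_scaling": out_scaling}


# Each run of a scan would pass a different index.
print(params_from_index(4))
```

Only one integer has to travel from the command line to the core; which parameters it controls stays an internal detail that can change between experiments.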

@vondele
Member Author

vondele commented Jul 5, 2022

BTW, the previous state of the commit showed how I have been using it.

@Sopel97
Member

Sopel97 commented Jul 5, 2022

That makes sense now. Indeed, I've usually been adding a special parameter for stuff like that (usually varied per run). Looks good to me now.

@Sopel97 Sopel97 merged commit e3f70f2 into official-stockfish:master Jul 5, 2022
@snicolet
Member

snicolet commented Jul 8, 2022

Do we have any picture(s) to help visualize the idea behind the double sigmoid for the loss function, and the effect of param_index?

@vondele
Member Author

vondele commented Jul 8, 2022

param_index is just 'a hack' to be able to twiddle with parameters that we don't want to expose, so it can be ignored. The following illustrates the idea behind the loss function:

gnuplot> sigmoid(x)=1/(1+exp(-x))
gnuplot> win_rate(x, b)=sigmoid((x-270) / b)
gnuplot> score(x, b)=0.5 * ( 1.0 + win_rate(x, b) - win_rate(-x, b))
gnuplot> set xlabel "eval"
gnuplot> set ylabel "score"
gnuplot> set xrange[-800:800]
gnuplot> plot score(x, 50), score(x, 300)

[image: plot of score(x, 50) and score(x, 300) against eval, for eval from -800 to 800]

@vondele
Member Author

vondele commented Jul 8, 2022

To make this more intuitive: we know games are mostly won near 135 cp (270 in the above internal units), and mostly lost near -135 cp. They are also most likely drawn in an interval around 0 cp. This is reflected in the step-like behavior for smaller b. The score formula is just a result of taking the win_rate model and computing the probabilities of loss, draw, and win. The 'b' parameter describes how smooth that transition is. For the high-quality games in fishtest it is about 30-50; for the quick training data it is maybe closer to 300?

see also the plots in official-stockfish/Stockfish#3981
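
The score formula described above can be sketched in Python as a direct translation of the gnuplot definitions. The helper `wdl` and the default `offset=270.0` argument are additions for illustration; they make explicit that the score is the win probability plus half the draw probability.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def win_rate(x: float, b: float, offset: float = 270.0) -> float:
    """Win probability at eval x (internal units), as in the gnuplot session."""
    return sigmoid((x - offset) / b)


def wdl(x: float, b: float):
    """Win, draw, and loss probabilities; a loss is a win seen from the other side."""
    win = win_rate(x, b)
    loss = win_rate(-x, b)
    draw = 1.0 - win - loss
    return win, draw, loss


def score(x: float, b: float) -> float:
    """Expected score: win + draw / 2, equal to 0.5 * (1 + win_rate(x) - win_rate(-x))."""
    win, draw, _ = wdl(x, b)
    return win + 0.5 * draw


for x in (-400, -270, 0, 270, 400):
    print(x, round(score(x, 50), 3), round(score(x, 300), 3))
```

For b = 50 the printed scores stay close to 0.5 around eval 0 and change quickly near ±270, which is the step-like behavior mentioned above; for b = 300 the curve is much smoother.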
