[cluster] keep track of node counts cluster-wide. #1896

vondele · 2018-12-24T12:28:47Z

This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before:

Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365
Elo difference: 8.52 +/- 8.43

This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before: Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365 Elo difference: 8.52 +/- 8.43

noobpwnftw · 2018-12-24T16:23:45Z

Can you also fix a typo?

Stockfish/src/cluster.cpp

Line 127 in c0a964e

// Some MPI implementations use busy-wait pooling, while we need yielding

polling

Pondering should be working before(more or less, while some nodes may quit early due to their local stopOnPonderhit), because I only let root node to signal stop globally.

But now the stop signal is a sum of all stop flags, so any node can signal it globally, which I think is also fine. To restore pondering behavior, I think it is better to let the root node decide when to stop, adding is_root() check here:

Stockfish/src/uci.cpp

Line 220 in c0a964e

|| (token == "ponderhit" && Threads.stopOnPonderhit))

seems trivial, you can even cover the whole block here, because the flag will propagate anyways.

vondele · 2018-12-24T16:33:35Z

comment typo fixed. I'll look at pondering a bit more carefully in a separate PR (I agree that this should be handled by the root).

noobpwnftw · 2018-12-24T17:08:51Z

Any node can signal stop to the cluster, which is good for many cases(more accurate limit on movetime, nodes and when mate is found), while behavior is changed when some node has reached maximum depth, or decide to stop a pondering search.
Now it just stops as soon as one node decides to stop the search upon receiving ponderhit, masking out all UCI command handling regarding to stopping the search from non-root nodes should do the trick, while I'm not sure about the other case, when you set limit on depth, should stop decided by mainThread on root or by any mainThread among nodes(since now you are skipping depth on non-root mainThread too)?

vondele · 2018-12-25T08:17:31Z

updated with TB hits counting

vondele · 2018-12-25T16:00:45Z

@noobpwnftw pondering indeed works on the cluster branch as well. I first want to simplify a bit how master deals with this (see #1899) before revisiting it here.

Are you aware of any features (apart from performance) missing on the cluster branch now ?

noobpwnftw · 2018-12-25T17:20:13Z

@vondele I think this is now complete, all limits should work, skill level, multi PV are local features.

vondele · 2018-12-25T17:58:42Z

that's what I'm thinking too. So, once this is merged, I'll merge once more master in this branch (mostly to get the recent bug fixes which appeared in testing).

After that, it is time to see if we can squeeze out more performance, especially in the threaded case (AFAICT, M mpi x N threads is the best way to run it for M nodes with N cores each).

snicolet · 2018-12-27T11:38:07Z

Merged via 9faedfa, thanks! :-)

Fix comment

1e4a41d

[cluster] keep track of TB hits cluster-wide.

9faedfa

snicolet merged commit 9faedfa into official-stockfish:cluster Dec 27, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[cluster] keep track of node counts cluster-wide. #1896

[cluster] keep track of node counts cluster-wide. #1896

Uh oh!

vondele commented Dec 24, 2018

Uh oh!

noobpwnftw commented Dec 24, 2018

Uh oh!

vondele commented Dec 24, 2018

Uh oh!

noobpwnftw commented Dec 24, 2018

Uh oh!

vondele commented Dec 25, 2018

Uh oh!

vondele commented Dec 25, 2018

Uh oh!

noobpwnftw commented Dec 25, 2018

Uh oh!

vondele commented Dec 25, 2018

Uh oh!

snicolet commented Dec 27, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[cluster] keep track of node counts cluster-wide. #1896

[cluster] keep track of node counts cluster-wide. #1896

Uh oh!

Conversation

vondele commented Dec 24, 2018

Uh oh!

noobpwnftw commented Dec 24, 2018

Uh oh!

vondele commented Dec 24, 2018

Uh oh!

noobpwnftw commented Dec 24, 2018

Uh oh!

vondele commented Dec 25, 2018

Uh oh!

vondele commented Dec 25, 2018

Uh oh!

noobpwnftw commented Dec 25, 2018

Uh oh!

vondele commented Dec 25, 2018

Uh oh!

snicolet commented Dec 27, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants