Skip to content

Conversation

@vondele
Copy link
Member

@vondele vondele commented Dec 24, 2018

This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before:

Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365
Elo difference: 8.52 +/- 8.43

This generalizes exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, but should be easily generalizable (TB hits, and ponder still missing). It avoids having long-lived outstanding non-blocking collectives (removes an early posted Ibarrier). A bit too short a test, but not worse than before:

Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505  [0.512] 2365
Elo difference: 8.52 +/- 8.43
@noobpwnftw
Copy link
Contributor

Can you also fix a typo?

// Some MPI implementations use busy-wait pooling, while we need yielding

polling

Pondering should be working before(more or less, while some nodes may quit early due to their local stopOnPonderhit), because I only let root node to signal stop globally.

But now the stop signal is a sum of all stop flags, so any node can signal it globally, which I think is also fine. To restore pondering behavior, I think it is better to let the root node decide when to stop, adding is_root() check here:

|| (token == "ponderhit" && Threads.stopOnPonderhit))
seems trivial, you can even cover the whole block here, because the flag will propagate anyways.

@vondele
Copy link
Member Author

vondele commented Dec 24, 2018

comment typo fixed. I'll look at pondering a bit more carefully in a separate PR (I agree that this should be handled by the root).

@noobpwnftw
Copy link
Contributor

Any node can signal stop to the cluster, which is good for many cases(more accurate limit on movetime, nodes and when mate is found), while behavior is changed when some node has reached maximum depth, or decide to stop a pondering search.
Now it just stops as soon as one node decides to stop the search upon receiving ponderhit, masking out all UCI command handling regarding to stopping the search from non-root nodes should do the trick, while I'm not sure about the other case, when you set limit on depth, should stop decided by mainThread on root or by any mainThread among nodes(since now you are skipping depth on non-root mainThread too)?

@vondele
Copy link
Member Author

vondele commented Dec 25, 2018

updated with TB hits counting

@vondele
Copy link
Member Author

vondele commented Dec 25, 2018

@noobpwnftw pondering indeed works on the cluster branch as well. I first want to simplify a bit how master deals with this (see #1899) before revisiting it here.

Are you aware of any features (apart from performance) missing on the cluster branch now ?

@noobpwnftw
Copy link
Contributor

@vondele I think this is now complete, all limits should work, skill level, multi PV are local features.

@vondele
Copy link
Member Author

vondele commented Dec 25, 2018

that's what I'm thinking too. So, once this is merged, I'll merge once more master in this branch (mostly to get the recent bug fixes which appeared in testing).

After that, it is time to see if we can squeeze out more performance, especially in the threaded case (AFAICT, M mpi x N threads is the best way to run it for M nodes with N cores each).

@snicolet snicolet merged commit 9faedfa into official-stockfish:cluster Dec 27, 2018
@snicolet
Copy link
Member

Merged via 9faedfa, thanks! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants