[cluster] keep track of node counts cluster-wide. #1896
This generalizes the exchange of signals between the ranks using a non-blocking all-reduce. It is now used for the stop signal and the node count, and should be easy to generalize further (TB hits and ponder are still missing). It avoids long-lived outstanding non-blocking collectives (it removes an early-posted Ibarrier). The test is a bit too short, but the result is not worse than before:
Score of new-r4-1t vs old-r4-1t: 459 - 401 - 1505 [0.512] 2365
Elo difference: 8.52 +/- 8.43
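For readers who want to check the score line, here is a minimal sketch of how an Elo difference and its error margin can be derived from a win/loss/draw count. It assumes the usual logistic Elo model and a normal approximation with a 1.96 multiplier for the roughly 95% interval; the names `EloEstimate`, `score_to_elo` and `estimate_elo` are made up for this example and are not taken from the testing tool, whose exact rounding may differ slightly.

```cpp
#include <cassert>
#include <cmath>

struct EloEstimate {
    double score;   // mean score per game, in [0, 1]
    double elo;     // point estimate of the Elo difference
    double margin;  // half-width of the approximate 95% interval, in Elo
};

// Logistic Elo model: an expected score s maps to 400 * log10(s / (1 - s)).
inline double score_to_elo(double s) {
    return 400.0 * std::log10(s / (1.0 - s));
}

inline EloEstimate estimate_elo(int wins, int losses, int draws) {
    const double n = static_cast<double>(wins) + losses + draws;
    assert(n > 0);

    const double w = wins / n, l = losses / n, d = draws / n;
    const double s = w + 0.5 * d;  // a draw counts as half a point

    // Per-game variance of the score around its mean s.
    const double var = w * (1.0 - s) * (1.0 - s)
                     + l * (0.0 - s) * (0.0 - s)
                     + d * (0.5 - s) * (0.5 - s);

    // Normal approximation for the standard error of the mean score.
    const double half = 1.96 * std::sqrt(var / n);
    assert(s - half > 0.0 && s + half < 1.0);  // keep the logarithm defined

    // Map the score interval to Elo and report half of its width.
    const double margin = (score_to_elo(s + half) - score_to_elo(s - half)) / 2.0;
    return {s, score_to_elo(s), margin};
}
```

Calling `estimate_elo(459, 401, 1505)` gives a score of about 0.512 and an Elo difference of about 8.5 with a margin of similar size, which is why the result reads as "not worse than before" rather than as a clear gain.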
Can you also fix a typo? Line 127 in c0a964e
polling Pondering should be working before (more or less, while some nodes may quit early due to their local But now the stop signal is a sum of all stop flags, so any node can signal it globally, which I think is also fine. To restore pondering behavior, I think it is better to let the root node decide when to stop, adding Line 220 in c0a964e
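To make the two stop policies discussed here concrete, the following is a single-process sketch, not the code from this PR. `sum_reduce` is a hypothetical stand-in for the result of a sum all-reduce over one stop flag per rank. The "root decides" variant is modelled by having non-root ranks contribute zero, which is only one assumed way to get that behavior and not necessarily the change proposed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a sum all-reduce: every rank would see this total.
inline int sum_reduce(const std::vector<int>& flags) {
    int total = 0;
    for (int f : flags)
        total += f;
    return total;
}

// Any-rank policy: the cluster stops as soon as at least one rank raised its flag.
inline bool stop_any_rank(const std::vector<int>& localStop) {
    return sum_reduce(localStop) > 0;
}

// Root-decides policy (assumed variant): ranks other than 0 contribute 0,
// so only the root's flag can make the reduced sum non-zero.
inline bool stop_root_only(const std::vector<int>& localStop) {
    assert(!localStop.empty());
    std::vector<int> contributed(localStop.size(), 0);
    contributed[0] = localStop[0];
    return sum_reduce(contributed) > 0;
}
```

With flags `{0, 0, 1, 0}` the any-rank policy stops the whole cluster, while the root-only policy keeps searching until rank 0 raises its own flag.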
Comment typo fixed. I'll look at pondering a bit more carefully in a separate PR (I agree that this should be handled by the root).
Any node can signal stop to the cluster, which is good for many cases (more accurate limit on
Updated with TB hits counting.
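As an illustration of why adding TB hits is a small step once the signals share one reduction, here is a sketch in which each rank contributes a fixed-size vector and the cluster-wide values are the element-wise sums. It is a single-process model: `all_reduce_sum` stands in for what every rank would obtain from a sum all-reduce (in an MPI program this would presumably be an `MPI_Iallreduce` with `MPI_SUM`, polled with `MPI_Test`). The index names and the `Signals` type are hypothetical and not copied from the branch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot layout: adding a new signal means adding one more index.
enum SignalIndex : std::size_t { IDX_STOP = 0, IDX_NODES, IDX_TB_HITS, IDX_COUNT };

using Signals = std::array<std::uint64_t, IDX_COUNT>;

// Element-wise sum over all ranks; stands in for the reduced result.
inline Signals all_reduce_sum(const std::vector<Signals>& perRank) {
    assert(!perRank.empty());  // a cluster has at least one rank
    Signals total{};           // value-initialized to all zeros
    for (const Signals& s : perRank)
        for (std::size_t i = 0; i < IDX_COUNT; ++i)
            total[i] += s[i];
    return total;
}

// Cluster-wide views derived from the reduced vector.
inline bool cluster_stop(const Signals& total) { return total[IDX_STOP] > 0; }
inline std::uint64_t cluster_nodes(const Signals& total) { return total[IDX_NODES]; }
inline std::uint64_t cluster_tb_hits(const Signals& total) { return total[IDX_TB_HITS]; }
```

Because every slot is reduced with the same sum, the stop flag, node count and TB hits travel in one message, and the totals are only as fresh as the most recently completed exchange.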
@noobpwnftw pondering indeed works on the cluster branch as well. I first want to simplify a bit how master deals with this (see #1899) before revisiting it here. Are you aware of any features (apart from performance) missing on the cluster branch now?
@vondele I think this is now complete: all limits should work, and skill level and multi PV are local features.
That's what I'm thinking too. Once this is merged, I'll merge master into this branch once more (mostly to pick up the recent bug fixes that appeared in testing). After that, it is time to see if we can squeeze out more performance, especially in the threaded case (AFAICT, M MPI ranks x N threads is the best way to run it on M nodes with N cores each).
Merged via 9faedfa, thanks! :-)