-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix processor group binding under Windows. #3787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
No functional change.
|
I can confirm locally that this is required to spread threads on all cores on Windows 11 with a 3990X CPU. It has only 1 NUMA node with 2 groups of 64 threads each. Other OS/CPU combinations are not yet tested, hopefully nothing breaks. |
|
thanks, maybe somebody with an older version of windows and a 64+ threads machine could give this a test as well. |
|
Behavior change of the API is documented here: |
who would have guessed |
|
There hasn't been any negative feedback, so I'll proceed with merging. We'll see if it causes problems with older windows OS on large core hardware. |
|
I think that this code doesnt work right way. I have system 2xXeon E5 2697 v2 (24 cores, 48 treads, Windows 10 21H2, x64 ) and since this version, when i set number of cores to 24, then Stockfish use 1CPU to 100% and second CPU is idle. Previous versions works right way, 12 cores running for both CPU's. |
|
Thanks for the feedback... I think we'll need to understand if this windows version is from before of after |
As per 14.1 release(before the patch): Lines 551 to 555 in 7262fd5
If in your system, "12 cores running for both CPU's" is the wrong behavior and now "use 1CPU to 100% and second CPU is idle" works as intended. |
|
@g3g6 Can you run this bench with Stockfish 14.1 and also with last version to see the difference : http://ipmanchess.yolasite.com/amd--intel-chess-bench-stockfish.php Thanks! |
Win 10 x64, 21H2, build 19044.1348 |
Benchmark
|
|
#3798 will restore old NUMA assignment behavior. |
revert official-stockfish@9048ac0 due to core spread problem and fix new OS compatibility with another method. This code assumes that if one NUMA node has more than one processor groups, they are created equal(having equal amount of cores assigned to each of the groups), and also the total number of available cores contained in such groups are equal to the number of available cores within one NUMA node because of how best_node function works. closes official-stockfish#3798 fixes official-stockfish#3787 No functional change.
Old code only worked because there was probably a limit on how many cores/threads can reside within one NUMA node, and the OS creates extra NUMA nodes when necessary, however the actual mechanism of core binding is done by "Processor Groups"(https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups). With a newer OS, one NUMA node can have many such "Processor Groups" and we should just consistently use the number of groups to bind the threads instead of deriving the topology from the number of NUMA nodes.
No functional change.