Remove precomputed SquareBB #4343

stouset · 2023-01-17T17:56:27Z

Bit-shifting is a single instruction, and should be faster than an array lookup on supported architectures. Besides (ever so slightly) speeding up the conversion of a square into a bitboard, we may see minor general performance improvements due to preserving more of the CPU's existing cache.

Bench: 4106793

https://tests.stockfishchess.org/tests/view/63c5cfe618c20f4929c5fe46
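For readers following along, the two approaches being compared can be sketched roughly as below. This is only an illustration: `SquareTable`, `init_square_table`, `square_bb_lookup` and `square_bb_shift` are hypothetical names chosen for this sketch, and the table is assumed to be filled at startup rather than being a compile-time constant.

```cpp
#include <array>
#include <cstdint>

using Bitboard = std::uint64_t;

// Hypothetical stand-in for a precomputed table: one single-bit
// bitboard per square, filled once at startup (not a compile-time constant).
std::array<Bitboard, 64> SquareTable{};

void init_square_table() {
    for (int s = 0; s < 64; ++s)
        SquareTable[s] = Bitboard{1} << s;
}

// Lookup variant: reads the bitboard from memory, so it depends on
// the table being in cache.
inline Bitboard square_bb_lookup(int s) { return SquareTable[s]; }

// Shift variant: computes the same value with a variable-length shift
// and touches no memory.
constexpr Bitboard square_bb_shift(int s) { return Bitboard{1} << s; }
```

Both functions return the same value for every square from 0 to 63; the only difference is whether the result comes from a load or from a shift.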

BM123499 · 2023-01-20T04:04:04Z

I tried this before, and for some reason it failed. I wonder if we changed something in the compiler, or we changed the machines doing tests or if it failed on LTC... 🤔

niklasf · 2023-01-20T19:56:17Z

> I tried this before, and for some reason it failed. I wonder if we changed something in the compiler, or we changed the machines doing tests or if it failed on LTC... 🤔

I seem to remember being surprised about this, too.

vondele · 2023-01-20T20:12:56Z

yes, probably depends a bit on the mix of machines running on fishtest?

Locally this is a gain for me:

Result of  20 runs
==================
base (./stockfish.master       ) =    1752135  +/- 10943
test (./stockfish.patch        ) =    1763939  +/- 10818
diff                             =     +11804  +/- 4731

speedup        = +0.0067
P(speedup > 0) =  1.0000

CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor
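The `diff` and `speedup` lines in the output above follow from the base and test figures by simple arithmetic. A minimal sketch, assuming the speedup is the difference divided by the base value (the helper name `relative_speedup` is made up for this illustration):

```cpp
// Relative speedup of the test figure over the base figure.
// Assumption: speedup = (test - base) / base.
constexpr double relative_speedup(double base, double test) {
    return (test - base) / base;
}

// Figures copied from the benchmark output above.
constexpr double kBase = 1752135.0;
constexpr double kTest = 1763939.0;
```

With these figures the difference is 1763939 - 1752135 = 11804, and 11804 / 1752135 is about 0.0067, matching the reported speedup of roughly 0.67%.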

stouset · 2023-01-20T22:08:01Z

On what type of machine would we expect a 64-bit bitshift to be slower than a load?

If this array were constant at compile-time I might be more inclined to agree that the effect might be variable, because the whole thing could be inlined to a literal value for some cases. But since that's not the case…

Sopel97 · 2023-01-20T22:25:33Z

As per https://uops.info/table.html, variable-length shifts on Intel CPUs are slower than fixed-length ones (though it's a bit more complicated with the BMI shift variants). On AMD CPUs they are the same.

UniQP · 2023-01-20T23:14:03Z

> On what type of machine would we expect a 64-bit bitshift to be slower than a load?

There is a case where the shift can be slower: if there are no free registers available. For the shift, you need two input registers, one for the left operand (i.e. the 1) and one for the shift amount. If there are no free registers available, you need to save a register to the stack ("spill") and later load it from the stack again ("reload"). So in the worst case, you end up with an additional spill and reload compared to just a load (which only needs one register on x86-64).

That said, I still expect the code to be faster with your changes.

stouset · 2023-01-20T23:14:56Z

Slower than a fixed-length shift sure, but that’s not the alternative here? The original involves a load from a memory offset, which may or may not be in cache (and may or may not end up pushing something else useful out of cache).

That seems like it would almost certainly be slower in virtually every case, I’d expect?

Maybe on 32-bit platforms where a shift on a 64-bit operand requires multiple instructions?
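To illustrate the 32-bit concern: when only 32-bit registers are available, `1 << s` for a 64-bit result has to be assembled from two halves, with the shift amount deciding which half receives the bit. The sketch below shows that idea in portable C++. The names `U64Halves`, `one_shl_32bit` and `join` are invented for this example, and the code is not meant to mirror what any particular compiler emits.

```cpp
#include <cstdint>

// A 64-bit value represented as two 32-bit halves, as a 32-bit
// target would have to hold it in registers.
struct U64Halves {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Computes 1 << s for s in [0, 63] using only 32-bit shifts.
// The comparison picks which half receives the single set bit.
constexpr U64Halves one_shl_32bit(unsigned s) {
    return s < 32 ? U64Halves{std::uint32_t{1} << s, 0u}
                  : U64Halves{0u, std::uint32_t{1} << (s - 32)};
}

// Reassembles the halves into a 64-bit value, used here only to
// check the result against a native 64-bit shift.
constexpr std::uint64_t join(U64Halves h) {
    return (std::uint64_t{h.hi} << 32) | h.lo;
}
```

The extra comparison and the second half are the kind of overhead that could make the shift less attractive on such targets, although a table of 64-bit entries also needs two 32-bit loads there.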

Sopel97 · 2023-01-20T23:24:31Z

I'm just pointing out that the results will differ, depending on whether it's tested on an AMD or Intel CPU. Overall I also think it's better to remove this lookup table.

stouset · 2023-01-20T23:28:25Z

Aha, a register spill is definitely a case where it could be slower.

vondele added the "to be merged" label Jan 22, 2023

vondele closed this in 734315f Jan 23, 2023
