Skip to content

Conversation

@stouset
Copy link
Contributor

@stouset stouset commented Jan 17, 2023

Bit-shifting is a single instruction, and should be faster than an array lookup on supported architectures. Besides (ever so slightly) speeding up the conversion of a square into a bitboard, we may see minor general performance improvements due to preserving more of the CPU's existing cache.

Bench: 4106793

https://tests.stockfishchess.org/tests/view/63c5cfe618c20f4929c5fe46

Bit-shifting is a single instruction, and should be faster than an array lookup
on supported architectures. Besides (ever so slightly) speeding up the
conversion of a square into a bitboard, we may see minor general performance
improvements due to preserving more of the CPU's existing cache.

Bench: 4106793
@BM123499
Copy link
Contributor

I tried this before, and for some reason it failed. I wonder if we changed something in the compiler, or we changed the machines doing tests or if it failed on LTC... 🤔

@niklasf
Copy link
Contributor

niklasf commented Jan 20, 2023

I tried this before, and for some reason it failed. I wonder if we changed something in the compiler, or we changed the machines doing tests or if it failed on LTC... thinking

I seem to remember being surprised about this, too.

@vondele
Copy link
Member

vondele commented Jan 20, 2023

yes, probably depends a bit on the mix of machines running on fishtest?

Locally this is a gain for me:

Result of  20 runs
==================
base (./stockfish.master       ) =    1752135  +/- 10943
test (./stockfish.patch        ) =    1763939  +/- 10818
diff                             =     +11804  +/- 4731

speedup        = +0.0067
P(speedup > 0) =  1.0000

CPU: 16 x AMD Ryzen 9 3950X 16-Core Processor

@stouset
Copy link
Contributor Author

stouset commented Jan 20, 2023

On what type of machine would we expect a 64-bit bitshift to be slower than a load?

If this array were constant at compile-time I might be more inclined to agree that the effect might be variable, because the whole thing could be inlined to a literal value for some cases. But since that's not the case…

@Sopel97
Copy link
Member

Sopel97 commented Jan 20, 2023

as per https://uops.info/table.html, variable length shifts on intels are slower than fixed length (though it's a bit more complicated with the bmi shift variants). On AMDs they are the same.

@UniQP
Copy link
Contributor

UniQP commented Jan 20, 2023

On what type of machine would we expect a 64-bit bitshift to be slower than a load?

There is a case where the shift can be slower: If there are no free registers available. For the shift, you need two input registers, one for the left operand (i.e. the 1) and one for the shift amount. If there are no free registers available, you need to save a register to the stack ("spill") and later load it from the stack again ("reload"). So in worst case, you end up with an additional spill and reload compared to just a load (which only needs one register on x86-64).

That said, I still expect the code to be faster with your changes.

@stouset
Copy link
Contributor Author

stouset commented Jan 20, 2023

Slower than a fixed-length shift sure, but that’s not the alternative here? The original involves a load from a memory offset, which may or may not be in cache (and may or may not end up pushing something else useful out of cache).

That seems like it would almost certainly be slower in virtually every case, I’d expect?

Maybe on 32-bit platforms where a shift on a 64-bit operand requires multiple instructions?

@Sopel97
Copy link
Member

Sopel97 commented Jan 20, 2023

I'm just pointing out that the results will differ, depending on whether it's tested on an AMD or Intel CPU. Overall I also think it's better to remove this lookup table.

@stouset
Copy link
Contributor Author

stouset commented Jan 20, 2023

Aha, a register spill is definitely a case where it could be slower.

@vondele vondele added the to be merged Will be merged shortly label Jan 22, 2023
@vondele vondele closed this in 734315f Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

to be merged Will be merged shortly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants