@FauziAkram
Contributor

Passed STC (NNUE=False): https://tests.stockfishchess.org/tests/view/5f42edfe5089a564a10d84a0

LLR: 2.96 (-2.94,2.94) {-0.25,1.25}
Total: 13840 W: 2591 L: 2336 D: 8913
Ptnml(0-2): 194, 1453, 3387, 1676, 210

Passed LTC (NNUE=False): https://tests.stockfishchess.org/tests/view/5f4369795089a564a10d84d8

LLR: 2.95 (-2.94,2.94) {0.25,1.25}
Total: 159744 W: 19430 L: 18850 D: 121464
Ptnml(0-2): 960, 14185, 49030, 14709, 988
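As a rough cross-check of the two results above, the W/L/D totals can be converted into a logistic Elo point estimate. This is only a sketch: the helper name `elo_from_wld` is hypothetical, it ignores error bars, and it is not how fishtest itself reports Elo (fishtest works from the pentanomial `Ptnml` counts).

```python
import math

def elo_from_wld(wins, losses, draws):
    """Logistic Elo difference implied by a W/L/D record (point estimate only)."""
    games = wins + losses + draws
    # Average score per game: a win counts 1, a draw 0.5, a loss 0.
    score = (wins + 0.5 * draws) / games
    # Invert the logistic expectation: score = 1 / (1 + 10^(-elo/400)).
    return 400.0 * math.log10(score / (1.0 - score))

# Totals copied from the STC and LTC results quoted above.
stc_elo = elo_from_wld(2591, 2336, 8913)
ltc_elo = elo_from_wld(19430, 18850, 121464)
print(f"STC: {stc_elo:+.2f} Elo, LTC: {ltc_elo:+.2f} Elo")
```

With these totals the estimate comes out at about +6.4 Elo for STC and about +1.3 Elo for LTC.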

@vondele vondele added the to be merged Will be merged shortly label Aug 30, 2020
@vondele vondele closed this in e0bafa1 Aug 30, 2020
@vondele
Member

vondele commented Aug 30, 2020

thanks!

BTW, for pure evaluation patches, testing with Use NNUE=false is correct, and there is no need to do non-regression tests with Use NNUE=true.

@FauziAkram
Contributor Author

FauziAkram commented Aug 30, 2020

@vondele I understand, but consider this hypothetical scenario:
Imagine an evaluation patch that gains, say, 3 Elo in the middlegame but loses 1 Elo in the endgame.
This patch will pass the NNUE=false tests, but merging it into main Stockfish would harm it, since main Stockfish uses the classical evaluation only in the endgame (or when material is low), right?
That is why I like to make sure that my patch is good enough to pass with NNUE=false, and also that it doesn't lose anything with NNUE=true.
What are your thoughts on this?

@vondele
Member

vondele commented Aug 30, 2020

I understand the scenario you sketch. However, I want to keep development of classical eval as straightforward as possible, and keep it optimized for classical evaluation.

Furthermore, the impact of classical eval on Use NNUE=true is small, via the hybrid (now mostly for imbalanced psq scores, not just endgame or middlegame), and most eval patches will be essentially inconsequential there. Yet our testing would still reject 20% of 0 Elo patches, so we would quite commonly reject a good Elo gainer just because its impact with Use NNUE=true is 0.
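The 20% figure can be sanity-checked with a standard approximation for SPRT pass probabilities. The sketch below treats the log-likelihood ratio as a Brownian motion and ignores overshoot at the stopping bounds. The non-regression Elo bounds `{-1.5, 0.5}` are an assumption, since the thread does not state them, and the function name `sprt_pass_probability` is hypothetical. The default `alpha = beta = 0.05` matches the LLR limits of (-2.94, 2.94) shown in the test results.

```python
import math

def sprt_pass_probability(true_elo, elo0, elo1, alpha=0.05, beta=0.05):
    """Approximate chance that an SPRT with bounds {elo0, elo1} accepts a patch.

    Brownian-motion approximation of the LLR, ignoring overshoot.
    """
    lower = math.log(beta / (1.0 - alpha))   # LLR value at which the test fails
    upper = math.log((1.0 - beta) / alpha)   # LLR value at which the test passes
    # Drift-to-variance ratio of the LLR: -1 at elo0, +1 at elo1.
    h = (2.0 * true_elo - elo0 - elo1) / (elo1 - elo0)
    if abs(h) < 1e-12:
        # Zero drift: plain gambler's-ruin ratio.
        return -lower / (upper - lower)
    num = 1.0 - math.exp(-h * lower)
    den = math.exp(-h * upper) - math.exp(-h * lower)
    return num / den

# Assumed non-regression bounds {-1.5, 0.5}; a patch worth exactly 0 Elo.
reject_rate = 1.0 - sprt_pass_probability(0.0, -1.5, 0.5)
print(f"0 Elo patch rejected with probability {reject_rate:.3f}")
```

Under these assumptions a 0 Elo patch fails a non-regression test a little under one time in five, which is in line with the figure quoted in the comment.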

@FauziAkram
Contributor Author

I understand and agree with your vision

lucabrivio pushed a commit to lucabrivio/Stockfish that referenced this pull request Aug 30, 2020
Passed STC (NNUE=False):
https://tests.stockfishchess.org/tests/view/5f42edfe5089a564a10d84a0
LLR: 2.96 (-2.94,2.94) {-0.25,1.25}
Total: 13840 W: 2591 L: 2336 D: 8913
Ptnml(0-2): 194, 1453, 3387, 1676, 210

Passed LTC (NNUE=False):
https://tests.stockfishchess.org/tests/view/5f4369795089a564a10d84d8
LLR: 2.95 (-2.94,2.94) {0.25,1.25}
Total: 159744 W: 19430 L: 18850 D: 121464
Ptnml(0-2): 960, 14185, 49030, 14709, 988

closes official-stockfish/Stockfish#3080

bench: 3736029
Dantist pushed a commit to Dantist/Stockfish that referenced this pull request Dec 22, 2020
BM123499 pushed a commit to BM123499/Stockfish that referenced this pull request Feb 22, 2021