Introduce sfen_format option for gensfen. Experimental support for binpack format in gensfen and learn. #129
…ack". It determines the output format of the sfens. Binpack is a highly compressed format for consecutive sfens. The extension is now determined by the format used; output_file_name should contain just the stem.
…tect file extension and choose the correct reader (bin or binpack)
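A minimal sketch of the two directions described in these commits: appending an extension derived from the chosen format, and picking a format back from a file's extension. The names `SfenFormat`, `parseSfenFormat`, `extensionFor`, `makeOutputPath`, and `formatFromPath` are hypothetical and are not taken from the PR's code.

```cpp
#include <filesystem>
#include <optional>
#include <string>

// Hypothetical enum for the two values the sfen_format option accepts.
enum class SfenFormat { Bin, Binpack };

// Map the option string to the enum; anything else is rejected.
std::optional<SfenFormat> parseSfenFormat(const std::string& value)
{
    if (value == "bin") return SfenFormat::Bin;
    if (value == "binpack") return SfenFormat::Binpack;
    return std::nullopt;
}

const char* extensionFor(SfenFormat format)
{
    return format == SfenFormat::Binpack ? ".binpack" : ".bin";
}

// Append the extension instead of replacing one, so a stem such as
// "games.v2" keeps its dot.
std::filesystem::path makeOutputPath(const std::string& stem, SfenFormat format)
{
    return std::filesystem::path(stem + extensionFor(format));
}

// Pick the reader format for a single file from its extension.
std::optional<SfenFormat> formatFromPath(const std::filesystem::path& file)
{
    const std::string ext = file.extension().string();
    if (ext == ".bin") return SfenFormat::Bin;
    if (ext == ".binpack") return SfenFormat::Binpack;
    return std::nullopt;
}
```

Returning `std::optional` leaves the caller to decide how to report an unknown option value or an unsupported file extension.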
|
Thank you for the pull request. I guess that one usage of this format is to distribute training data between net developers. In that case, we could need to support Could you please take a look at CI? https://travis-ci.org/github/nodchip/Stockfish/builds/725750282 |
|
Yes, I'm slowly trying to get CI working. I may have to come back to it tomorrow, as it's already late for me. I was talking with noob on Discord, and he says that shuffling over large distances is not needed to ensure enough randomness (because two games differ, on average, by the same amount regardless of how far apart they are). Shuffling only within a batch that's loaded in memory is enough. Persisting the shuffled output is also not needed. Even if some people still want to shuffle, I think .bin is the best we can do for that. |
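For illustration, shuffling only within an in-memory batch could look like the sketch below. `Entry`, `shuffleWithinBatches`, and the batch size are hypothetical; this is not the trainer's code, only the general idea of never moving an entry across a batch boundary.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for one training entry.
struct Entry
{
    int id;
};

// Shuffle each consecutive block of `batchSize` entries independently.
// Entries never cross a block boundary, so only one block has to be
// resident in memory at a time when this is applied to a stream.
template <typename T>
void shuffleWithinBatches(std::vector<T>& entries, std::size_t batchSize, std::mt19937_64& rng)
{
    if (batchSize == 0) return;

    for (std::size_t begin = 0; begin < entries.size(); begin += batchSize)
    {
        const std::size_t end = std::min(begin + batchSize, entries.size());
        std::shuffle(entries.begin() + static_cast<std::ptrdiff_t>(begin),
                     entries.begin() + static_cast<std::ptrdiff_t>(end),
                     rng);
    }
}
```

Whether such block-local shuffling matches a full shuffle in training quality is exactly the question raised in this thread; the sketch makes no claim either way.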
|
If my memory is correct, someone in Discord said that we can gain more slos with |
|
It was only a test between completely no shuffle and full shuffle. I don't think anyone did tests on partial shuffles over small blocks. |
|
The following compiles on Linux, but I didn't check functionality:
diff --git a/src/extra/nnue_data_binpack_format.h b/src/extra/nnue_data_binpack_format.h
index c86a55c2a..e6cd7ad20 100644
--- a/src/extra/nnue_data_binpack_format.h
+++ b/src/extra/nnue_data_binpack_format.h
@@ -41,8 +41,9 @@ THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#include <array>
#include <immintrin.h>
-#ifdef linux
+#ifdef __linux__
#include <x86intrin.h>
+#include <climits>
#else
#include <intrin.h>
#endif
@@ -2700,7 +2701,7 @@ namespace chess
return Bitboard::square(sq0) | sq1;
}
- [[nodiscard]] constexpr Bitboard operator""_bb(std::uint64_t bits)
+ [[nodiscard]] constexpr Bitboard operator""_bb(unsigned long long bits)
{
return Bitboard::fromBits(bits);
}
@@ -7295,4 +7296,4 @@ namespace binpack
std::cout << "Processed " << (cur - base) << " bytes and " << numProcessedPositions << " positions.\n";
}
}
-}
\ No newline at end of file
+} |
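Some background on the second hunk: C++ only allows a fixed set of parameter types for literal operators, and for integer literals the cooked form must take `unsigned long long`. On 64-bit Linux `std::uint64_t` is typically `unsigned long`, which is a distinct type, so the original declaration is rejected there. The sketch below shows the accepted form with a hypothetical `Bits` wrapper and `_bits` suffix, not the library's `Bitboard`.

```cpp
#include <cstdint>

// Hypothetical minimal wrapper used only for this example.
struct Bits
{
    std::uint64_t value;
};

// A cooked integer literal operator must take exactly `unsigned long long`;
// the conversion to the fixed-width type happens inside the function.
[[nodiscard]] constexpr Bits operator""_bits(unsigned long long v)
{
    return Bits{static_cast<std::uint64_t>(v)};
}

// Parentheses are needed: without them `0xFF_bits.value` would be lexed
// as a single (invalid) numeric token.
static_assert((0xFF_bits).value == 255, "literal operator is usable in constant expressions");
```

The first hunk is a related portability fix: the bare `linux` macro is only predefined in GNU dialect modes, whereas `__linux__` is predefined by GCC and Clang on Linux regardless of the selected standard.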
|
some comments:
|
|
Okay, checks passed now |
|
nice... btw, I see you made this optional. If it works well, I would consider making this the default and only option. While there is already some data generated, even more will be generated in the future, and the size of these files is an obstacle to sharing them. Having fewer options might also be a path to quicker progress. |
|
Thank you for sending and fixing the pull request. I merged it. |
This PR introduces a new string option for `gensfen` named `sfen_format`. Valid values are `bin` (default) and `binpack`. This value determines the format used to serialize the generated sfens. The output file extension is now chosen and appended based on the chosen format; `output_file_name` should contain just the stem. @noobpwnftw asked about this feature, so now there's at least a branch that implements it.
For learn, the reader type is chosen per file based on its extension. `.bin` files are read directly; `.binpack` files are read entry by entry and converted to bin format before further processing.
`nnue_data_binpack_format.h` is a header-only library that exposes some functions related to the plain, bin, and binpack formats. In particular, it allows creating readers and writers for the binpack format, which gensfen and learn use. It internally defines a type with the same layout as `PackedSfenValue` because it is meant to be used without dependencies. This can be unified in the future, along with simplification of other aspects of the library.
Everything in the library is in the `binpack` namespace. Most of the chess primitives used in it are in the `binpack::chess` namespace. I wouldn't dare try to port it to Stockfish's representation.
It also exposes currently unused conversion functions: `convertPlainToBin`, `convertBinToPlain`, `convertBinpackToBin`, `convertBinToBinpack`, `convertBinpackToPlain`, `convertPlainToBinpack`. They take paths to input and output files and perform the conversion. This covers all possible conversions between plain, bin, and binpack. In the future we can create an interface that lets users access these through the CLI.
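As a rough idea of what a future CLI layer might do, the sketch below maps a pair of format names to the name of the matching conversion function listed above. It deliberately returns only the function name rather than calling the library, because the exact signatures are not shown here. `DataFormat`, `parseDataFormat`, `capitalizedName`, and `converterNameFor` are hypothetical helpers.

```cpp
#include <optional>
#include <string>

// Hypothetical enum covering the three formats named in the description.
enum class DataFormat { Plain, Bin, Binpack };

// Parse a user-supplied format name; unknown names are rejected.
std::optional<DataFormat> parseDataFormat(const std::string& name)
{
    if (name == "plain") return DataFormat::Plain;
    if (name == "bin") return DataFormat::Bin;
    if (name == "binpack") return DataFormat::Binpack;
    return std::nullopt;
}

std::string capitalizedName(DataFormat format)
{
    switch (format)
    {
        case DataFormat::Plain: return "Plain";
        case DataFormat::Bin: return "Bin";
        case DataFormat::Binpack: return "Binpack";
    }
    return "";
}

// Build the name of the conversion function for a (from, to) pair.
// Converting a format to itself has no matching function.
std::optional<std::string> converterNameFor(DataFormat from, DataFormat to)
{
    if (from == to) return std::nullopt;
    return "convert" + capitalizedName(from) + "To" + capitalizedName(to);
}
```

With three formats there are six ordered pairs of distinct formats, which matches the six functions the header exposes.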