Tools: depth6 multipv2 filter option for preparing nn-335a9b2d8a80.nnue training data #4324

linrock · 2023-01-06T14:08:08Z

This is a simplified variant of the depth6 multipv2 data filtering method used to prepare the Leela-dfrc_n5000.binpack part of the dataset that trained recent master nets (#4295, #4314, #4319). It uses multipv search heuristics to skip positions in binpacks during NNUE training while maintaining binpack compression.

Example invocation:

transform filter_335a9b2d8a80 input_file in.binpack output_file out.binpack

Filters binpack data to skip all positions where:

- either best move from a depth6 multipv2 search is a capture or promotion
- the given move in the position is a capture or promotion
- the FEN is the standard chess starting position
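The three criteria above can be read as a single predicate. What follows is a minimal Python sketch of that logic, not the tool's actual implementation. The `Move` and `Position` types and the `should_skip` function are hypothetical names introduced for illustration, and the depth6 multipv2 search results are assumed to be already available on each position.

```python
from dataclasses import dataclass
from typing import List

# FEN of the standard chess starting position.
STARTPOS_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


@dataclass
class Move:
    """Hypothetical move record; only the flags the filter needs."""
    uci: str
    is_capture: bool = False
    is_promotion: bool = False

    def is_noisy(self) -> bool:
        return self.is_capture or self.is_promotion


@dataclass
class Position:
    """Hypothetical decoded binpack entry plus search output."""
    fen: str
    played_move: Move         # the move stored with the position
    multipv_best: List[Move]  # assumed: top two moves of a depth6 multipv2 search


def should_skip(pos: Position) -> bool:
    """Return True if the position matches any of the three skip criteria."""
    if pos.fen == STARTPOS_FEN:
        return True
    if pos.played_move.is_noisy():
        return True
    # "either best move": check both multipv lines.
    return any(m.is_noisy() for m in pos.multipv_best[:2])
```

A position is kept only when the stored move and both searched moves are quiet and the position is not the standard start position.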

Sopel97 · 2023-01-06T14:27:15Z

Thanks. Would it be feasible to regenerate the current best datasets with the score alteration way of removing the positions? If not then I could try doing it by comparing the filtered and unfiltered datasets, if available.

Just a note, personally I'd like to name the filtering function less generically, for example by appending the hash of the first master network trained with the filtered data, this way we would have a connection established and it would be harder to make a mistake in the future. Not sure what others think.

linrock · 2023-01-11T17:17:48Z

> Would it be feasible to regenerate the current best datasets with the score alteration way of removing the positions?

It would be difficult to exactly regenerate the best datasets with score alteration instead of position removal, since the current best datasets were generated imprecisely: they're a rough mix of filtered data pieces.

It'd be easier to generate a different version of the best datasets with score alteration instead of removal. However, I'm guessing this would be slightly weaker than the best datasets, all else being equal, since maximizing binpack compression has a tradeoff - binpack fragment granularity would be at the per-training-game level vs. many fragments per training game with position removal. Score alteration improves compression at the cost of less randomness after shuffling. Needs some testing. It may also take a few weeks to regenerate.
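The granularity tradeoff described above can be illustrated with a small sketch. It assumes that consecutive positions from one training game are stored as a single contiguous chain, and that removing a position splits the chain at that point. `count_fragments` is a hypothetical helper for illustration, not part of the tools.

```python
from typing import List


def count_fragments(skip: List[bool], remove_positions: bool) -> int:
    """Count the contiguous fragments one training game ends up as.

    skip[i] is True when position i of the game is filtered out.
    """
    if not remove_positions:
        # Score alteration: every position stays in place, so a
        # non-empty game remains a single chain.
        return 1 if skip else 0

    # Position removal: each run of kept positions becomes its own fragment.
    fragments = 0
    in_run = False
    for skipped in skip:
        if not skipped and not in_run:
            fragments += 1
        in_run = not skipped
    return fragments
```

For a game with skip flags `[False, True, False, False, True, False]`, removal yields three fragments while score alteration keeps one, which is the per-training-game granularity mentioned above.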

> Just a note, personally I'd like to name the filtering function less generically, for example by appending the hash of the first master network trained with the filtered data, this way we would have a connection established and it would be harder to make a mistake in the future. Not sure what others think.

Is the intent to document an exact way to filter the data for the first recent master network? If so, the differences between the exact filtering method used and this PR are:

- startpos was removed from all filtered games
- position removal instead of score alteration, which trades off binpack compression for more randomness
- hardcoded to depth6 multipv2

Sopel97 · 2023-01-11T17:24:53Z

> Is the intent

The intent is to have versioning for filtering functions in a way that prevents misuse.

vondele · 2023-01-22T09:52:41Z

The PR is still marked as draft. Are there plans to further develop it?

linrock · 2023-01-23T01:08:13Z

> The PR is still marked as draft. Are there plans to further develop it?

Yea, we're close to finalizing how this should work. I'll get the PR into a reviewable state after some clarifications.

> The intent is to have versioning for filtering functions in a way that prevents misuse.

Makes sense, how about something like this?

transform filter-335a9b2d8a80 input_file in.binpack output_file out.binpack

- uses score alteration by default to maximize binpack compression
- filters out all startpos positions and positions where either move is a capture in a hardcoded depth6 multipv2 search

transform filter-335a9b2d8a80 alter_score 0 input_file in.binpack output_file out.binpack

- enables removing positions entirely, to replicate the exact dataset preparation method used for training nn-335a9b2d8a80.nnue. With the default random-fen-skipping 3 value, this may have slightly higher Elo due to trading some compression for more shuffling randomness (unconfirmed).

Sopel97 · 2023-01-26T01:23:59Z

Sounds good.

linrock · 2023-02-04T07:54:12Z

Ready for review. I realized if more binpack shuffling randomness is desired, it's easy to remove positions with score VALUE_NONE after filtering. The computationally expensive part of filtering is during search. As such, I don't think there's a strong need for an alter_score 0 option to remove positions during filtering. This way binpacks will stay compressed by default.

I'll plan to re-filter the training data later to maintain binpack compression but it may take a while to complete.
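The post-filtering removal step mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions: entries are assumed to be already decoded into `(fen, score)` pairs, `VALUE_NONE = 32002` is assumed to match the sentinel constant in Stockfish's `types.h`, and `drop_altered` is a hypothetical helper rather than an existing tool command.

```python
from typing import Iterable, Iterator, Tuple

# Assumed sentinel score written by the filter for positions to skip.
VALUE_NONE = 32002

Entry = Tuple[str, int]  # hypothetical decoded entry: (fen, score)


def drop_altered(entries: Iterable[Entry]) -> Iterator[Entry]:
    """Yield only entries whose score was not altered to VALUE_NONE."""
    for fen, score in entries:
        if score != VALUE_NONE:
            yield fen, score
```

Since this pass needs no search, it is cheap compared to the filtering itself, which is why a separate removal option during filtering is not strictly needed.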

Sopel97 · 2023-02-05T12:51:00Z

Looks good to me.

Tools transform option for filtering data for training nn-335a9b2d8a8…

46d85de

…0.nnue
Append hash of first master net trained with filter method
Hardcode depth 6 and remove option to set depth
Underscores for consistency
Filter out standard startpos positions too

linrock force-pushed the tools-filter-binpacks branch from 61b24c9 to 46d85de Compare February 4, 2023 07:35

linrock changed the title ~~[WIP] Tools: transform option to filter positions out of binpacks with search~~ [WIP] Tools: depth6 multipv2 filter option for preparing nn-335a9b2d8a80.nnue training data Feb 4, 2023

linrock changed the title ~~[WIP] Tools: depth6 multipv2 filter option for preparing nn-335a9b2d8a80.nnue training data~~ Tools: depth6 multipv2 filter option for preparing nn-335a9b2d8a80.nnue training data Feb 4, 2023

linrock marked this pull request as ready for review February 4, 2023 07:48

vondele merged commit 8e16592 into official-stockfish:tools Feb 5, 2023
