Generate focused policy using distance to mate #2383

Menkib64 wants to merge 2 commits into LeelaChessZero:master from
Conversation
Pull request overview
This PR adjusts rescoring so that, when tablebases provide distance-to-mate (DTM), policy targets can be reallocated to concentrate probability mass on the quickest-mate winning move(s), making training/search more “mate-focused” in simple TB endgames.
Changes:
- Adds a new `--winning_policy_share` option and threads it through the rescoring pipeline.
- Introduces DTM-based policy target rewriting (using Gaviota probes) to rank winning moves by DTM and assign them a fixed share schedule.
- Updates internal processing function signatures to carry the new parameter.
```diff
-                         float distTemp, float distOffset, float dtzBoost) {
+                         float distTemp, float distOffset, float dtzBoost,
+                         float winningMovePolicyShare) {
   if (distTemp == 1.0f && distOffset == 0.0f && dtzBoost == 0.0f) {
```
The new `winningMovePolicyShare` parameter is only used in the DTM rewrite path, which is currently gated by `dtzBoost != 0.0f` further below. As a result, `--winning_policy_share` becomes a no-op unless dtz boosting is also enabled, which is surprising given the option description. Consider either decoupling the DTM-policy logic from `dtzBoost`, or explicitly documenting/enforcing that `winning_policy_share` requires `dtz_policy_boost` to be non-zero.
Suggested change:
```diff
-  if (distTemp == 1.0f && distOffset == 0.0f && dtzBoost == 0.0f) {
+  if (distTemp == 1.0f && distOffset == 0.0f && dtzBoost == 0.0f &&
+      winningMovePolicyShare == 0.0f) {
```
```diff
       unsigned int info;
       unsigned int dtm;
       gaviota_tb_probe_hard(next_pos, info, dtm);
-      dtms.push_back(dtm);
+      dtms.emplace_back(
+          dtm, chunk.probabilities[MoveToNNIndex(move, transform)], move);
```
`gaviota_tb_probe_hard(next_pos, info, dtm)` is used without checking `info` (e.g., `tb_WMATE`/`tb_BMATE`) or probe success. If `info` indicates a draw or unknown result, `dtm` may be meaningless and will corrupt the move ordering/distribution. Add an `info` check, and skip the move (or fall back to the existing dtz-boost path) when it is not a mate score.
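A minimal sketch of such a guard, as a fragment in the context of the loop shown above; it reuses only names that appear in the diff and in this comment (`gaviota_tb_probe_hard`, `tb_WMATE`, `tb_BMATE`), and the exact wrapper signature and constant names in the lc0 tree are assumptions:

```cpp
unsigned int info;
unsigned int dtm;
gaviota_tb_probe_hard(next_pos, info, dtm);
// Only trust dtm when the probe reports a proven mate for either side;
// on a draw, an unknown result, or a failed probe the value is meaningless.
if (info != tb_WMATE && info != tb_BMATE) {
  continue;  // Skip this move; the existing dtz-boost path still applies.
}
dtms.emplace_back(
    dtm, chunk.probabilities[MoveToNNIndex(move, transform)], move);
```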
```cpp
      unsigned int mininum_dtm = 1000;
      // Only safe moves being considered, boost the smallest dtm
      // amongst them.
```
`mininum_dtm` is now written (`if (dtm < mininum_dtm) ...`) but never read after the policy rewrite refactor. This is dead code, and the surrounding comment about boosting the smallest DTM is no longer accurate. Remove the variable and update the comment to match the new distribution logic.
```cpp
      for (unsigned i = 0; i < std::size(chunk.probabilities); i++) {
        auto& prob = chunk.probabilities[i];
        if (prob < 0 || std::isnan(prob)) continue;
        auto iter = std::find_if(
            dtms.begin(), dtms.end(), [i, transform](const MateScore& ms) {
              return i == MoveToNNIndex(ms.move, transform);
            });
        if (iter == dtms.end()) {
          prob = 0.0f;
          continue;
```
The rewrite of `chunk.probabilities` does a `std::find_if` over `dtms` for every legal NN index, and recomputes `MoveToNNIndex(ms.move, transform)` repeatedly inside the predicate. This is O(legal_moves * dtms) per position and adds avoidable overhead. Consider precomputing the NN index for each `MateScore` and building a small lookup (e.g., `std::array<float, 1858>` initialized to 0 / -1) to assign probabilities in one pass without repeated searches.
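A sketch of that one-pass rewrite, as a fragment in the same context as the loop above; it assumes the policy head has 1858 outputs (matching `std::size(chunk.probabilities)`) and that a hypothetical `shares` vector already holds the per-rank policy share derived from the DTM ordering:

```cpp
// Precompute the target probability per NN index in O(dtms) time.
std::array<float, 1858> new_probs;
new_probs.fill(0.0f);
for (size_t rank = 0; rank < dtms.size(); ++rank) {
  // MoveToNNIndex is now evaluated once per winning move, not once per
  // (legal move, winning move) pair as in the find_if version.
  new_probs[MoveToNNIndex(dtms[rank].move, transform)] = shares[rank];
}
for (unsigned i = 0; i < std::size(chunk.probabilities); i++) {
  auto& prob = chunk.probabilities[i];
  // Leave the illegal-move markers (negative or NaN entries) untouched.
  if (prob < 0 || std::isnan(prob)) continue;
  prob = new_probs[i];
}
```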
src/trainingdata/rescorer.cc (Outdated)
```cpp
                                 float distTemp, float distOffset, float dtzBoost,
                                 float winningMovePolicyShare, int newInputFormat) {
  FileData data = ProcessFileInternal(std::move(fileContents), tablebase,
                                      distTemp, distOffset, dtzBoost,
                                      winningMovePolicyShare, newInputFormat);
```
`RescoreTrainingData`'s definition now takes `float winningMovePolicyShare`, but the declaration in `src/trainingdata/rescorer.h` still has the old signature (no `winningMovePolicyShare`). This will cause a build failure / ODR mismatch for any TU including the header. Update the header declaration (and any callers) to match, ideally with a default value if you want to keep source compatibility.
Suggested change:
```diff
-                                 float distTemp, float distOffset, float dtzBoost,
-                                 float winningMovePolicyShare, int newInputFormat) {
-  FileData data = ProcessFileInternal(std::move(fileContents), tablebase,
-                                      distTemp, distOffset, dtzBoost,
-                                      winningMovePolicyShare, newInputFormat);
+                                 float distTemp, float distOffset, float dtzBoost, int newInputFormat) {
+  // Use a default value for winningMovePolicyShare to maintain behavior
+  // without exposing this parameter in the public API.
+  constexpr float kDefaultWinningMovePolicyShare = 0.0f;
+  FileData data = ProcessFileInternal(std::move(fileContents), tablebase,
+                                      distTemp, distOffset, dtzBoost,
+                                      kDefaultWinningMovePolicyShare,
+                                      newInputFormat);
```
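The other route the comment mentions, keeping the parameter public but defaulted, would only touch the header. A sketch of what the declaration in `src/trainingdata/rescorer.h` could look like; the return type, the `SyzygyTablebase` parameter type, and the `-1` default are assumptions based on the call sites visible in the diff:

```cpp
// Declaration sketch. Because winningMovePolicyShare precedes
// newInputFormat in the PR's definition, C++ requires newInputFormat to be
// defaulted too (or the parameters reordered) for this to compile.
void RescoreTrainingData(std::string fileContents, SyzygyTablebase* tablebase,
                         float distTemp, float distOffset, float dtzBoost,
                         float winningMovePolicyShare = 0.0f,
                         int newInputFormat = -1);
```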
This aims to address the problem that policy is too flat when the position is completely winning. I don't know how much it is going to help, and I don't know what a good distribution for policy is either. Search should be able to reach at least a similar level of focus towards a few best moves in endgames; I haven't yet figured out how to do that. I thought this change might help improve networks before search manages to reach the required level of focus. These positions don't affect playing strength when there is a TB. I'm hoping we could test it in the current training run to learn how it affects networks. That could be valuable information when trying to adjust search to improve endgame training.
I'm thinking that policy should be much more focused towards one winning move when the rescorer knows the distance to mate. This proposal implements rules where moves are ranked by distance to mate and by policy preference in the training data. The best move candidate gains the `kWinningPolicyShareId` share of policy. Each following move gets `kWinningPolicyShareId` of the remaining free policy share, and the best move gets all of the remaining share once all winning moves have been processed. This aims to make search more focused towards finding the mate when in a simple endgame like KNBvK.
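A toy, self-contained illustration of that schedule (not code from this PR), assuming a configured share of 0.5 and three winning moves already ranked by DTM:

```cpp
#include <cstdio>
#include <vector>

int main() {
  const float kShare = 0.5f;  // Stand-in for the --winning_policy_share value.
  const int kNumWinningMoves = 3;
  std::vector<float> shares(kNumWinningMoves, 0.0f);
  float remaining = 1.0f;
  // Each move, in DTM order, takes a fixed fraction of the policy share
  // that is still unassigned.
  for (int i = 0; i < kNumWinningMoves; ++i) {
    shares[i] = kShare * remaining;
    remaining -= shares[i];
  }
  // Whatever is left over after all winning moves is handed back to the
  // quickest mate.
  shares[0] += remaining;
  for (int i = 0; i < kNumWinningMoves; ++i) {
    std::printf("move %d: %.4f\n", i + 1, shares[i]);  // 0.6250 0.2500 0.1250
  }
}
```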
NOTE: This has only been tested with 3-piece TBs.