Generate focused policy using distance to mate #2383

Open

Menkib64 wants to merge 2 commits into LeelaChessZero:master from Menkib64:rescorer_mate_policy_focus_improvement

Conversation

@Menkib64 (Contributor) commented Feb 8, 2026

I'm thinking that policy should be much more focused towards one winning move when the rescorer knows the distance to mate. This proposal implements rules whereby moves are ranked by distance to mate and by policy preference in the training data. The best move candidate gains a kWinningPolicyShareId share of the policy. Each following move gets a kWinningPolicyShareId share of the remaining free policy. Once all winning moves have been processed, the best move receives all of the remaining share.

This aims to make search more focused towards finding the mate when in a simple endgame like KNBvK.

NOTE: This has only been tested with 3-piece tablebases.
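To make the schedule concrete, here is a minimal sketch of the allocation rule described above — not the PR's actual code; the function name and plain-float interface are illustrative, with `share` standing in for the value behind kWinningPolicyShareId:

```cpp
#include <vector>

// Illustrative only: winning moves are assumed to be sorted best-first
// (smallest distance to mate, ties broken by original policy preference).
// Each move takes `share` of the policy mass still unallocated, and the
// best move absorbs whatever is left at the end.
std::vector<float> AssignWinningShares(int num_winning_moves, float share) {
  std::vector<float> result(num_winning_moves, 0.0f);
  float remaining = 1.0f;
  for (int i = 0; i < num_winning_moves; ++i) {
    result[i] = remaining * share;
    remaining -= result[i];
  }
  if (!result.empty()) result[0] += remaining;  // Leftover goes to the best move.
  return result;
}
```

For example, with share = 0.5 and three winning moves this yields 0.625 / 0.25 / 0.125, so the quickest mate dominates the policy target while the alternatives keep a nonzero signal.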

Copilot AI review requested due to automatic review settings February 8, 2026 12:29
Copilot AI left a comment

Pull request overview

This PR adjusts rescoring so that, when tablebases provide distance-to-mate (DTM), policy targets can be reallocated to concentrate probability mass on the quickest-mate winning move(s), making training/search more “mate-focused” in simple TB endgames.

Changes:

  • Adds a new --winning_policy_share option and threads it through the rescoring pipeline.
  • Introduces DTM-based policy target rewriting (using Gaviota probes) to rank winning moves by DTM and assign them a fixed share schedule.
  • Updates internal processing function signatures to carry the new parameter.


```diff
-    float distTemp, float distOffset, float dtzBoost) {
+    float distTemp, float distOffset, float dtzBoost,
+    float winningMovePolicyShare) {
   if (distTemp == 1.0f && distOffset == 0.0f && dtzBoost == 0.0f) {
```
Copilot AI commented Feb 8, 2026

The new `winningMovePolicyShare` parameter is only used in the DTM rewrite path, which is currently gated by `dtzBoost != 0.0f` further below. As a result, `--winning_policy_share` becomes a no-op unless dtz boosting is also enabled, which is surprising given the option description. Consider either decoupling the DTM-policy logic from `dtzBoost`, or explicitly documenting/enforcing that `winning_policy_share` requires `dtz_policy_boost` to be non-zero.

Suggested change

```diff
-  if (distTemp == 1.0f && distOffset == 0.0f && dtzBoost == 0.0f) {
+  if (distTemp == 1.0f && distOffset == 0.0f && dtzBoost == 0.0f &&
+      winningMovePolicyShare == 0.0f) {
```

Comment on lines 774 to +778
```diff
   unsigned int info;
   unsigned int dtm;
   gaviota_tb_probe_hard(next_pos, info, dtm);
-  dtms.push_back(dtm);
+  dtms.emplace_back(
+      dtm, chunk.probabilities[MoveToNNIndex(move, transform)], move);
```
Copilot AI commented Feb 8, 2026

`gaviota_tb_probe_hard(next_pos, info, dtm)` is used without checking `info` (e.g., `tb_WMATE`/`tb_BMATE`) or probe success. If `info` indicates a draw/unknown, `dtm` may be meaningless and will corrupt the move ordering / distribution. Add an `info` check (and skip, or fall back to the existing dtz-boost path, when it is not a mate score).
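A minimal sketch of the suggested guard, as a fragment inside the per-move loop — the success-returning wrapper signature and the `MateScore` layout are assumptions, not taken from this PR:

```cpp
// Assumed shape of the entries collected for the policy rewrite.
struct MateScore {
  unsigned int dtm;
  float prob;
  Move move;
};

// Inside the loop over safe moves:
unsigned int info;
unsigned int dtm;
// Assumption: the wrapper reports probe success via its return value.
if (!gaviota_tb_probe_hard(next_pos, info, dtm) ||
    (info != tb_WMATE && info != tb_BMATE)) {
  // Draw or unknown result: dtm is meaningless here, so skip this move
  // and let the existing dtz-boost path handle the position.
  continue;
}
dtms.emplace_back(
    MateScore{dtm, chunk.probabilities[MoveToNNIndex(move, transform)], move});
```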

Comment on lines 769 to 771
```cpp
unsigned int mininum_dtm = 1000;
// Only safe moves being considered, boost the smallest dtm
// amongst them.
```
Copilot AI commented Feb 8, 2026

`mininum_dtm` is now written (`if (dtm < mininum_dtm) ...`) but never read after the policy rewrite refactor. This is dead code, and the surrounding comment about boosting the smallest DTM is no longer accurate. Remove the variable and update the comment to match the new distribution logic.

Comment on lines +791 to +800
```diff
+  for (unsigned i = 0; i < std::size(chunk.probabilities); i++) {
+    auto& prob = chunk.probabilities[i];
+    if (prob < 0 || std::isnan(prob)) continue;
+    auto iter = std::find_if(
+        dtms.begin(), dtms.end(), [i, transform](const MateScore& ms) {
+          return i == MoveToNNIndex(ms.move, transform);
+        });
+    if (iter == dtms.end()) {
+      prob = 0.0f;
+      continue;
```
Copilot AI commented Feb 8, 2026

The rewrite of `chunk.probabilities` does a `std::find_if` over `dtms` for every legal NN index, and recomputes `MoveToNNIndex(ms.move, transform)` repeatedly inside the predicate. This is O(legal_moves × dtms) per position and adds avoidable overhead. Consider precomputing the NN index for each `MateScore` and building a small lookup (e.g., `std::array<float, 1858>` initialized to 0 / -1) to assign probabilities in one pass without repeated searches.
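A sketch of that one-pass variant, as a fragment in the same context as the loop above — the `share` field on `MateScore` is hypothetical (standing in for whatever per-move target the rewrite computes), and 1858 is taken from the reviewer's suggestion:

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Precompute each winning move's NN index once; -1.0f marks "not a ranked
// winning move" so those slots can be zeroed in the pass below.
std::array<float, 1858> target;
target.fill(-1.0f);
for (const MateScore& ms : dtms) {
  target[MoveToNNIndex(ms.move, transform)] = ms.share;  // Hypothetical field.
}
// Single pass over the policy head: illegal slots (< 0 / NaN) stay as-is,
// legal non-winning moves get zeroed, winning moves get their new share.
for (unsigned i = 0; i < std::size(chunk.probabilities); i++) {
  auto& prob = chunk.probabilities[i];
  if (prob < 0 || std::isnan(prob)) continue;
  prob = std::max(target[i], 0.0f);
}
```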

Comment on lines 1470 to 1474
```cpp
    float distTemp, float distOffset, float dtzBoost,
    float winningMovePolicyShare, int newInputFormat) {
  FileData data = ProcessFileInternal(std::move(fileContents), tablebase,
                                      distTemp, distOffset, dtzBoost,
                                      winningMovePolicyShare, newInputFormat);
```
Copilot AI commented Feb 8, 2026

`RescoreTrainingData`'s definition now takes `float winningMovePolicyShare`, but the declaration in src/trainingdata/rescorer.h still has the old signature (no `winningMovePolicyShare`). This will cause a build failure / ODR mismatch for any TU including the header. Update the header declaration (and any callers) to match, ideally with a default value if you want to keep source compatibility.

Suggested change

```diff
-    float distTemp, float distOffset, float dtzBoost,
-    float winningMovePolicyShare, int newInputFormat) {
-  FileData data = ProcessFileInternal(std::move(fileContents), tablebase,
-                                      distTemp, distOffset, dtzBoost,
-                                      winningMovePolicyShare, newInputFormat);
+    float distTemp, float distOffset, float dtzBoost, int newInputFormat) {
+  // Use a default value for winningMovePolicyShare to maintain behavior
+  // without exposing this parameter in the public API.
+  constexpr float kDefaultWinningMovePolicyShare = 0.0f;
+  FileData data = ProcessFileInternal(std::move(fileContents), tablebase,
+                                      distTemp, distOffset, dtzBoost,
+                                      kDefaultWinningMovePolicyShare,
+                                      newInputFormat);
```

@Menkib64 (Contributor, Author) commented Feb 8, 2026

This aims to address the problem that the policy is too flat when the position is completely winning. I don't know how much it is going to help, nor do I know what a good policy distribution looks like. Search should be able to reach at least a similar level of focus towards a few best moves in endgames, but I haven't yet figured out how to do that. I thought this change might help improve networks before search manages to reach the required level of focus.

These positions don't affect playing strength when a TB is available. I'm hoping that we could test this in the current training run to learn how it affects networks. That could be valuable information when trying to adjust search to improve endgame training.
