-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Also make parents Terminal if any move is a win or all moves are loss or draw. #822
Conversation
The behavior of search with this change:
No direct changes to root behavior to allow searching moves as normal except some of its children will have more accurate Q to direct search. |
Here's the position that inspired me to write up this PR. All runs miss this mate-in-2 clearance / defense annihilation move Qxh7+ (11258 0.26%, 22202 0.31%, 32930 0.27%, 41750 0.29%, 50783 0.17%) where the king is forced to capture the queen Kxh7 and white mates with Rh4#. https://lichess.org/VGNIwD67#70 Looking at the move before the screenshot to try to avoid blundering with Qxa3:
Where with the PR, the checkmate move makes the forced king move a loss which then makes Qxh7+ become a win and the blundering Qxa3 a loss. With Qxa3 appearing as Q: -1, additional visits avoid it and quickly find an alternate h6 move. |
I modified the position/fen of the previous comment as a TB sanity check for a 10-piece position converting moves from my 6-piece TB files:
Where after capturing c4 queen with Rxc4 or Bxc4, eventually all possible black moves are found to be losses because some followup white move is found to be a converted 8-piece TB win. Verbose move stats showing (T) for a 10 piece position, neat! 😄 |
4e92d4d
to
69aff25
Compare
@Ttl One interesting aspect I noticed with nodes becoming terminal losses is the visits remain so if what search had been thinking is the best move leads to the opponent having a terminal win, search needs enough visits for a 2nd-best move to exceed and be picked even though we know for sure the most visited move is bad. Maybe we should pick move to play based on lower confidence bound like leela-zero/leela-zero#2290 ? E.g., from #822 (comment)
It would be unfortunate if e3a3 was picked somewhere between 800 visits and 1590 visits even though we know we shouldn't play it. But even then, the behavior is still no worse than the existing search picking this bad move. Also somewhat related is these terminal loss moves probably shouldn't get any more visits, but we end up "wasting" CPU cycles adjusting the "uncertainty" for these moves. Both of these aren't really specific to this issue though, so probably be their own PRs. |
Nice PR. I like the concept that you conclude that a node is terminal during playout instead of during backup, which is very neat. The code is also non-obtrusive, it has to be said. I think you can even go one step further and also infer some kinds of certain draws. In the logic, just add a Of course, a lot of cases will still fall through the mazes of the net that only a full-fledged solution like #700 can solve, as you already indicated. |
Shhh 😉 I was trying to keep this PR small. Yes, it should just be adding a similar to should_become_draw declaration, accumulation, and conditional termination. Although I wasn't really sure how often all possible moves show up as certain drawn… |
Hi there. I opted against doing it in picking nodes because generally there are only few certain nodes and the code in backpropagatin only adds overhead if and only if the node to process is actually certain (or has any bounds, as i also determine bounds). So it has no overhead for all other (the vast majority) of paths explored in node picking. Also (not currently done in PR487 or PR700) in backprop you can cease to do any checks if there has been no change in certainty state when going up a level. That being said, you could easily move your code to backprop with almost no changes. Your code is node-centric which makes it fit well with current structure of backprop and picknode. I also contemplated that but decided against doing it. The main reasons are in how i envision the future of search: While this probably was not the best decision in terms of "short-term" mergability, i still believe that it will be unavoidable in the long-term if we ever want to handle transpositions and/or lookahead search efficiently. If you want to retain as much of the functionality of PR487 or PR700 as possibly. You could just move the certainty status to the node structure. Then the code changes collapse/simplify greatly in the getters, the only thing you have to ditch is the look-ahead or PR487. Some other remarks. You may have a problem if a terminal node gets to be root if opponent plays into mate. So you might have to unterminal root when initializing search. Also if you want to retain a previously calculated mating sequence, you need to change the trim at head behaviour, as currently you trim if the head is terminal, basically throwing away the search result leading to the terminal condition. Then you also could check for certain draws and bounds (this is very effective for discovering draws). |
No need, you can do what PR700 does and adjust the tuple that is used for GetBestChildrenNoTemperature. That solves this issue.
This can only be done safely at root. As terminal losses need to be hit (even repeatedly) to inform search about the inferiority of the path. If policy guided UCT hits these nodes depite having a Q of -1 the line must be really bad and these hits help to start exploring alternatives early. At root you can safely skip these as there is no information left. I tried various schemes (in the tree, not at root), not visiting loosing childs at all, decreasing visit probability with N, decreasing visit probability with depth. The result was that not visiting at all looses elo, results of the more "refined" schemes (N, depth) were inconclusive. So i would strongly advise to only skip picking loosing moves at root and let PUCT do its thing in tree. |
@Mardak Something I toyed with long ago was, if I decided a child needed pruned (in this case with Chess, a terminal node that is a loss) I'd remove the search and eval data from that move all the way back up the tree to overcome the "averaging" problem of MCTS. This was separate from using max_lcb to choose moves, the idea being instead of needing the 2nd move to get enough visits to outnumber the 1st move, I'd lower the # of visits to the 1st move if I pruned a big branch in the tree. (Lowering the average win rate much faster, and promoting 2nd move faster than waiting it out.) Just food for thought if you ever want to try it. |
Hi, I solved this issue by doing this: Line 1429 in d6cfd84
|
Thanks for the suggestions. Terminating parents from
The downside is that it's slightly more complex requiring a new children loop in each step of backup, but the extra loop should only trigger when a parent has a new terminal child. This conditional behavior aspect would allow backup to do other more complex changes without impacting the forward common case. |
Exactly. Even if the original node is terminal you do not need to check all childs on every depth. As soon as you get to a node that does not get to be made terminal you can stop checking all childs from that node onward until backprop is finished. Thats the most optimal way and also avoids some other issues with doing it in picknode. If you move it to backprop and also make nodes a terminal draw if all childs are terminal draws then you have the simplest version i can think of. And if you succeed in getting that merged, we can then add more features over time. Just for reference here is my findings in how much elo each part of PR487 gains: |
TC 15.12s/game + 0.063s/move (~4000nodes/move)
|
Added a commit to do Terminator from
Instead of taking 6 visits to identify both moves as terminal, the new behavior finds it 2 visits earlier (1 visit for each terminal). Also, the new commit adds in draw checks. Here's a dummy test position:
|
@Ttl Oh nice! I didn't notice your #817 earlier. Here's the output with PRs combined:
Looks like your PR definitely helps avoid picking known losing moves! 😄 |
99e7927
to
bccfd10
Compare
I checked the code (briefly) and everything looks ok now. I really think you cannot do a simpler version of this. So lets keep our fingers crossed that this gets commited, so we have a starting point to maybe build on top of this. Your stats match mine, and are the reason why i push for two-fold draw scoring. Certain draws will then increase substantially in endgame positions (and the lookahead will increase those even further at no gpu cost - just cpu). Last two things I would suggest is to call it "certainty-propagation" or maybe even more accurately "terminal-propagation". An also you should maybe not trim the head completely - because currently as soon as a terminal becomes root (start of the mating sequence) all previous computation is lost because the tree gets deleted. This might lead to strange sudden death behavior. You could just change the call to trimathead to only occur if the terminal node has no children and unterminal root. |
There's a set of issues related to trimming the head or not that would complicate this PR, so not sure if it's worth delaying getting things merged vs cleaning it up in its own PR. Similarly naming discussions often lead to time wasted bikeshedding, so I usually just let the reviewer decide if we even need to have a option/param. |
76092ac
to
7cfa496
Compare
Probably a stupid question: What are the functional differences between this and PR800 now? |
f23b23d
to
0a2b234
Compare
@Mardak do you know why this test showed an Elo regression? Is the code slower? Or was there a problem that has been fixed since then? |
The old code was a bit inefficient in the common case affecting all node picking vs the current code only doing extra work when a new terminal was visited. There is also a fix in the current code to retain existing tree if the game played into a "terminal" position where the old code would have required re-searching already visited nodes. |
// A non-winning terminal move needs all other moves to have the same value. | ||
if (can_convert && v != 1.0f) { | ||
for (const auto& edge : p->Edges()) { | ||
can_convert = can_convert && edge.IsTerminal() && edge.GetQ(0.0f) == v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic can be extended so that if all nodes are terminal, but they don't all agree on v - then parent can be made terminal draw. But theoretically assuming imperfect opponent play maybe it could be better not to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if all childs are terminal (or certain in my parlance) the maximum should be returned, so if all moves are (certain) losses and one is a (certain) draw, we can safely consider the position a certain draw. The perfect play assumption is not really relevant, as all opponents of leela either share the same codebase (different NNs) or are much superior in propagating minimax scores (basically all a/b engines). I would suggest to implement this - its simple and should maybe only add one line of code, but this could be done in further PRs after this is in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be extended in a separate PR to improve the logic.
I still find the name of the option a little misleading, as mates in the opening or middle gamephase are also affected. I would still suggest calling it "terminal-propagation". Also in its current simple form it can be an always on feature so maybe we don't need an option after all. |
I saw that you avoid trimming the head now and recalculate N. But i vaguely remember that I also had to change makemove, as otherwise there are problems in self-play and treesharing - did you test that? |
It does look like the root can end up terminal if selfplay is run with |
Things should not break if people use different options. The change required to fix this is basically identical to what you already did in Resettoposition, you just have to add the same lines of code to makemove. You can check PR700 for reference. |
But how does it break? The outcome of the games are the same. |
Copy/pasting code is not a good solution in general as it often leads to bugs and code maintenance issues down the line that need additional review cycles or mental overhead causing. I can file a followup PR with a better fix than just duplicating code. |
There might be several issues, games might not be played to their conclusion according to the rules of chess, terminating early before mate or a real drawn terminal is actually on the board and i think there might even be some edge cases where it can crash (but this might or might not apply to your current implementation). |
Obviously it was not meant as suggesting to cut/paste the code verbatim. It was just meant as a pointer on what could to be done in makemove to avoid potential problems - naturally it needs to be adapted. The aim was to raise awareness of potential issues - so we can get this into master without breaking anything. |
PR800 is based on PR700 (basic certainty propagation). The differences to this PR are:
But all of this is inconsequential, as many of the strength gaining features can be added to this PR later once its in master over time. You can think of PR487/700/800 as a proof-of-concept of what is possible if you push this concept to its limits. In doing that i imposed - maybe to much - of my vision on how the codebase should evolve, and generally underestimated that the collaborative approach is better served with implementing features in small increments.And this is what this PR is about. |
I wondered whether MakeNotTerminal should be kept in case MakeMove isn't called - but I guess its fine since the making of terminals never applies to the root - so even if we start from fen which is late game and able to be proven terminal, the root itself doesn't get marked and so does not need to be reset. |
src/mcts/node.cc
Outdated
const auto n = child.GetN(); | ||
if (n > 0) { | ||
n_ += n; | ||
q_ -= child.GetQ(0.0f) * n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to add a comment why it is -=
here instead of +=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would += -child.GetQ()
instead be more immediately clear without a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to +=
and added comment:
Lines 244 to 245 in 9b76ded
// Flip Q for opponent. | |
q_ += -child.GetQ(0.0f) * n; |
b9b5e2f
to
9b76ded
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
r?@mooskagh Not as feature-full as #700, but this behaves more like TB terminals. Convert a node when search notices children have terminal behavior so future search can stop sooner. Slight refactoring of NodeToProcess to as TerminalHit behaved the same as Extension(1).
Example:
Then looking at one move before: