Also make parents Terminal if any move is a win or all moves are loss or draw. #822

Mardak · 2019-03-29T21:39:03Z

r?@mooskagh Not as feature-full as #700, but this behaves more like TB terminals. Convert a node when search notices children have terminal behavior so future search can stop sooner. Slight refactoring of NodeToProcess to as TerminalHit behaved the same as Extension(1).

Example:

./lc0 -w 11258 --verbose-move-stats --fpu-strategy-at-root=absolute --minibatch-size=1 --smart-pruning-factor=0
position fen 3k4/p4Q2/3N3p/1p1B4/8/1P2P3/2R2PPP/6K1 w - - 1 31 moves f7a7

# just loading the position
go nodes 1    
h6h5 N:       0 (+ 0) (P: 47.68%) (Q:  1.00000) (V:  -.----)
b5b4 N:       0 (+ 0) (P: 52.32%) (Q:  1.00000) (V:  -.----)

# root fpu=1 visit both; nothing special
go nodes 2
h6h5 N:       1 (+ 0) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       1 (+ 0) (P: 52.32%) (Q: -0.97261) (V: -0.9726)

# b5b4 finds c2c8#
go nodes 3
h6h5 N:       1 (+ 0) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       2 (+ 0) (P: 52.32%) (Q: -0.98630) (V: -0.9726)

# b5b4 c2c8# converts b5b4 into a loss
go nodes 5
h6h5 N:       2 (+ 0) (P: 47.68%) (Q: -0.98746) (V: -0.9749)
b5b4 N:       3 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

# h6h5 c2c8# converts h6h5 into a loss
go nodes 7
h6h5 N:       3 (+ 0) (P: 47.68%) (Q: -1.00000) (V: -1.0000) (T)
b5b4 N:       4 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

Then looking at one move before:

position fen 3k4/p4Q2/3N3p/1p1B4/8/1P2P3/2R2PPP/6K1 w - - 1 31

# just normal out of order finding checkmate moves
go nodes 10
c2c6 N:       0 (+ 0) (P:  1.94%) (Q:  1.00000) (V:  -.----)
f7c7 N:       0 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  -.----)
d6c8 N:       0 (+ 0) (P:  1.98%) (Q:  1.00000) (V:  -.----)
d5e4 N:       0 (+ 0) (P:  2.06%) (Q:  1.00000) (V:  -.----)
d5b7 N:       1 (+ 0) (P:  2.21%) (Q:  0.97211) (V:  0.9721)
d5c6 N:       1 (+ 0) (P:  2.16%) (Q:  0.97785) (V:  0.9778)
c2c7 N:       1 (+ 0) (P:  2.24%) (Q:  0.97764) (V:  0.9776)
d6b7 N:       1 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:       2 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:       3 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

# just normal search visits
go nodes 400
d6c8 N:       8 (+ 0) (P:  1.98%) (Q:  0.98583) (V:  0.9752)
d5e4 N:       8 (+ 0) (P:  2.06%) (Q:  0.98539) (V:  0.9736)
f7c7 N:       9 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  1.0000) (T)
d5b7 N:       9 (+ 0) (P:  2.21%) (Q:  0.98568) (V:  0.9721)
d5c6 N:       9 (+ 0) (P:  2.16%) (Q:  0.99107) (V:  0.9778)
c2c7 N:       9 (+ 0) (P:  2.24%) (Q:  0.98915) (V:  0.9776)
d5e6 N:      11 (+ 0) (P:  2.53%) (Q:  0.99073) (V:  0.9777)
d6b7 N:      18 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:      20 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:      37 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

# f7a7 (from first example) has enough visits that all its moves are losses, so convert to win
go nodes 500
f7a7 N:       9 (+ 0) (P:  1.52%) (Q:  1.00000) (V:  1.0000) (T)
f7f6 N:      10 (+ 0) (P:  1.74%) (Q:  1.00000) (V:  1.0000) (T)
f7c7 N:      11 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  1.0000) (T)
d5b7 N:      11 (+ 0) (P:  2.21%) (Q:  0.98829) (V:  0.9721)
d5c6 N:      11 (+ 0) (P:  2.16%) (Q:  0.99269) (V:  0.9778)
c2c7 N:      12 (+ 0) (P:  2.24%) (Q:  0.99186) (V:  0.9776)
d5e6 N:      13 (+ 0) (P:  2.53%) (Q:  0.99216) (V:  0.9777)
d6b7 N:      23 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:      24 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:      46 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

# lots of moves converting to win
go nodes 600
f7a7 N:      10 (+ 0) (P:  1.52%) (Q:  1.00000) (V:  1.0000) (T)
f7f6 N:      12 (+ 0) (P:  1.74%) (Q:  1.00000) (V:  1.0000) (T)
b3b4 N:      13 (+ 0) (P:  1.87%) (Q:  1.00000) (V:  1.0000) (T)
f7c7 N:      13 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  1.0000) (T)
d5c6 N:      15 (+ 0) (P:  2.16%) (Q:  1.00000) (V:  1.0000) (T)
d5b7 N:      15 (+ 0) (P:  2.21%) (Q:  1.00000) (V:  1.0000) (T)
c2c7 N:      15 (+ 0) (P:  2.24%) (Q:  1.00000) (V:  1.0000) (T)
d5e6 N:      17 (+ 0) (P:  2.53%) (Q:  1.00000) (V:  1.0000) (T)
d6b7 N:      27 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:      28 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:      54 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

Mardak · 2019-03-29T21:52:43Z

The behavior of search with this change:

convert to loss when any child is checkmate prevents visiting other children, so this saves all future visits to these children
nodes converted to loss are less likely to be visited by search favoring Q>-1 moves
convert to win when all children are checkmated, so this node is more likely to be visited with Q=1 instead of other moves that might require NN lookup

No direct changes to root behavior to allow searching moves as normal except some of its children will have more accurate Q to direct search.

Mardak · 2019-03-29T22:11:09Z

Here's the position that inspired me to write up this PR.

All runs miss this mate-in-2 clearance / defense annihilation move Qxh7+ (11258 0.26%, 22202 0.31%, 32930 0.27%, 41750 0.29%, 50783 0.17%) where the king is forced to capture the queen Kxh7 and white mates with Rh4#. https://lichess.org/VGNIwD67#70

Looking at the move before the screenshot to try to avoid blundering with Qxa3:

./lc0 -w 11258 --verbose-move-stats --smart-pruning-factor=0
position startpos moves g1f3 e7e6 c2c4 g8f6 d2d4 f8b4 c1d2 b4e7 a2a3 d7d5 e2e3 e8g8 b1c3 b8d7 d1c2 b7b6 c4d5 e6d5 f1e2 c8b7 f3e5 a7a6 e1g1 c7c5 d2e1 a8c8 c2f5 d8c7 f2f4 c8d8 a1c1 f8e8 e1h4 b6b5 e2d3 d7f8 f5h3 f6e4 c3e4 d5e4 d3b1 c5c4 f4f5 e7h4 h3h4 f7f6 e5g4 f8d7 g4f2 d7b6 f2e4 b7e4 b1e4 e8e7 c1e1 d8e8 f1f4 g8h8 e4f3 c4c3 b2c3 c7c3 h2h3 e7e3 e1e3 c3e3 g1h2 b6c4 f3d5

# without PR
go nodes 800
c4d2 N:      47 (+ 0) (P: 13.63%) (Q:  0.00268) (V:  0.0637)
e3a3 N:     580 (+ 0) (P: 14.29%) (Q:  0.23623) (V:  0.3569)

go nodes 1600
h7h6 N:     483 (+ 1) (P:  3.92%) (Q:  0.18075) (V:  0.0814)
e3a3 N:     625 (+ 0) (P: 14.29%) (Q:  0.14722) (V:  0.3569)

# with PR
go nodes 800
c4d2 N:      48 (+ 0) (P: 13.63%) (Q:  0.00272) (V:  0.0637) 
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 

go nodes 1600
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 
h7h6 N:     568 (+ 0) (P:  3.92%) (Q:  0.17601) (V:  0.0814)

Where with the PR, the checkmate move makes the forced king move a loss which then makes Qxh7+ become a win and the blundering Qxa3 a loss. With Qxa3 appearing as Q: -1, additional visits avoid it and quickly find an alternate h6 move.

Mardak · 2019-03-30T01:49:35Z

I modified the position/fen of the previous comment as a TB sanity check for a 10-piece position converting moves from my 6-piece TB files:

position fen 7k/6pp/5p2/1p6/2q2R1Q/1B6/7K/8 w - - 0 36

go nodes 1750000
…
h4g3 N:   14626 (+ 1) (P:  2.27%) (Q:  0.98233) (V:  0.9885)
h4g4 N:   15603 (+ 3) (P:  2.35%) (Q:  0.98313) (V:  0.9657)
f4c4 N:  414512 (+ 0) (P: 18.75%) (Q:  1.00000) (V:  1.0000) (T)
b3c4 N: 1277607 (+ 0) (P: 33.73%) (Q:  1.00000) (V:  1.0000) (T)

Where after capturing c4 queen with Rxc4 or Bxc4, eventually all possible black moves are found to be losses because some followup white move is found to be a converted 8-piece TB win. Verbose move stats showing (T) for a 10 piece position, neat! 😄

Mardak · 2019-03-30T03:32:21Z

Here's a comparison with 1 million nodes before/after PR for a test position:

./lc0 -w 50783 --verbose-move-stats --smart-pruning-factor=0
position fen 7k/6pp/2p2p2/1R6/2r2R1Q/1B6/7K/8 b - - 0 36
go nodes 1000000

# without PR
info depth 7 seldepth 18 time 276485 nodes 1000095 score cp -6885 hashfull 1000 nps 3617 tbhits 1410 pv g7g5 h4h5 c4e4 h5h6 e4e7 h6h3 h7h5 h2g1 e7g7 g1g2 g7a7
c4c3 N:   42000 (+ 5) (P:  5.92%) (Q: -0.99767) (V:  -.----)
c4a4 N:   42258 (+ 3) (P:  5.33%) (Q: -0.99606) (V:  -.----)
c6c5 N:   42299 (+ 5) (P:  5.45%) (Q: -0.99634) (V:  -.----)
c4c2 N:   42655 (+ 8) (P:  5.10%) (Q: -0.99534) (V:  -.----)
c4c1 N:   43181 (+ 3) (P:  5.85%) (Q: -0.99708) (V:  -.----)
c4c5 N:   47580 (+ 3) (P:  6.12%) (Q: -0.99633) (V:  -.----)
c4d4 N:   48964 (+ 3) (P:  5.87%) (Q: -0.99537) (V:  -.----)
f6f5 N:   49652 (+ 6) (P:  6.09%) (Q: -0.99567) (V: -0.9950)
h8g8 N:   50759 (+15) (P:  6.11%) (Q: -0.99543) (V:  -.----)
c6b5 N:   51637 (+ 9) (P:  5.58%) (Q: -0.99409) (V:  -.----)
c4e4 N:   52993 (+ 7) (P:  6.24%) (Q: -0.99515) (V:  -.----)
c4b4 N:   56625 (+ 6) (P:  5.54%) (Q: -0.99297) (V:  -.----)
c4f4 N:   68245 (+ 4) (P:  5.26%) (Q: -0.99071) (V: -0.9991)
h7h5 N:   70686 (+17) (P:  6.45%) (Q: -0.99226) (V:  -.----)
g7g6 N:   76761 (+68) (P:  6.05%) (Q: -0.99088) (V: -0.9943)
h7h6 N:   79466 (+13) (P:  6.72%) (Q: -0.99153) (V: -0.9928)
g7g5 N:  134333 (+ 0) (P:  6.30%) (Q: -0.98741) (V: -0.9940)

# with PR
info depth 4 seldepth 18 time 181733 nodes 1000288 score cp -7273 hashfull 1000 nps 5504 tbhits 756 pv g7g5 h4h5 c4e4 h5h6 e4e7 h6h3 e7e2 h2g1 e2g2 g1h1
c4f4 N:   38452 (+ 0) (P:  5.26%) (Q: -1.00000) (V: -1.0000) (T)
c4a4 N:   38976 (+ 0) (P:  5.33%) (Q: -1.00000) (V: -1.0000) (T)
c6c5 N:   39857 (+ 0) (P:  5.45%) (Q: -1.00000) (V: -1.0000) (T)
c4c1 N:   42779 (+ 0) (P:  5.85%) (Q: -1.00000) (V: -1.0000) (T)
c4d4 N:   42879 (+ 0) (P:  5.87%) (Q: -1.00000) (V: -1.0000) (T)
c4c3 N:   43259 (+ 0) (P:  5.92%) (Q: -1.00000) (V: -1.0000) (T)
g7g6 N:   44184 (+ 0) (P:  6.05%) (Q: -1.00000) (V: -1.0000) (T)
f6f5 N:   44485 (+ 0) (P:  6.09%) (Q: -1.00000) (V: -1.0000) (T)
c4c5 N:   44731 (+ 0) (P:  6.12%) (Q: -1.00000) (V: -1.0000) (T)
c4e4 N:   45634 (+ 0) (P:  6.24%) (Q: -1.00000) (V: -1.0000) (T)
c4c2 N:   46594 (+ 0) (P:  5.10%) (Q: -0.99702) (V:  -.----)
h7h5 N:   47173 (+ 0) (P:  6.45%) (Q: -1.00000) (V: -1.0000) (T)
c6b5 N:   61614 (+18) (P:  5.58%) (Q: -0.99496) (V:  -.----)
h8g8 N:   63343 (+ 5) (P:  6.11%) (Q: -0.99560) (V:  -.----)
c4b4 N:   64565 (+ 5) (P:  5.54%) (Q: -0.99444) (V:  -.----)
h7h6 N:  110478 (+17) (P:  6.72%) (Q: -0.99171) (V: -0.9939)
g7g5 N:  181284 (+13) (P:  6.30%) (Q: -0.98886) (V:  -.----)

In this case, the search differences with PR:

g7g5 pv has 35% more visits (46951 more visits)
1m visits took 34% less time (95 fewer seconds)
final nps 52% higher (1887 extra nps)

And testing from startpos but just 100k nodes:

# without PR
info depth 12 seldepth 39 time 46118 nodes 100002 score cp 44 hashfull 302 nps 2168 tbhits 0 pv g1f3 d7d5 d2d4 e7e6 g2g3 c7c5 f1g2 g8f6 e1g1 c5d4 f3d4 e6e5 d4b3 c8e6 c2c4 b8c6 c4d5 f6d5 b1d2 d5f6 d2e4 f6e4 g2e4 e6h3 f1e1 f8b4 c1d2 b4d2 d1d2 d8d2 b3d2

# with PR
info depth 12 seldepth 39 time 46122 nodes 100275 score cp 44 hashfull 302 nps 2174 tbhits 0 pv g1f3 d7d5 d2d4 e7e6 g2g3 c7c5 f1g2 g8f6 e1g1 c5d4 f3d4 e6e5 d4b3 c8e6 c2c4 b8c6 c4d5 f6d5 b1d2 d5f6 d2e4 f6e4 g2e4 e6h3 f1e1 f8b4 c1d2 b4d2 d1d2 d8d2 b3d2

Looks like no noticeable difference in speed.

Mardak · 2019-03-30T07:05:32Z

@Ttl One interesting aspect I noticed with nodes becoming terminal losses is the visits remain so if what search had been thinking is the best move leads to the opponent having a terminal win, search needs enough visits for a 2nd-best move to exceed and be picked even though we know for sure the most visited move is bad. Maybe we should pick move to play based on lower confidence bound like leela-zero/leela-zero#2290 ?

E.g., from #822 (comment)

go nodes 800
c4d2 N:      48 (+ 0) (P: 13.63%) (Q:  0.00272) (V:  0.0637) 
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 

go nodes 1600
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 
h7h6 N:     568 (+ 0) (P:  3.92%) (Q:  0.17601) (V:  0.0814)

It would be unfortunate if e3a3 was picked somewhere between 800 visits and 1590 visits even though we know we shouldn't play it. But even then, the behavior is still no worse than the existing search picking this bad move.

Also somewhat related is these terminal loss moves probably shouldn't get any more visits, but we end up "wasting" CPU cycles adjusting the "uncertainty" for these moves.

Both of these aren't really specific to this issue though, so probably be their own PRs.

ddobbelaere · 2019-03-30T07:23:45Z

Nice PR. I like the concept that you conclude that a node is terminal during playout instead of during backup, which is very neat. The code is also non-obtrusive, it has to be said.

I think you can even go one step further and also infer some kinds of certain draws. In the logic, just add a should_become_draw path based on Q == 0 && child.IsTerminal(). This way it might also help with three fold reps, perpetuals, stalemate etc.

Of course, a lot of cases will still fall through the mazes of the net that only a full-fledged solution like #700 can solve, as you already indicated.

Mardak · 2019-03-30T07:28:17Z

think you can even go one step further and also infer certain draws.

Shhh 😉 I was trying to keep this PR small. Yes, it should just be adding a similar to should_become_draw declaration, accumulation, and conditional termination. Although I wasn't really sure how often all possible moves show up as certain drawn…

Videodr0me · 2019-03-30T08:37:59Z

Hi there. I opted against doing it in picking nodes because generally there are only few certain nodes and the code in backpropagatin only adds overhead if and only if the node to process is actually certain (or has any bounds, as i also determine bounds). So it has no overhead for all other (the vast majority) of paths explored in node picking. Also (not currently done in PR487 or PR700) in backprop you can cease to do any checks if there has been no change in certainty state when going up a level.

That being said, you could easily move your code to backprop with almost no changes. Your code is node-centric which makes it fit well with current structure of backprop and picknode. I also contemplated that but decided against doing it. The main reasons are in how i envision the future of search:
(1) transpositions
(2) look-ahead search
In both cases require a more edge centric view as nodes can be reached via different edges and edges will need to hold path depended information. Also with the look-ahead we do no longer have to create nodes that we know to be certain (as the edges already contain that information). This was under the paradigm: reduce the memory footprint as much as possible.

While this probably was not the best decision in terms of "short-term" mergability, i still believe that it will be unavoidable in the long-term if we ever want to handle transpositions and/or lookahead search efficiently.

If you want to retain as much of the functionality of PR487 or PR700 as possibly. You could just move the certainty status to the node structure. Then the code changes collapse/simplify greatly in the getters, the only thing you have to ditch is the look-ahead or PR487.

Some other remarks. You may have a problem if a terminal node gets to be root if opponent plays into mate. So you might have to unterminal root when initializing search. Also if you want to retain a previously calculated mating sequence, you need to change the trim at head behaviour, as currently you trim if the head is terminal, basically throwing away the search result leading to the terminal condition. Then you also could check for certain draws and bounds (this is very effective for discovering draws).

Videodr0me · 2019-03-30T08:58:26Z

@Ttl One interesting aspect I noticed with nodes becoming terminal losses is the visits remain so if what search had been thinking is the best move leads to the opponent having a terminal win, search needs enough visits for a 2nd-best move to exceed and be picked even though we know for sure the most visited move is bad. Maybe we should pick move to play based on lower confidence bound like leela-zero/leela-zero#2290 ?

No need, you can do what PR700 does and adjust the tuple that is used for GetBestChildrenNoTemperature. That solves this issue.

Also somewhat related is these terminal loss moves probably shouldn't get any more visits, but we end up "wasting" CPU cycles adjusting the "uncertainty" for these moves.

This can only be done safely at root. As terminal losses need to be hit (even repeatedly) to inform search about the inferiority of the path. If policy guided UCT hits these nodes depite having a Q of -1 the line must be really bad and these hits help to start exploring alternatives early. At root you can safely skip these as there is no information left. I tried various schemes (in the tree, not at root), not visiting loosing childs at all, decreasing visit probability with N, decreasing visit probability with depth. The result was that not visiting at all looses elo, results of the more "refined" schemes (N, depth) were inconclusive. So i would strongly advise to only skip picking loosing moves at root and let PUCT do its thing in tree.

roy7 · 2019-03-30T12:10:39Z

@Mardak Something I toyed with long ago was, if I decided a child needed pruned (in this case with Chess, a terminal node that is a loss) I'd remove the search and eval data from that move all the way back up the tree to overcome the "averaging" problem of MCTS. This was separate from using max_lcb to choose moves, the idea being instead of needing the 2nd move to get enough visits to outnumber the 1st move, I'd lower the # of visits to the 1st move if I pruned a big branch in the tree. (Lowering the average win rate much faster, and promoting 2nd move faster than waiting it out.)

Just food for thought if you ever want to try it.

Videodr0me · 2019-03-31T12:18:49Z

@Mardak Something I toyed with long ago was, if I decided a child needed pruned (in this case with Chess, a terminal node that is a loss) I'd remove the search and eval data from that move all the way back up the tree to overcome the "averaging" problem of MCTS. This was separate from using max_lcb to choose moves, the idea being instead of needing the 2nd move to get enough visits to outnumber the 1st move, I'd lower the # of visits to the 1st move if I pruned a big branch in the tree. (Lowering the average win rate much faster, and promoting 2nd move faster than waiting it out.)

Just food for thought if you ever want to try it.

Hi, I solved this issue by doing this:
(1) if a subbranch is made certain, which sometimes means a huge swing in eval, I adjust the scores in the tree as if all previous moves to this subbranch already yielded the certain result. This leads to leela changing its mind much more quickly without having to wait for the averaging operator to catch up.
(2) luckily you can do that basically for free, as you can do it on the same backprop that backpropagates the result anyways. Basically you just adjust the weight of the result and through the linear tree algebra you get a consistent tree just by using the normal backprop mechanism - no additional traversals needed, no extra nodes visited.
You can see how i did that in PR700 or PR487 by checking the section in backprop where i compare the previous q to the current q and adjust the weights:

lc0/src/mcts/search.cc

Line 1429 in d6cfd84

// Certainty propagation: adjust Qs along the path as if all visits already

Mardak · 2019-03-31T15:31:11Z

Thanks for the suggestions. Terminating parents from DoBackupUpdateSingleNode does seem to be slightly more optimal:

avoids wasting at least 1 visit on the forward pass
more accurate Q for PickNodeToExtend 1 batch earlier
avoids accumulating terminating condition across children in the common case

The downside is that it's slightly more complex requiring a new children loop in each step of backup, but the extra loop should only trigger when a parent has a new terminal child. This conditional behavior aspect would allow backup to do other more complex changes without impacting the forward common case.

Videodr0me · 2019-03-31T15:54:07Z

Thanks for the suggestions. Terminating parents from DoBackupUpdateSingleNode does seem to be slightly more optimal:
1. avoids wasting at least 1 visit on the forward pass

2. more accurate Q for `PickNodeToExtend` 1 batch earlier

3. avoids accumulating terminating condition across children in the common case
The downside is that it's slightly more complex requiring a new children loop in each step of backup, but the extra loop should only trigger when a parent has a new terminal child. This conditional behavior aspect would allow backup to do other more complex changes without impacting the forward common case.

Exactly. Even if the original node is terminal you do not need to check all childs on every depth. As soon as you get to a node that does not get to be made terminal you can stop checking all childs from that node onward until backprop is finished. Thats the most optimal way and also avoids some other issues with doing it in picknode. If you move it to backprop and also make nodes a terminal draw if all childs are terminal draws then you have the simplest version i can think of.

And if you succeed in getting that merged, we can then add more features over time. Just for reference here is my findings in how much elo each part of PR487 gains:
(1) Bare bones CP, like this PR is not measurable. But - and thats the nice thing which attracted me to all this - it is at least >=0 for theoretical reasons and needs no tuning.
(2) Bounds checking and score adjusting is about +10 Elo
(3) one-ply-lookahead and 2-fold-draw in conjunction is also about+10Elo
(4) The other features of 487 and 700 do not impact elo, but reduce trolling, display mate, prefer mates over TB hits, play immediatly if mate is proven. These are more "cosmetic" in nature.

zz4032 · 2019-03-31T16:46:38Z

TC 15.12s/game + 0.063s/move (~4000nodes/move)
lc0_PR822 with option.Terminator=true.

   # PLAYER       :  RATING  ERROR  LOS(%)   GAMES  DRAWS(%)
   1 lc0          :     0.0   ----    94.9    1000      71.3
   2 lc0_PR822    :    -9.1   10.9     ---    1000      71.3

White ELO advantage = 70.7
Draw rate (equal opponents) = 77.6 %

Mardak · 2019-04-01T01:13:38Z

Added a commit to do Terminator from DoBackupUpdateSingleNode instead of PickNodeToExtend. Here's an example of the more efficient visit usage using the first position from the first comment #822 (comment):

position fen 3k4/p4Q2/3N3p/1p1B4/8/1P2P3/2R2PPP/6K1 w - - 1 31 moves f7a7

# old PickNodeToExtend behavior
h6h5 N:       1 (+ 1) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       2 (+ 0) (P: 52.32%) (Q: -0.98630) (V: -0.9726)

h6h5 N:       2 (+ 0) (P: 47.68%) (Q: -0.98746) (V: -0.9749)
b5b4 N:       3 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

h6h5 N:       3 (+ 0) (P: 47.68%) (Q: -1.00000) (V: -1.0000) (T)
b5b4 N:       4 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

# new DoBackupUpdateSingleNode behavior
h6h5 N:       1 (+ 0) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       2 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

h6h5 N:       2 (+ 0) (P: 47.68%) (Q: -1.00000) (V: -1.0000) (T)
b5b4 N:       3 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

Instead of taking 6 visits to identify both moves as terminal, the new behavior finds it 2 visits earlier (1 visit for each terminal).

Also, the new commit adds in draw checks. Here's a dummy test position:

position fen qk1K4/8/8/8/8/8/8/Q7 w - - 1 1

# progression --no-terminator
a1a8 N:       1 (+ 0) (P:  5.96%) (Q:  0.04149) (V:  0.0415)
a1a8 N:       2 (+ 0) (P:  5.96%) (Q:  0.02074) (V:  0.0415)
a1a8 N:       3 (+ 0) (P:  5.96%) (Q:  0.01383) (V:  0.0415)

# progression --terminator
a1a8 N:       1 (+ 0) (P:  5.96%) (Q:  0.04149) (V:  0.0415)
a1a8 N:       2 (+ 0) (P:  5.96%) (Q:  0.00000) (V:  0.0000) (T)

Mardak · 2019-04-01T06:10:47Z

@Ttl Oh nice! I didn't notice your #817 earlier. Here's the output with PRs combined:

position startpos moves g1f3 e7e6 c2c4 g8f6 d2d4 f8b4 c1d2 b4e7 a2a3 d7d5 e2e3 e8g8 b1c3 b8d7 d1c2 b7b6 c4d5 e6d5 f1e2 c8b7 f3e5 a7a6 e1g1 c7c5 d2e1 a8c8 c2f5 d8c7 f2f4 c8d8 a1c1 f8e8 e1h4 b6b5 e2d3 d7f8 f5h3 f6e4 c3e4 d5e4 d3b1 c5c4 f4f5 e7h4 h3h4 f7f6 e5g4 f8d7 g4f2 d7b6 f2e4 b7e4 b1e4 e8e7 c1e1 d8e8 f1f4 g8h8 e4f3 c4c3 b2c3 c7c3 h2h3 e7e3 e1e3 c3e3 g1h2 b6c4 f3d5

go nodes 700
h7h6 N:      15 (+ 0) (P:  3.92%) (Q:  0.09498) (LCB:  0.43525) (V:  0.0814)
e8d8 N:      12 (+ 0) (P:  2.80%) (Q:  0.11855) (LCB:  0.49849) (V:  0.0983)
e3a3 N:     542 (+ 1) (P: 14.29%) (Q:  0.28268) (LCB:  0.61505) (V:  0.3569)
bestmove e3a3 ponder d5f7

go nodes 1000
e3a3 N:     561 (+ 0) (P: 14.29%) (Q: -1.00000) (LCB:  0.00000) (V: -1.0000) (T)
…
h7h6 N:      74 (+ 1) (P:  3.92%) (Q:  0.13845) (LCB:  0.51435) (V:  0.0814)
e8d8 N:      93 (+ 0) (P:  2.80%) (Q:  0.15288) (LCB:  0.53667) (V:  0.0983)
bestmove e3a3 ponder d5f7

go nodes 1300
e3a3 N:     561 (+ 0) (P: 14.29%) (Q: -1.00000) (LCB:  0.00000) (V: -1.0000) (T)
…
e8d8 N:     136 (+ 1) (P:  2.80%) (Q:  0.15390) (LCB:  0.54272) (V:  0.0983)
h7h6 N:     269 (+ 0) (P:  3.92%) (Q:  0.16087) (LCB:  0.54902) (V:  0.0814)
bestmove h7h6 ponder f4f3

Looks like your PR definitely helps avoid picking known losing moves! 😄

Mardak · 2019-04-01T17:04:22Z

I ran 1000 selfplay with 800 visits to see what's the behavior of parents getting terminated with this PR.

Generally, at 800 visits, the terminated nodes are far from root or weren't likely to be along the main pv; and the Q value before termination wasn't too far off.
A position that has a checkmate move for the opponent would likely have a Q around -0.65. MCTS probably would have preferred not visiting that parent anyway, so terminating it to Q: -1 makes it even less likely to visit.
Similarly positions that have all opponent moves as losses (most likely a position with a single forced move) already had a Q around 0.85, so MCTS would have favored the move anyway.
For the rare positions that could terminate as draw from this data, all of them were positions that had a single forced opponent draw move, and the Q was likely to be around 0.05.
For the extremely rare case where the pv got terminated into a loss (i.e., the game hadn't resigned already) a quick glance shows that these positions were not far from losing anyway, so additional search wouldn't have found a better move.

Here's some stats of terminated parents:

46337 loss terminals
- 0 with 1 edge
- 13 with 2 edges
- 13 with 3 edges
- 130 with 4-9 edges
- 4353 with 10-19 edges
- 12125 with 20-29 edges
- 16679 with 30-39 edges
- 10980 with 40-49 edges
- 2009 with 50-59 edges
- 35 with 60-69 edges
- 0 with 70+ edges
5860 won terminals
- 5522 with 1 edge
- 328 with 2 edges
- 10 with 3 edges
- 0 with 4+ edges
497 draw terminals
- 497 with 1 edge
- 0 with 2+ edges

So there are definitely a lot more parents that could be terminated as loss because one of generally 10+ possible moves are opponent with checkmate. There were a small portion of positions where the opponent had multiple moves leading to checkmate allowing a parent terminated as won, and overall roughly 90% fewer won than loss terminated parents. Similarly roughly 90% fewer draw than won terminated parents.

Notice the red area does extend very far to the right and that means some parents terminated as loss thought the position was very much favorable only to realize the opponent has a checkmate move.

tournamentstatus final
P1: +443 -430 =127
Win: 50.65%
Elo:  4.52
LOS: 67.00%
P1-W: +236 -201 =63
P1-B: +207 -229 =64

Videodr0me · 2019-04-02T13:33:43Z

I checked the code (briefly) and everything looks ok now. I really think you cannot do a simpler version of this. So lets keep our fingers crossed that this gets commited, so we have a starting point to maybe build on top of this.

Your stats match mine, and are the reason why i push for two-fold draw scoring. Certain draws will then increase substantially in endgame positions (and the lookahead will increase those even further at no gpu cost - just cpu).

Last two things I would suggest is to call it "certainty-propagation" or maybe even more accurately "terminal-propagation". An also you should maybe not trim the head completely - because currently as soon as a terminal becomes root (start of the mating sequence) all previous computation is lost because the tree gets deleted. This might lead to strange sudden death behavior. You could just change the call to trimathead to only occur if the terminal node has no children and unterminal root.

Mardak · 2019-04-02T20:15:57Z

There's a set of issues related to trimming the head or not that would complicate this PR, so not sure if it's worth delaying getting things merged vs cleaning it up in its own PR. Similarly naming discussions often lead to time wasted bikeshedding, so I usually just let the reviewer decide if we even need to have a option/param.

gonzalezjo · 2019-04-15T13:36:05Z

Probably a stupid question: What are the functional differences between this and PR800 now?

src/mcts/params.cc

src/mcts/node.cc

src/mcts/search.h

src/mcts/search.cc

… or draw.

killerducky · 2019-04-16T21:28:34Z

@Mardak do you know why this test showed an Elo regression?
#822 (comment)

Is the code slower? Or was there a problem that has been fixed since then?

Mardak · 2019-04-16T21:34:22Z

The old code was a bit inefficient in the common case affecting all node picking vs the current code only doing extra work when a new terminal was visited. There is also a fix in the current code to retain existing tree if the game played into a "terminal" position where the old code would have required re-searching already visited nodes.

Tilps · 2019-04-17T01:48:13Z

src/mcts/search.cc

+    // A non-winning terminal move needs all other moves to have the same value.
+    if (can_convert && v != 1.0f) {
+      for (const auto& edge : p->Edges()) {
+        can_convert = can_convert && edge.IsTerminal() && edge.GetQ(0.0f) == v;


This logic can be extended so that if all nodes are terminal, but they don't all agree on v - then parent can be made terminal draw. But theoretically assuming imperfect opponent play maybe it could be better not to do so.

Yes if all childs are terminal (or certain in my parlance) the maximum should be returned, so if all moves are (certain) losses and one is a (certain) draw, we can safely consider the position a certain draw. The perfect play assumption is not really relevant, as all opponents of leela either share the same codebase (different NNs) or are much superior in propagating minimax scores (basically all a/b engines). I would suggest to implement this - its simple and should maybe only add one line of code, but this could be done in further PRs after this is in.

This could be extended in a separate PR to improve the logic.

Videodr0me · 2019-04-17T07:28:24Z

I still find the name of the option a little misleading, as mates in the opening or middle gamephase are also affected. I would still suggest calling it "terminal-propagation". Also in its current simple form it can be an always on feature so maybe we don't need an option after all.

Videodr0me · 2019-04-17T07:32:53Z

I saw that you avoid trimming the head now and recalculate N. But i vaguely remember that I also had to change makemove, as otherwise there are problems in self-play and treesharing - did you test that?

Mardak · 2019-04-17T18:42:25Z

i vaguely remember that I also had to change makemove, as otherwise there are problems in self-play and treesharing - did you test that?

It does look like the root can end up terminal if selfplay is run with --reuse-tree but the default is off, and it seems likely it will stay that way for training unless other things get fixed first, e.g., applying noise to existing nodes. I'll add a note to #775 if it becomes a more general issue of fixing tree-reuse issues.

Videodr0me · 2019-04-18T09:54:52Z

It does look like the root can end up terminal if selfplay is run with --reuse-tree but the default is off, and it seems likely it will stay that way for training unless other things get fixed first, e.g., applying noise to existing nodes. I'll add a note to #775 if it becomes a more general issue of fixing tree-reuse issues.

Things should not break if people use different options. The change required to fix this is basically identical to what you already did in Resettoposition, you just have to add the same lines of code to makemove. You can check PR700 for reference.

Mardak · 2019-04-18T13:06:10Z

But how does it break? The outcome of the games are the same.

Mardak · 2019-04-18T13:50:16Z

The change required to fix this is basically identical to what you already did in Resettoposition, you just have to add the same lines of code to makemove.

Copy/pasting code is not a good solution in general as it often leads to bugs and code maintenance issues down the line that need additional review cycles or mental overhead causing.

I can file a followup PR with a better fix than just duplicating code.

Videodr0me · 2019-04-18T19:53:29Z

But how does it break? The outcome of the games are the same.

There might be several issues, games might not be played to their conclusion according to the rules of chess, terminating early before mate or a real drawn terminal is actually on the board and i think there might even be some edge cases where it can crash (but this might or might not apply to your current implementation).

Videodr0me · 2019-04-18T20:01:19Z

Copy/pasting code is not a good solution in general as it often leads to bugs and code maintenance issues down the line that need additional review cycles or mental overhead causing.

Obviously it was not meant as suggesting to cut/paste the code verbatim. It was just meant as a pointer on what could to be done in makemove to avoid potential problems - naturally it needs to be adapted. The aim was to raise awareness of potential issues - so we can get this into master without breaking anything.

Mardak · 2019-04-19T19:02:28Z

r?@mooskagh or @Tilps Could you additionally review the added commit f39a874 that moves the original terminal trimming head behavior into MakeMove as a conditional MakeNotTerminal()? This proactively fixes selfplay with tree reuse if TBs were supported in selfplay too.

Videodr0me · 2019-04-20T06:39:04Z

Probably a stupid question: What are the functional differences between this and PR800 now?

PR800 is based on PR700 (basic certainty propagation). The differences to this PR are:

PR700/800 display mate scores, this PR does not.
PR700/800 prefer terminal wins over certain wins, this PR does not differentiate between the two. This leads to situations where this PR will postpone a mate-in-one just before the 50 move counter runs out.
PR700/800 differentiates between mates proven with and without TB hits, this PR does not. This leads to situations where this PR might go for a long TB win instead of a shorter direct mate.
PR700/800 play winning moves and avoid losing moves regardless of visits. This PR might not play a proven win if its visits did not catch up to the current best move in time or it might play a loosing move if alternatives did not overtake it in visits in time.
PR700/800 establishes proper minimax scores, this PR does not. So cases where all moves, except one drawing move are losing, are not propagated upward making this PR slightly less effective.
PR700/800 establish certain bounds. For example if one side can "force" a draw but has alternatives, the position is scored as a lower bound draw. This is very effective in properly scoring repeatitions and is more effective than this PR.
PR700/800 have an option to enable "two-fold" draw scoring, which increases the number of certain nodes and cuts down on tree combinatorics also making it more effective than this PR.
PR700/800 store information in the edges, while this PR overloads the terminal bool in nodes. This makes this PR fit the current codebase more elegantly and it uses less memory.

But all of this is inconsequential, as many of the strength gaining features can be added to this PR later once its in master over time. You can think of PR487/700/800 as a proof-of-concept of what is possible if you push this concept to its limits. In doing that i imposed - maybe to much - of my vision on how the codebase should evolve, and generally underestimated that the collaborative approach is better served with implementing features in small increments.And this is what this PR is about.

src/mcts/node.cc

Tilps · 2019-04-22T05:25:39Z

r?@mooskagh or @Tilps Could you additionally review the added commit f39a874 that moves the original terminal trimming head behavior into MakeMove as a conditional MakeNotTerminal()? This proactively fixes selfplay with tree reuse if TBs were supported in selfplay too.

I wondered whether MakeNotTerminal should be kept in case MakeMove isn't called - but I guess its fine since the making of terminals never applies to the root - so even if we start from fen which is late game and able to be proven terminal, the root itself doesn't get marked and so does not need to be reset.
So I think this is fine.

mooskagh · 2019-04-22T05:37:52Z

src/mcts/node.cc

+      const auto n = child.GetN();
+      if (n > 0) {
+        n_ += n;
+        q_ -= child.GetQ(0.0f) * n;


Would be nice to add a comment why it is -= here instead of +=.

Would += -child.GetQ() instead be more immediately clear without a comment?

Switched to += and added comment:

lc0/src/mcts/node.cc

Lines 244 to 245 in 9b76ded

// Flip Q for opponent.

q_ += -child.GetQ(0.0f) * n;

src/mcts/node.cc

Tilps

Looks good.

Mardak force-pushed the terminate-search branch from 4e92d4d to 69aff25 Compare March 30, 2019 01:50

Mardak force-pushed the terminate-search branch 2 times, most recently from 99e7927 to bccfd10 Compare April 1, 2019 20:32

Mardak changed the title ~~Terminate search if any move is a win or all moves are loss.~~ Also make parents Terminal if any move is a win or all moves are loss or draw. Apr 1, 2019

Mardak force-pushed the terminate-search branch from bccfd10 to fd3885c Compare April 2, 2019 00:45

Mardak force-pushed the terminate-search branch 2 times, most recently from 76092ac to 7cfa496 Compare April 2, 2019 23:08

mooskagh reviewed Apr 16, 2019

View reviewed changes

Also make parents Terminal if any move is a win or all moves are loss…

c7ad3d6

… or draw.

Mardak force-pushed the terminate-search branch from f23b23d to 0a2b234 Compare April 16, 2019 19:29

Switch back to handwritten loop instead of std::all_of.

751b41b

Tilps reviewed Apr 17, 2019

View reviewed changes

Mardak mentioned this pull request Apr 17, 2019

Noise doesn't work with tree reuse #775

Closed

Move MakeNotTerminal call into MakeMove from ResetToPosition.

f39a874

Tilps reviewed Apr 22, 2019

View reviewed changes

src/mcts/node.cc Show resolved Hide resolved

Recompute eval in MakeNotTerminal.

802e1ff

mooskagh reviewed Apr 22, 2019

View reviewed changes

Address review comments to MakeNotTerminal.

9b76ded

Mardak force-pushed the terminate-search branch from b9b5e2f to 9b76ded Compare April 22, 2019 05:51

Tilps approved these changes Apr 22, 2019

View reviewed changes

Tilps merged commit 7fc6d4f into LeelaChessZero:master Apr 22, 2019

Mardak deleted the terminate-search branch April 22, 2019 06:09

This was referenced Apr 22, 2019

Convert more positions to terminals, e.g., all terminal children with different evals #836

Closed

Search or move selection should prefer (shorter) mates #837

Closed

This was referenced Jun 13, 2019

Check for null parent in search backup. #887

Merged

Prefer terminal wins (and avoid loss) when picking best children. #889

Closed

Mardak mentioned this pull request Nov 27, 2019

Allow mixed terminal values when converting parents to terminals. #1023

Merged

Also make parents Terminal if any move is a win or all moves are loss or draw. #822

Also make parents Terminal if any move is a win or all moves are loss or draw. #822

Conversation

Mardak commented Mar 29, 2019

Mardak commented Mar 29, 2019 • edited Loading

Mardak commented Mar 29, 2019 • edited Loading

Mardak commented Mar 30, 2019 • edited Loading

Mardak commented Mar 30, 2019

Mardak commented Mar 30, 2019 • edited Loading

ddobbelaere commented Mar 30, 2019 • edited Loading

Mardak commented Mar 30, 2019

Videodr0me commented Mar 30, 2019 • edited Loading

Videodr0me commented Mar 30, 2019 • edited Loading

roy7 commented Mar 30, 2019 • edited Loading

Videodr0me commented Mar 31, 2019 • edited Loading

Mardak commented Mar 31, 2019 • edited Loading

Videodr0me commented Mar 31, 2019 • edited Loading

zz4032 commented Mar 31, 2019

Mardak commented Apr 1, 2019 • edited Loading

Mardak commented Apr 1, 2019

Mardak commented Apr 1, 2019 • edited Loading

Videodr0me commented Apr 2, 2019 • edited Loading

Mardak commented Apr 2, 2019

gonzalezjo commented Apr 15, 2019

killerducky commented Apr 16, 2019

Mardak commented Apr 16, 2019

Tilps Apr 17, 2019

Choose a reason for hiding this comment

Videodr0me Apr 17, 2019 • edited Loading

Choose a reason for hiding this comment

Mardak Apr 17, 2019

Choose a reason for hiding this comment

Videodr0me commented Apr 17, 2019

Videodr0me commented Apr 17, 2019

Mardak commented Apr 17, 2019

Videodr0me commented Apr 18, 2019

Mardak commented Apr 18, 2019

Mardak commented Apr 18, 2019

Videodr0me commented Apr 18, 2019

Videodr0me commented Apr 18, 2019

Mardak commented Apr 19, 2019

Videodr0me commented Apr 20, 2019 • edited Loading

Tilps commented Apr 22, 2019

mooskagh Apr 22, 2019

Choose a reason for hiding this comment

Mardak Apr 22, 2019

Choose a reason for hiding this comment

Mardak Apr 22, 2019 • edited Loading

Choose a reason for hiding this comment

Tilps left a comment

Choose a reason for hiding this comment

Mardak commented Mar 29, 2019 •

edited

Loading

Mardak commented Mar 29, 2019 •

edited

Loading

Mardak commented Mar 30, 2019 •

edited

Loading

Mardak commented Mar 30, 2019 •

edited

Loading

ddobbelaere commented Mar 30, 2019 •

edited

Loading

Videodr0me commented Mar 30, 2019 •

edited

Loading

Videodr0me commented Mar 30, 2019 •

edited

Loading

roy7 commented Mar 30, 2019 •

edited

Loading

Videodr0me commented Mar 31, 2019 •

edited

Loading

Mardak commented Mar 31, 2019 •

edited

Loading

Videodr0me commented Mar 31, 2019 •

edited

Loading

Mardak commented Apr 1, 2019 •

edited

Loading

Mardak commented Apr 1, 2019 •

edited

Loading

Videodr0me commented Apr 2, 2019 •

edited

Loading

Videodr0me Apr 17, 2019 •

edited

Loading

Videodr0me commented Apr 20, 2019 •

edited

Loading

Mardak Apr 22, 2019 •

edited

Loading