Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Also make parents Terminal if any move is a win or all moves are loss or draw. #822

Merged
merged 6 commits into from
Apr 22, 2019

Conversation

Mardak
Copy link
Contributor

@Mardak Mardak commented Mar 29, 2019

r?@mooskagh Not as feature-full as #700, but this behaves more like TB terminals. Convert a node when search notices children have terminal behavior so future search can stop sooner. Slight refactoring of NodeToProcess to as TerminalHit behaved the same as Extension(1).

Example:
Screen Shot 2019-03-29 at 2 27 33 PM

./lc0 -w 11258 --verbose-move-stats --fpu-strategy-at-root=absolute --minibatch-size=1 --smart-pruning-factor=0
position fen 3k4/p4Q2/3N3p/1p1B4/8/1P2P3/2R2PPP/6K1 w - - 1 31 moves f7a7

# just loading the position
go nodes 1    
h6h5 N:       0 (+ 0) (P: 47.68%) (Q:  1.00000) (V:  -.----)
b5b4 N:       0 (+ 0) (P: 52.32%) (Q:  1.00000) (V:  -.----)

# root fpu=1 visit both; nothing special
go nodes 2
h6h5 N:       1 (+ 0) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       1 (+ 0) (P: 52.32%) (Q: -0.97261) (V: -0.9726)

# b5b4 finds c2c8#
go nodes 3
h6h5 N:       1 (+ 0) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       2 (+ 0) (P: 52.32%) (Q: -0.98630) (V: -0.9726)

# b5b4 c2c8# converts b5b4 into a loss
go nodes 5
h6h5 N:       2 (+ 0) (P: 47.68%) (Q: -0.98746) (V: -0.9749)
b5b4 N:       3 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

# h6h5 c2c8# converts h6h5 into a loss
go nodes 7
h6h5 N:       3 (+ 0) (P: 47.68%) (Q: -1.00000) (V: -1.0000) (T)
b5b4 N:       4 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

Then looking at one move before:
Screen Shot 2019-03-29 at 1 50 47 PM

position fen 3k4/p4Q2/3N3p/1p1B4/8/1P2P3/2R2PPP/6K1 w - - 1 31

# just normal out of order finding checkmate moves
go nodes 10
c2c6 N:       0 (+ 0) (P:  1.94%) (Q:  1.00000) (V:  -.----)
f7c7 N:       0 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  -.----)
d6c8 N:       0 (+ 0) (P:  1.98%) (Q:  1.00000) (V:  -.----)
d5e4 N:       0 (+ 0) (P:  2.06%) (Q:  1.00000) (V:  -.----)
d5b7 N:       1 (+ 0) (P:  2.21%) (Q:  0.97211) (V:  0.9721)
d5c6 N:       1 (+ 0) (P:  2.16%) (Q:  0.97785) (V:  0.9778)
c2c7 N:       1 (+ 0) (P:  2.24%) (Q:  0.97764) (V:  0.9776)
d6b7 N:       1 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:       2 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:       3 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

# just normal search visits
go nodes 400
d6c8 N:       8 (+ 0) (P:  1.98%) (Q:  0.98583) (V:  0.9752)
d5e4 N:       8 (+ 0) (P:  2.06%) (Q:  0.98539) (V:  0.9736)
f7c7 N:       9 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  1.0000) (T)
d5b7 N:       9 (+ 0) (P:  2.21%) (Q:  0.98568) (V:  0.9721)
d5c6 N:       9 (+ 0) (P:  2.16%) (Q:  0.99107) (V:  0.9778)
c2c7 N:       9 (+ 0) (P:  2.24%) (Q:  0.98915) (V:  0.9776)
d5e6 N:      11 (+ 0) (P:  2.53%) (Q:  0.99073) (V:  0.9777)
d6b7 N:      18 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:      20 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:      37 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

# f7a7 (from first example) has enough visits that all its moves are losses, so convert to win
go nodes 500
f7a7 N:       9 (+ 0) (P:  1.52%) (Q:  1.00000) (V:  1.0000) (T)
f7f6 N:      10 (+ 0) (P:  1.74%) (Q:  1.00000) (V:  1.0000) (T)
f7c7 N:      11 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  1.0000) (T)
d5b7 N:      11 (+ 0) (P:  2.21%) (Q:  0.98829) (V:  0.9721)
d5c6 N:      11 (+ 0) (P:  2.16%) (Q:  0.99269) (V:  0.9778)
c2c7 N:      12 (+ 0) (P:  2.24%) (Q:  0.99186) (V:  0.9776)
d5e6 N:      13 (+ 0) (P:  2.53%) (Q:  0.99216) (V:  0.9777)
d6b7 N:      23 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:      24 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:      46 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

# lots of moves converting to win
go nodes 600
f7a7 N:      10 (+ 0) (P:  1.52%) (Q:  1.00000) (V:  1.0000) (T)
f7f6 N:      12 (+ 0) (P:  1.74%) (Q:  1.00000) (V:  1.0000) (T)
b3b4 N:      13 (+ 0) (P:  1.87%) (Q:  1.00000) (V:  1.0000) (T)
f7c7 N:      13 (+ 0) (P:  1.97%) (Q:  1.00000) (V:  1.0000) (T)
d5c6 N:      15 (+ 0) (P:  2.16%) (Q:  1.00000) (V:  1.0000) (T)
d5b7 N:      15 (+ 0) (P:  2.21%) (Q:  1.00000) (V:  1.0000) (T)
c2c7 N:      15 (+ 0) (P:  2.24%) (Q:  1.00000) (V:  1.0000) (T)
d5e6 N:      17 (+ 0) (P:  2.53%) (Q:  1.00000) (V:  1.0000) (T)
d6b7 N:      27 (+ 0) (P:  3.93%) (Q:  1.00000) (V:  1.0000) (T)
f7e8 N:      28 (+ 0) (P:  4.16%) (Q:  1.00000) (V:  1.0000) (T)
c2c8 N:      54 (+ 0) (P:  7.86%) (Q:  1.00000) (V:  1.0000) (T)

@Mardak
Copy link
Contributor Author

Mardak commented Mar 29, 2019

The behavior of search with this change:

  1. convert to loss when any child is checkmate prevents visiting other children, so this saves all future visits to these children
  2. nodes converted to loss are less likely to be visited by search favoring Q>-1 moves
  3. convert to win when all children are checkmated, so this node is more likely to be visited with Q=1 instead of other moves that might require NN lookup

No direct changes to root behavior to allow searching moves as normal except some of its children will have more accurate Q to direct search.

@Mardak
Copy link
Contributor Author

Mardak commented Mar 29, 2019

Here's the position that inspired me to write up this PR.

Screen Shot 2019-03-29 at 10 04 53 AM

All runs miss this mate-in-2 clearance / defense annihilation move Qxh7+ (11258 0.26%, 22202 0.31%, 32930 0.27%, 41750 0.29%, 50783 0.17%) where the king is forced to capture the queen Kxh7 and white mates with Rh4#. https://lichess.org/VGNIwD67#70

Looking at the move before the screenshot to try to avoid blundering with Qxa3:

./lc0 -w 11258 --verbose-move-stats --smart-pruning-factor=0
position startpos moves g1f3 e7e6 c2c4 g8f6 d2d4 f8b4 c1d2 b4e7 a2a3 d7d5 e2e3 e8g8 b1c3 b8d7 d1c2 b7b6 c4d5 e6d5 f1e2 c8b7 f3e5 a7a6 e1g1 c7c5 d2e1 a8c8 c2f5 d8c7 f2f4 c8d8 a1c1 f8e8 e1h4 b6b5 e2d3 d7f8 f5h3 f6e4 c3e4 d5e4 d3b1 c5c4 f4f5 e7h4 h3h4 f7f6 e5g4 f8d7 g4f2 d7b6 f2e4 b7e4 b1e4 e8e7 c1e1 d8e8 f1f4 g8h8 e4f3 c4c3 b2c3 c7c3 h2h3 e7e3 e1e3 c3e3 g1h2 b6c4 f3d5

# without PR
go nodes 800
c4d2 N:      47 (+ 0) (P: 13.63%) (Q:  0.00268) (V:  0.0637)
e3a3 N:     580 (+ 0) (P: 14.29%) (Q:  0.23623) (V:  0.3569)

go nodes 1600
h7h6 N:     483 (+ 1) (P:  3.92%) (Q:  0.18075) (V:  0.0814)
e3a3 N:     625 (+ 0) (P: 14.29%) (Q:  0.14722) (V:  0.3569)

# with PR
go nodes 800
c4d2 N:      48 (+ 0) (P: 13.63%) (Q:  0.00272) (V:  0.0637) 
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 

go nodes 1600
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 
h7h6 N:     568 (+ 0) (P:  3.92%) (Q:  0.17601) (V:  0.0814) 

Where with the PR, the checkmate move makes the forced king move a loss which then makes Qxh7+ become a win and the blundering Qxa3 a loss. With Qxa3 appearing as Q: -1, additional visits avoid it and quickly find an alternate h6 move.

@Mardak
Copy link
Contributor Author

Mardak commented Mar 30, 2019

I modified the position/fen of the previous comment as a TB sanity check for a 10-piece position converting moves from my 6-piece TB files:

Screen Shot 2019-03-29 at 6 42 58 PM

position fen 7k/6pp/5p2/1p6/2q2R1Q/1B6/7K/8 w - - 0 36

go nodes 1750000
…
h4g3 N:   14626 (+ 1) (P:  2.27%) (Q:  0.98233) (V:  0.9885)
h4g4 N:   15603 (+ 3) (P:  2.35%) (Q:  0.98313) (V:  0.9657)
f4c4 N:  414512 (+ 0) (P: 18.75%) (Q:  1.00000) (V:  1.0000) (T)
b3c4 N: 1277607 (+ 0) (P: 33.73%) (Q:  1.00000) (V:  1.0000) (T)

Where after capturing c4 queen with Rxc4 or Bxc4, eventually all possible black moves are found to be losses because some followup white move is found to be a converted 8-piece TB win. Verbose move stats showing (T) for a 10 piece position, neat! 😄

@Mardak
Copy link
Contributor Author

Mardak commented Mar 30, 2019

Here's a comparison with 1 million nodes before/after PR for a test position:
Screen Shot 2019-03-29 at 8 18 15 PM

./lc0 -w 50783 --verbose-move-stats --smart-pruning-factor=0
position fen 7k/6pp/2p2p2/1R6/2r2R1Q/1B6/7K/8 b - - 0 36
go nodes 1000000

# without PR
info depth 7 seldepth 18 time 276485 nodes 1000095 score cp -6885 hashfull 1000 nps 3617 tbhits 1410 pv g7g5 h4h5 c4e4 h5h6 e4e7 h6h3 h7h5 h2g1 e7g7 g1g2 g7a7
c4c3 N:   42000 (+ 5) (P:  5.92%) (Q: -0.99767) (V:  -.----)
c4a4 N:   42258 (+ 3) (P:  5.33%) (Q: -0.99606) (V:  -.----)
c6c5 N:   42299 (+ 5) (P:  5.45%) (Q: -0.99634) (V:  -.----)
c4c2 N:   42655 (+ 8) (P:  5.10%) (Q: -0.99534) (V:  -.----)
c4c1 N:   43181 (+ 3) (P:  5.85%) (Q: -0.99708) (V:  -.----)
c4c5 N:   47580 (+ 3) (P:  6.12%) (Q: -0.99633) (V:  -.----)
c4d4 N:   48964 (+ 3) (P:  5.87%) (Q: -0.99537) (V:  -.----)
f6f5 N:   49652 (+ 6) (P:  6.09%) (Q: -0.99567) (V: -0.9950)
h8g8 N:   50759 (+15) (P:  6.11%) (Q: -0.99543) (V:  -.----)
c6b5 N:   51637 (+ 9) (P:  5.58%) (Q: -0.99409) (V:  -.----)
c4e4 N:   52993 (+ 7) (P:  6.24%) (Q: -0.99515) (V:  -.----)
c4b4 N:   56625 (+ 6) (P:  5.54%) (Q: -0.99297) (V:  -.----)
c4f4 N:   68245 (+ 4) (P:  5.26%) (Q: -0.99071) (V: -0.9991)
h7h5 N:   70686 (+17) (P:  6.45%) (Q: -0.99226) (V:  -.----)
g7g6 N:   76761 (+68) (P:  6.05%) (Q: -0.99088) (V: -0.9943)
h7h6 N:   79466 (+13) (P:  6.72%) (Q: -0.99153) (V: -0.9928)
g7g5 N:  134333 (+ 0) (P:  6.30%) (Q: -0.98741) (V: -0.9940)

# with PR
info depth 4 seldepth 18 time 181733 nodes 1000288 score cp -7273 hashfull 1000 nps 5504 tbhits 756 pv g7g5 h4h5 c4e4 h5h6 e4e7 h6h3 e7e2 h2g1 e2g2 g1h1
c4f4 N:   38452 (+ 0) (P:  5.26%) (Q: -1.00000) (V: -1.0000) (T)
c4a4 N:   38976 (+ 0) (P:  5.33%) (Q: -1.00000) (V: -1.0000) (T)
c6c5 N:   39857 (+ 0) (P:  5.45%) (Q: -1.00000) (V: -1.0000) (T)
c4c1 N:   42779 (+ 0) (P:  5.85%) (Q: -1.00000) (V: -1.0000) (T)
c4d4 N:   42879 (+ 0) (P:  5.87%) (Q: -1.00000) (V: -1.0000) (T)
c4c3 N:   43259 (+ 0) (P:  5.92%) (Q: -1.00000) (V: -1.0000) (T)
g7g6 N:   44184 (+ 0) (P:  6.05%) (Q: -1.00000) (V: -1.0000) (T)
f6f5 N:   44485 (+ 0) (P:  6.09%) (Q: -1.00000) (V: -1.0000) (T)
c4c5 N:   44731 (+ 0) (P:  6.12%) (Q: -1.00000) (V: -1.0000) (T)
c4e4 N:   45634 (+ 0) (P:  6.24%) (Q: -1.00000) (V: -1.0000) (T)
c4c2 N:   46594 (+ 0) (P:  5.10%) (Q: -0.99702) (V:  -.----)
h7h5 N:   47173 (+ 0) (P:  6.45%) (Q: -1.00000) (V: -1.0000) (T)
c6b5 N:   61614 (+18) (P:  5.58%) (Q: -0.99496) (V:  -.----)
h8g8 N:   63343 (+ 5) (P:  6.11%) (Q: -0.99560) (V:  -.----)
c4b4 N:   64565 (+ 5) (P:  5.54%) (Q: -0.99444) (V:  -.----)
h7h6 N:  110478 (+17) (P:  6.72%) (Q: -0.99171) (V: -0.9939)
g7g5 N:  181284 (+13) (P:  6.30%) (Q: -0.98886) (V:  -.----)

In this case, the search differences with PR:

  • g7g5 pv has 35% more visits (46951 more visits)
  • 1m visits took 34% less time (95 fewer seconds)
  • final nps 52% higher (1887 extra nps)

And testing from startpos but just 100k nodes:

# without PR
info depth 12 seldepth 39 time 46118 nodes 100002 score cp 44 hashfull 302 nps 2168 tbhits 0 pv g1f3 d7d5 d2d4 e7e6 g2g3 c7c5 f1g2 g8f6 e1g1 c5d4 f3d4 e6e5 d4b3 c8e6 c2c4 b8c6 c4d5 f6d5 b1d2 d5f6 d2e4 f6e4 g2e4 e6h3 f1e1 f8b4 c1d2 b4d2 d1d2 d8d2 b3d2

# with PR
info depth 12 seldepth 39 time 46122 nodes 100275 score cp 44 hashfull 302 nps 2174 tbhits 0 pv g1f3 d7d5 d2d4 e7e6 g2g3 c7c5 f1g2 g8f6 e1g1 c5d4 f3d4 e6e5 d4b3 c8e6 c2c4 b8c6 c4d5 f6d5 b1d2 d5f6 d2e4 f6e4 g2e4 e6h3 f1e1 f8b4 c1d2 b4d2 d1d2 d8d2 b3d2

Looks like no noticeable difference in speed.

@Mardak
Copy link
Contributor Author

Mardak commented Mar 30, 2019

@Ttl One interesting aspect I noticed with nodes becoming terminal losses is the visits remain so if what search had been thinking is the best move leads to the opponent having a terminal win, search needs enough visits for a 2nd-best move to exceed and be picked even though we know for sure the most visited move is bad. Maybe we should pick move to play based on lower confidence bound like leela-zero/leela-zero#2290 ?

E.g., from #822 (comment)

go nodes 800
c4d2 N:      48 (+ 0) (P: 13.63%) (Q:  0.00272) (V:  0.0637) 
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 

go nodes 1600
e3a3 N:     564 (+ 0) (P: 14.29%) (Q: -1.00000) (V: -1.0000) (T) 
h7h6 N:     568 (+ 0) (P:  3.92%) (Q:  0.17601) (V:  0.0814)

It would be unfortunate if e3a3 was picked somewhere between 800 visits and 1590 visits even though we know we shouldn't play it. But even then, the behavior is still no worse than the existing search picking this bad move.

Also somewhat related is these terminal loss moves probably shouldn't get any more visits, but we end up "wasting" CPU cycles adjusting the "uncertainty" for these moves.

Both of these aren't really specific to this issue though, so probably be their own PRs.

@ddobbelaere
Copy link
Contributor

ddobbelaere commented Mar 30, 2019

Nice PR. I like the concept that you conclude that a node is terminal during playout instead of during backup, which is very neat. The code is also non-obtrusive, it has to be said.

I think you can even go one step further and also infer some kinds of certain draws. In the logic, just add a should_become_draw path based on Q == 0 && child.IsTerminal(). This way it might also help with three fold reps, perpetuals, stalemate etc.

Of course, a lot of cases will still fall through the mazes of the net that only a full-fledged solution like #700 can solve, as you already indicated.

@Mardak
Copy link
Contributor Author

Mardak commented Mar 30, 2019

think you can even go one step further and also infer certain draws.

Shhh 😉 I was trying to keep this PR small. Yes, it should just be adding a similar to should_become_draw declaration, accumulation, and conditional termination. Although I wasn't really sure how often all possible moves show up as certain drawn…

@Videodr0me
Copy link
Collaborator

Videodr0me commented Mar 30, 2019

Hi there. I opted against doing it in picking nodes because generally there are only few certain nodes and the code in backpropagatin only adds overhead if and only if the node to process is actually certain (or has any bounds, as i also determine bounds). So it has no overhead for all other (the vast majority) of paths explored in node picking. Also (not currently done in PR487 or PR700) in backprop you can cease to do any checks if there has been no change in certainty state when going up a level.

That being said, you could easily move your code to backprop with almost no changes. Your code is node-centric which makes it fit well with current structure of backprop and picknode. I also contemplated that but decided against doing it. The main reasons are in how i envision the future of search:
(1) transpositions
(2) look-ahead search
In both cases require a more edge centric view as nodes can be reached via different edges and edges will need to hold path depended information. Also with the look-ahead we do no longer have to create nodes that we know to be certain (as the edges already contain that information). This was under the paradigm: reduce the memory footprint as much as possible.

While this probably was not the best decision in terms of "short-term" mergability, i still believe that it will be unavoidable in the long-term if we ever want to handle transpositions and/or lookahead search efficiently.

If you want to retain as much of the functionality of PR487 or PR700 as possibly. You could just move the certainty status to the node structure. Then the code changes collapse/simplify greatly in the getters, the only thing you have to ditch is the look-ahead or PR487.

Some other remarks. You may have a problem if a terminal node gets to be root if opponent plays into mate. So you might have to unterminal root when initializing search. Also if you want to retain a previously calculated mating sequence, you need to change the trim at head behaviour, as currently you trim if the head is terminal, basically throwing away the search result leading to the terminal condition. Then you also could check for certain draws and bounds (this is very effective for discovering draws).

@Videodr0me
Copy link
Collaborator

Videodr0me commented Mar 30, 2019

@Ttl One interesting aspect I noticed with nodes becoming terminal losses is the visits remain so if what search had been thinking is the best move leads to the opponent having a terminal win, search needs enough visits for a 2nd-best move to exceed and be picked even though we know for sure the most visited move is bad. Maybe we should pick move to play based on lower confidence bound like leela-zero/leela-zero#2290 ?

No need, you can do what PR700 does and adjust the tuple that is used for GetBestChildrenNoTemperature. That solves this issue.

Also somewhat related is these terminal loss moves probably shouldn't get any more visits, but we end up "wasting" CPU cycles adjusting the "uncertainty" for these moves.

This can only be done safely at root. As terminal losses need to be hit (even repeatedly) to inform search about the inferiority of the path. If policy guided UCT hits these nodes depite having a Q of -1 the line must be really bad and these hits help to start exploring alternatives early. At root you can safely skip these as there is no information left. I tried various schemes (in the tree, not at root), not visiting loosing childs at all, decreasing visit probability with N, decreasing visit probability with depth. The result was that not visiting at all looses elo, results of the more "refined" schemes (N, depth) were inconclusive. So i would strongly advise to only skip picking loosing moves at root and let PUCT do its thing in tree.

@roy7
Copy link

roy7 commented Mar 30, 2019

@Mardak Something I toyed with long ago was, if I decided a child needed pruned (in this case with Chess, a terminal node that is a loss) I'd remove the search and eval data from that move all the way back up the tree to overcome the "averaging" problem of MCTS. This was separate from using max_lcb to choose moves, the idea being instead of needing the 2nd move to get enough visits to outnumber the 1st move, I'd lower the # of visits to the 1st move if I pruned a big branch in the tree. (Lowering the average win rate much faster, and promoting 2nd move faster than waiting it out.)

Just food for thought if you ever want to try it.

@Videodr0me
Copy link
Collaborator

Videodr0me commented Mar 31, 2019

@Mardak Something I toyed with long ago was, if I decided a child needed pruned (in this case with Chess, a terminal node that is a loss) I'd remove the search and eval data from that move all the way back up the tree to overcome the "averaging" problem of MCTS. This was separate from using max_lcb to choose moves, the idea being instead of needing the 2nd move to get enough visits to outnumber the 1st move, I'd lower the # of visits to the 1st move if I pruned a big branch in the tree. (Lowering the average win rate much faster, and promoting 2nd move faster than waiting it out.)

Just food for thought if you ever want to try it.

Hi, I solved this issue by doing this:
(1) if a subbranch is made certain, which sometimes means a huge swing in eval, I adjust the scores in the tree as if all previous moves to this subbranch already yielded the certain result. This leads to leela changing its mind much more quickly without having to wait for the averaging operator to catch up.
(2) luckily you can do that basically for free, as you can do it on the same backprop that backpropagates the result anyways. Basically you just adjust the weight of the result and through the linear tree algebra you get a consistent tree just by using the normal backprop mechanism - no additional traversals needed, no extra nodes visited.
You can see how i did that in PR700 or PR487 by checking the section in backprop where i compare the previous q to the current q and adjust the weights:

// Certainty propagation: adjust Qs along the path as if all visits already

@Mardak
Copy link
Contributor Author

Mardak commented Mar 31, 2019

Thanks for the suggestions. Terminating parents from DoBackupUpdateSingleNode does seem to be slightly more optimal:

  1. avoids wasting at least 1 visit on the forward pass
  2. more accurate Q for PickNodeToExtend 1 batch earlier
  3. avoids accumulating terminating condition across children in the common case

The downside is that it's slightly more complex requiring a new children loop in each step of backup, but the extra loop should only trigger when a parent has a new terminal child. This conditional behavior aspect would allow backup to do other more complex changes without impacting the forward common case.

@Videodr0me
Copy link
Collaborator

Videodr0me commented Mar 31, 2019

Thanks for the suggestions. Terminating parents from DoBackupUpdateSingleNode does seem to be slightly more optimal:

1. avoids wasting at least 1 visit on the forward pass

2. more accurate Q for `PickNodeToExtend` 1 batch earlier

3. avoids accumulating terminating condition across children in the common case

The downside is that it's slightly more complex requiring a new children loop in each step of backup, but the extra loop should only trigger when a parent has a new terminal child. This conditional behavior aspect would allow backup to do other more complex changes without impacting the forward common case.

Exactly. Even if the original node is terminal you do not need to check all childs on every depth. As soon as you get to a node that does not get to be made terminal you can stop checking all childs from that node onward until backprop is finished. Thats the most optimal way and also avoids some other issues with doing it in picknode. If you move it to backprop and also make nodes a terminal draw if all childs are terminal draws then you have the simplest version i can think of.

And if you succeed in getting that merged, we can then add more features over time. Just for reference here is my findings in how much elo each part of PR487 gains:
(1) Bare bones CP, like this PR is not measurable. But - and thats the nice thing which attracted me to all this - it is at least >=0 for theoretical reasons and needs no tuning.
(2) Bounds checking and score adjusting is about +10 Elo
(3) one-ply-lookahead and 2-fold-draw in conjunction is also about+10Elo
(4) The other features of 487 and 700 do not impact elo, but reduce trolling, display mate, prefer mates over TB hits, play immediatly if mate is proven. These are more "cosmetic" in nature.

@zz4032
Copy link
Contributor

zz4032 commented Mar 31, 2019

TC 15.12s/game + 0.063s/move (~4000nodes/move)
lc0_PR822 with option.Terminator=true.

   # PLAYER       :  RATING  ERROR  LOS(%)   GAMES  DRAWS(%)
   1 lc0          :     0.0   ----    94.9    1000      71.3
   2 lc0_PR822    :    -9.1   10.9     ---    1000      71.3

White ELO advantage = 70.7
Draw rate (equal opponents) = 77.6 %

@Mardak
Copy link
Contributor Author

Mardak commented Apr 1, 2019

Added a commit to do Terminator from DoBackupUpdateSingleNode instead of PickNodeToExtend. Here's an example of the more efficient visit usage using the first position from the first comment #822 (comment):

position fen 3k4/p4Q2/3N3p/1p1B4/8/1P2P3/2R2PPP/6K1 w - - 1 31 moves f7a7

# old PickNodeToExtend behavior
h6h5 N:       1 (+ 1) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       2 (+ 0) (P: 52.32%) (Q: -0.98630) (V: -0.9726)

h6h5 N:       2 (+ 0) (P: 47.68%) (Q: -0.98746) (V: -0.9749)
b5b4 N:       3 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

h6h5 N:       3 (+ 0) (P: 47.68%) (Q: -1.00000) (V: -1.0000) (T)
b5b4 N:       4 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

# new DoBackupUpdateSingleNode behavior
h6h5 N:       1 (+ 0) (P: 47.68%) (Q: -0.97491) (V: -0.9749)
b5b4 N:       2 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

h6h5 N:       2 (+ 0) (P: 47.68%) (Q: -1.00000) (V: -1.0000) (T)
b5b4 N:       3 (+ 0) (P: 52.32%) (Q: -1.00000) (V: -1.0000) (T)

Instead of taking 6 visits to identify both moves as terminal, the new behavior finds it 2 visits earlier (1 visit for each terminal).

Also, the new commit adds in draw checks. Here's a dummy test position:
Screen Shot 2019-03-31 at 6 07 36 PM

position fen qk1K4/8/8/8/8/8/8/Q7 w - - 1 1

# progression --no-terminator
a1a8 N:       1 (+ 0) (P:  5.96%) (Q:  0.04149) (V:  0.0415)
a1a8 N:       2 (+ 0) (P:  5.96%) (Q:  0.02074) (V:  0.0415)
a1a8 N:       3 (+ 0) (P:  5.96%) (Q:  0.01383) (V:  0.0415)

# progression --terminator
a1a8 N:       1 (+ 0) (P:  5.96%) (Q:  0.04149) (V:  0.0415)
a1a8 N:       2 (+ 0) (P:  5.96%) (Q:  0.00000) (V:  0.0000) (T)

@Mardak
Copy link
Contributor Author

Mardak commented Apr 1, 2019

@Ttl Oh nice! I didn't notice your #817 earlier. Here's the output with PRs combined:

position startpos moves g1f3 e7e6 c2c4 g8f6 d2d4 f8b4 c1d2 b4e7 a2a3 d7d5 e2e3 e8g8 b1c3 b8d7 d1c2 b7b6 c4d5 e6d5 f1e2 c8b7 f3e5 a7a6 e1g1 c7c5 d2e1 a8c8 c2f5 d8c7 f2f4 c8d8 a1c1 f8e8 e1h4 b6b5 e2d3 d7f8 f5h3 f6e4 c3e4 d5e4 d3b1 c5c4 f4f5 e7h4 h3h4 f7f6 e5g4 f8d7 g4f2 d7b6 f2e4 b7e4 b1e4 e8e7 c1e1 d8e8 f1f4 g8h8 e4f3 c4c3 b2c3 c7c3 h2h3 e7e3 e1e3 c3e3 g1h2 b6c4 f3d5

go nodes 700
h7h6 N:      15 (+ 0) (P:  3.92%) (Q:  0.09498) (LCB:  0.43525) (V:  0.0814)
e8d8 N:      12 (+ 0) (P:  2.80%) (Q:  0.11855) (LCB:  0.49849) (V:  0.0983)
e3a3 N:     542 (+ 1) (P: 14.29%) (Q:  0.28268) (LCB:  0.61505) (V:  0.3569)
bestmove e3a3 ponder d5f7

go nodes 1000
e3a3 N:     561 (+ 0) (P: 14.29%) (Q: -1.00000) (LCB:  0.00000) (V: -1.0000) (T)
…
h7h6 N:      74 (+ 1) (P:  3.92%) (Q:  0.13845) (LCB:  0.51435) (V:  0.0814)
e8d8 N:      93 (+ 0) (P:  2.80%) (Q:  0.15288) (LCB:  0.53667) (V:  0.0983)
bestmove e3a3 ponder d5f7

go nodes 1300
e3a3 N:     561 (+ 0) (P: 14.29%) (Q: -1.00000) (LCB:  0.00000) (V: -1.0000) (T)
…
e8d8 N:     136 (+ 1) (P:  2.80%) (Q:  0.15390) (LCB:  0.54272) (V:  0.0983)
h7h6 N:     269 (+ 0) (P:  3.92%) (Q:  0.16087) (LCB:  0.54902) (V:  0.0814)
bestmove h7h6 ponder f4f3

Looks like your PR definitely helps avoid picking known losing moves! 😄

@Mardak
Copy link
Contributor Author

Mardak commented Apr 1, 2019

I ran 1000 selfplay with 800 visits to see what's the behavior of parents getting terminated with this PR.

  • Generally, at 800 visits, the terminated nodes are far from root or weren't likely to be along the main pv; and the Q value before termination wasn't too far off.
  • A position that has a checkmate move for the opponent would likely have a Q around -0.65. MCTS probably would have preferred not visiting that parent anyway, so terminating it to Q: -1 makes it even less likely to visit.
  • Similarly positions that have all opponent moves as losses (most likely a position with a single forced move) already had a Q around 0.85, so MCTS would have favored the move anyway.
  • For the rare positions that could terminate as draw from this data, all of them were positions that had a single forced opponent draw move, and the Q was likely to be around 0.05.
  • For the extremely rare case where the pv got terminated into a loss (i.e., the game hadn't resigned already) a quick glance shows that these positions were not far from losing anyway, so additional search wouldn't have found a better move.

Here's some stats of terminated parents:

  • 46337 loss terminals
    • 0 with 1 edge
    • 13 with 2 edges
    • 13 with 3 edges
    • 130 with 4-9 edges
    • 4353 with 10-19 edges
    • 12125 with 20-29 edges
    • 16679 with 30-39 edges
    • 10980 with 40-49 edges
    • 2009 with 50-59 edges
    • 35 with 60-69 edges
    • 0 with 70+ edges
  • 5860 won terminals
    • 5522 with 1 edge
    • 328 with 2 edges
    • 10 with 3 edges
    • 0 with 4+ edges
  • 497 draw terminals
    • 497 with 1 edge
    • 0 with 2+ edges

So there are definitely a lot more parents that could be terminated as loss because one of generally 10+ possible moves are opponent with checkmate. There were a small portion of positions where the opponent had multiple moves leading to checkmate allowing a parent terminated as won, and overall roughly 90% fewer won than loss terminated parents. Similarly roughly 90% fewer draw than won terminated parents.

q terminated
Notice the red area does extend very far to the right and that means some parents terminated as loss thought the position was very much favorable only to realize the opponent has a checkmate move.

tournamentstatus final
P1: +443 -430 =127
Win: 50.65%
Elo:  4.52
LOS: 67.00%
P1-W: +236 -201 =63
P1-B: +207 -229 =64

@Mardak Mardak force-pushed the terminate-search branch 2 times, most recently from 99e7927 to bccfd10 Compare April 1, 2019 20:32
@Mardak Mardak changed the title Terminate search if any move is a win or all moves are loss. Also make parents Terminal if any move is a win or all moves are loss or draw. Apr 1, 2019
@Videodr0me
Copy link
Collaborator

Videodr0me commented Apr 2, 2019

I checked the code (briefly) and everything looks ok now. I really think you cannot do a simpler version of this. So lets keep our fingers crossed that this gets commited, so we have a starting point to maybe build on top of this.

Your stats match mine, and are the reason why i push for two-fold draw scoring. Certain draws will then increase substantially in endgame positions (and the lookahead will increase those even further at no gpu cost - just cpu).

Last two things I would suggest is to call it "certainty-propagation" or maybe even more accurately "terminal-propagation". An also you should maybe not trim the head completely - because currently as soon as a terminal becomes root (start of the mating sequence) all previous computation is lost because the tree gets deleted. This might lead to strange sudden death behavior. You could just change the call to trimathead to only occur if the terminal node has no children and unterminal root.

@Mardak
Copy link
Contributor Author

Mardak commented Apr 2, 2019

There's a set of issues related to trimming the head or not that would complicate this PR, so not sure if it's worth delaying getting things merged vs cleaning it up in its own PR. Similarly naming discussions often lead to time wasted bikeshedding, so I usually just let the reviewer decide if we even need to have a option/param.

@Mardak Mardak force-pushed the terminate-search branch 2 times, most recently from 76092ac to 7cfa496 Compare April 2, 2019 23:08
@gonzalezjo
Copy link

Probably a stupid question: What are the functional differences between this and PR800 now?

src/mcts/params.cc Outdated Show resolved Hide resolved
src/mcts/node.cc Outdated Show resolved Hide resolved
src/mcts/search.h Outdated Show resolved Hide resolved
src/mcts/search.cc Outdated Show resolved Hide resolved
src/mcts/search.cc Outdated Show resolved Hide resolved
src/mcts/search.cc Outdated Show resolved Hide resolved
src/mcts/search.cc Show resolved Hide resolved
@killerducky
Copy link
Contributor

@Mardak do you know why this test showed an Elo regression?
#822 (comment)

Is the code slower? Or was there a problem that has been fixed since then?

@Mardak
Copy link
Contributor Author

Mardak commented Apr 16, 2019

The old code was a bit inefficient in the common case affecting all node picking vs the current code only doing extra work when a new terminal was visited. There is also a fix in the current code to retain existing tree if the game played into a "terminal" position where the old code would have required re-searching already visited nodes.

// A non-winning terminal move needs all other moves to have the same value.
if (can_convert && v != 1.0f) {
for (const auto& edge : p->Edges()) {
can_convert = can_convert && edge.IsTerminal() && edge.GetQ(0.0f) == v;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic can be extended so that if all nodes are terminal, but they don't all agree on v - then parent can be made terminal draw. But theoretically assuming imperfect opponent play maybe it could be better not to do so.

Copy link
Collaborator

@Videodr0me Videodr0me Apr 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if all childs are terminal (or certain in my parlance) the maximum should be returned, so if all moves are (certain) losses and one is a (certain) draw, we can safely consider the position a certain draw. The perfect play assumption is not really relevant, as all opponents of leela either share the same codebase (different NNs) or are much superior in propagating minimax scores (basically all a/b engines). I would suggest to implement this - its simple and should maybe only add one line of code, but this could be done in further PRs after this is in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be extended in a separate PR to improve the logic.

@Videodr0me
Copy link
Collaborator

I still find the name of the option a little misleading, as mates in the opening or middle gamephase are also affected. I would still suggest calling it "terminal-propagation". Also in its current simple form it can be an always on feature so maybe we don't need an option after all.

@Videodr0me
Copy link
Collaborator

I saw that you avoid trimming the head now and recalculate N. But i vaguely remember that I also had to change makemove, as otherwise there are problems in self-play and treesharing - did you test that?

@Mardak
Copy link
Contributor Author

Mardak commented Apr 17, 2019

i vaguely remember that I also had to change makemove, as otherwise there are problems in self-play and treesharing - did you test that?

It does look like the root can end up terminal if selfplay is run with --reuse-tree but the default is off, and it seems likely it will stay that way for training unless other things get fixed first, e.g., applying noise to existing nodes. I'll add a note to #775 if it becomes a more general issue of fixing tree-reuse issues.

@Videodr0me
Copy link
Collaborator

It does look like the root can end up terminal if selfplay is run with --reuse-tree but the default is off, and it seems likely it will stay that way for training unless other things get fixed first, e.g., applying noise to existing nodes. I'll add a note to #775 if it becomes a more general issue of fixing tree-reuse issues.

Things should not break if people use different options. The change required to fix this is basically identical to what you already did in Resettoposition, you just have to add the same lines of code to makemove. You can check PR700 for reference.

@Mardak
Copy link
Contributor Author

Mardak commented Apr 18, 2019

But how does it break? The outcome of the games are the same.

@Mardak
Copy link
Contributor Author

Mardak commented Apr 18, 2019

The change required to fix this is basically identical to what you already did in Resettoposition, you just have to add the same lines of code to makemove.

Copy/pasting code is not a good solution in general as it often leads to bugs and code maintenance issues down the line that need additional review cycles or mental overhead causing.

I can file a followup PR with a better fix than just duplicating code.

@Videodr0me
Copy link
Collaborator

But how does it break? The outcome of the games are the same.

There might be several issues, games might not be played to their conclusion according to the rules of chess, terminating early before mate or a real drawn terminal is actually on the board and i think there might even be some edge cases where it can crash (but this might or might not apply to your current implementation).

@Videodr0me
Copy link
Collaborator

Copy/pasting code is not a good solution in general as it often leads to bugs and code maintenance issues down the line that need additional review cycles or mental overhead causing.

Obviously it was not meant as suggesting to cut/paste the code verbatim. It was just meant as a pointer on what could to be done in makemove to avoid potential problems - naturally it needs to be adapted. The aim was to raise awareness of potential issues - so we can get this into master without breaking anything.

@Mardak
Copy link
Contributor Author

Mardak commented Apr 19, 2019

r?@mooskagh or @Tilps Could you additionally review the added commit f39a874 that moves the original terminal trimming head behavior into MakeMove as a conditional MakeNotTerminal()? This proactively fixes selfplay with tree reuse if TBs were supported in selfplay too.

@Videodr0me
Copy link
Collaborator

Videodr0me commented Apr 20, 2019

Probably a stupid question: What are the functional differences between this and PR800 now?

PR800 is based on PR700 (basic certainty propagation). The differences to this PR are:

  • PR700/800 display mate scores, this PR does not.
  • PR700/800 prefer terminal wins over certain wins, this PR does not differentiate between the two. This leads to situations where this PR will postpone a mate-in-one just before the 50 move counter runs out.
  • PR700/800 differentiates between mates proven with and without TB hits, this PR does not. This leads to situations where this PR might go for a long TB win instead of a shorter direct mate.
  • PR700/800 play winning moves and avoid losing moves regardless of visits. This PR might not play a proven win if its visits did not catch up to the current best move in time or it might play a loosing move if alternatives did not overtake it in visits in time.
  • PR700/800 establishes proper minimax scores, this PR does not. So cases where all moves, except one drawing move are losing, are not propagated upward making this PR slightly less effective.
  • PR700/800 establish certain bounds. For example if one side can "force" a draw but has alternatives, the position is scored as a lower bound draw. This is very effective in properly scoring repeatitions and is more effective than this PR.
  • PR700/800 have an option to enable "two-fold" draw scoring, which increases the number of certain nodes and cuts down on tree combinatorics also making it more effective than this PR.
  • PR700/800 store information in the edges, while this PR overloads the terminal bool in nodes. This makes this PR fit the current codebase more elegantly and it uses less memory.

But all of this is inconsequential, as many of the strength gaining features can be added to this PR later once its in master over time. You can think of PR487/700/800 as a proof-of-concept of what is possible if you push this concept to its limits. In doing that i imposed - maybe to much - of my vision on how the codebase should evolve, and generally underestimated that the collaborative approach is better served with implementing features in small increments.And this is what this PR is about.

@Tilps
Copy link
Contributor

Tilps commented Apr 22, 2019

r?@mooskagh or @Tilps Could you additionally review the added commit f39a874 that moves the original terminal trimming head behavior into MakeMove as a conditional MakeNotTerminal()? This proactively fixes selfplay with tree reuse if TBs were supported in selfplay too.

I wondered whether MakeNotTerminal should be kept in case MakeMove isn't called - but I guess its fine since the making of terminals never applies to the root - so even if we start from fen which is late game and able to be proven terminal, the root itself doesn't get marked and so does not need to be reset.
So I think this is fine.

src/mcts/node.cc Outdated
const auto n = child.GetN();
if (n > 0) {
n_ += n;
q_ -= child.GetQ(0.0f) * n;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to add a comment why it is -= here instead of +=.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would += -child.GetQ() instead be more immediately clear without a comment?

Copy link
Contributor Author

@Mardak Mardak Apr 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to += and added comment:

lc0/src/mcts/node.cc

Lines 244 to 245 in 9b76ded

// Flip Q for opponent.
q_ += -child.GetQ(0.0f) * n;

src/mcts/node.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@Tilps Tilps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants