Play move with the highest lower confidence bound #817

Closed
wants to merge 5 commits into from

Member

@Ttl Ttl commented Mar 27, 2019

Same approach as leela-zero/leela-zero#2290. Two features from the LZ PR are still missing here: the minimum visit ratio, and avoiding pruning of the best LCB move in time management.

800 node test with network 41724 (MinibatchSize=32 and 5 piece TB)
Score of lc0_lcb vs lc0_master: 85 - 62 - 253  [0.529] 400
Elo difference: 20.00 +/- 20.61, LOS: 97.11 %, DrawRatio: 63.2 %
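For readers who want to check the summary line against the raw score, here is a minimal sketch. It assumes the usual logistic Elo model and the normal-approximation formula for LOS that ignores draws; the function names are illustrative and are not taken from lc0 or from the match tool used above. The +/- error margin is not reproduced here.

```cpp
#include <cmath>

// Fraction of points scored, counting a draw as half a point.
double ScoreFraction(int wins, int losses, int draws) {
  const double games = wins + losses + draws;
  return (wins + 0.5 * draws) / games;
}

// Elo difference implied by a score fraction under the logistic model.
// Only meaningful for 0 < score < 1.
double EloFromScore(double score) {
  return -400.0 * std::log10(1.0 / score - 1.0);
}

// Likelihood of superiority (normal approximation, draws ignored).
double LikelihoodOfSuperiority(int wins, int losses) {
  const double diff = wins - losses;
  const double decisive = wins + losses;
  return 0.5 * (1.0 + std::erf(diff / std::sqrt(2.0 * decisive)));
}
```

With 85 wins, 62 losses and 253 draws, `ScoreFraction` gives 0.52875, `EloFromScore` gives roughly 20 Elo, and `LikelihoodOfSuperiority` gives roughly 0.971, in line with the figures reported above.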

@Ttl Ttl added the wip Work in progress label Mar 27, 2019
@killerducky
Contributor

I think you'll need something similar to this in meson.build:

deps += [ dependency('boost', modules : ['thread', 'system' ], static : true) ]

@Ttl
Member Author

Ttl commented Mar 27, 2019

Oh, boost isn't used anywhere else. Somehow it works on my machine. There is probably no point in adding it as a dependency just for this one function call.

Ttl added 2 commits March 27, 2019 19:58
TODO: Fix sorting in VerboseStats to use ratio.
Optimum based on clop at 800 nodes.
@Ttl
Member Author

Ttl commented Mar 28, 2019

I tuned the ci_alpha and min_ratio parameters with CLOP at 800 nodes, using a T50 network against the master branch:

[Image: ci_alpha_clop_max]

[Image: ci_alpha_clop_plot]

X-axis: GammaParameter cialpha, range 1e-7 to 1e-2
Y-axis: LinearParameter minimum-lcb-n-ratio, range 0.05 to 0.95
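To make the role of the second tuned parameter concrete, here is a rough sketch of how a minimum visit ratio could gate the LCB choice. It assumes that a child is eligible only if it has at least `min_ratio` times the visits of the most-visited child, which is how the name reads; the actual rule in this PR may differ. `ChildStats` and `PickBestLcbMove` are hypothetical names, not lc0 code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-child summary; not the lc0 node type.
struct ChildStats {
  int visits;  // number of visits to this child
  double lcb;  // lower confidence bound of its value estimate
};

// Returns the index of the eligible child with the highest LCB,
// or -1 if there are no children.
// Assumption: eligibility means visits >= min_ratio * max_visits.
int PickBestLcbMove(const std::vector<ChildStats>& children,
                    double min_ratio) {
  int max_visits = 0;
  for (const auto& child : children) {
    max_visits = std::max(max_visits, child.visits);
  }
  int best = -1;
  for (std::size_t i = 0; i < children.size(); ++i) {
    if (children[i].visits < min_ratio * max_visits) continue;
    if (best < 0 || children[i].lcb > children[best].lcb) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

Under this assumption, a ratio close to 1 makes the choice fall back to the most-visited move, while a ratio close to 0 lets a barely explored move win on LCB alone.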

Contributor

@ddobbelaere ddobbelaere left a comment

Nice idea, excited to see it works so well!

src/mcts/node.cc (outdated)
return -1e6f + visits;
}

auto stddev = std::sqrt(GetVariance(1.0f) / visits);
Contributor

Technically this is the standard error (standard deviation divided by sqrt(N)). Maybe stderr is a better variable name.

Or is this the standard deviation of something in another context?

Member Author

I was thinking of the estimated distribution of the mean, which has this standard deviation. "Standard error" seems to be correct too.

Contributor

Okay, that seems logical, I didn't think about it that way :).
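The quantity discussed in this thread can be sketched as follows. This is a simplified illustration, not the PR's code: `NodeStats` is a hypothetical struct, and the multiplier `z` is passed in directly, whereas the PR derives its multiplier from `ci_alpha` in a way this sketch does not reproduce. The low-visit fallback mirrors the `-1e6f + visits` line in the diff excerpt, which ranks barely visited nodes below everything else and orders them by visit count.

```cpp
#include <cmath>

// Hypothetical per-node statistics; not the lc0 Node class.
struct NodeStats {
  double mean_q;    // running mean of the evaluations
  double variance;  // sample variance of the evaluations
  int visits;       // number of evaluations
};

// Lower confidence bound: mean minus z standard errors of the mean.
// Assumption: z is supplied by the caller instead of being derived
// from a confidence level such as ci_alpha.
double LowerConfidenceBound(const NodeStats& stats, double z) {
  if (stats.visits < 2) {
    // Too few samples for a variance estimate: fall back to a very low
    // value that still orders such nodes by visit count.
    return -1e6 + stats.visits;
  }
  // Standard error = standard deviation / sqrt(N) = sqrt(variance / N).
  const double std_error = std::sqrt(stats.variance / stats.visits);
  return stats.mean_q - z * std_error;
}
```

For example, a node with mean 0.5, variance 0.1 and 10 visits has a standard error of 0.1, so with `z = 2` its bound is 0.3.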

Takes multivisit into account.
@Ttl
Member Author

Ttl commented Apr 1, 2019

Some testing at high nodes, nothing conclusive. TC 30+0.5s. Ryzen 1700X and RTX2080. SF with 16 threads.

   # PLAYER              :  RATING  ERROR  POINTS  PLAYED   (%)  CFS(%)
   1 lc0_41724_master    :    24.7   15.8   274.0     512    54      67
   2 lc0_41724_lcb       :    17.8   15.4   269.0     512    53      96
   3 stockfish_dev       :     0.0    9.8   481.0    1024    47     ---

@borg323
Member

borg323 commented Apr 1, 2019

Quick patch to fix appveyor builds: borg323@bad4503

@zz4032
Contributor

zz4032 commented Apr 2, 2019

TC 4.8s/game+0.02s/move (~900 nodes/move)

   # PLAYER       :  RATING  ERROR  LOS(%)   GAMES  DRAWS(%)
   1 lc0_PR817    :    47.4   12.5   100.0    1000      60.7
   2 lc0          :     0.0   ----     ---    1000      60.7

White ELO advantage = 56.5
Draw rate (equal opponents) = 64.3 %

TC 17.04s/game+0.071s/move (~6000 nodes/move)

   # PLAYER       :  RATING  ERROR  LOS(%)   GAMES  DRAWS(%)
   1 lc0          :     0.0   ----    60.0    1000      69.0
   2 lc0_PR817    :    -1.5   11.2     ---    1000      69.0

White ELO advantage = 68.1
Draw rate (equal opponents) = 73.9 %

@zz4032
Contributor

zz4032 commented Sep 15, 2019

Tried retuning this PR and noticed that at lower nodes/move I'm getting a weighted plot similar to yours:
[Image: stc_8 9E-4_0 30]

But with increased time control (~12000 nodes/move) the optimum area shifts and splits in two:
[Image: ltc_4 2E-6_0 51]

This is clearly one reason why it doesn't perform as expected at LTC.

@Naphthalin
Contributor

As far as I know, this is still relevant and has not been solved by the time manager or similar changes. It might be a candidate for @kiudee to tune once updated, and to merge if useful.

@Naphthalin
Contributor

Closing this for a combination of several reasons:

  1. implementation requires boost
  2. too many search conflicts
  3. "Analyse Mode + minimax/mcts hybrid" (#1543) contains a more recent LCB implementation
  4. implementing "Extracting parts of Lc0's search into classes would help future development" (#1734) will allow a much easier reimplementation than updating this branch

@Naphthalin Naphthalin closed this Nov 2, 2022