
Releases: lightvector/KataGo

Distributed Client Bugfixes, Friendly Passing, Engine Bugfixes

14 Mar 20:54

If you're a new user, don't forget to check out this section for getting started and basic usage!

KataGo is continuing to improve at https://katagotraining.org/ and if you'd like to donate your spare GPU cycles and support it, it could use your help there! Development is still active, too: coming down the pipe, although not in time for this release, are some search improvements that should provide a major strength boost.

If you don't know which version to choose (OpenCL, CUDA, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-eigen

Distributed Client Improvements (for contributing to the public run at katagotraining.org)

  • Improved handling of errors and network connection issues, including fixing at least two bugs that could cause endless hangs or deadlocks.

  • Distributed client will now attempt to stop gracefully on the first interrupt (e.g. ctrl-c), finishing the games currently in progress. A second interrupt will force a more immediate stop.

  • Distributed client now supports the https_proxy environment variable for users who connect to the internet through a proxy (see the sketch after this list).

  • The OpenCL version of the client will now tune all necessary model sizes up-front the first time it starts, which should be more accurate and less disruptive than tuning them mid-run when a new model of that size is downloaded. This may take a little while, so please be patient.
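For users behind a proxy, setting https_proxy can be done in the shell or from a wrapper script. Here is a minimal Python sketch; the proxy address and config filename are placeholders, not values from this release:

```python
import os
import subprocess

# Route the distributed client's HTTPS traffic through a proxy.
# Proxy address and config path are placeholders.
env = dict(os.environ, https_proxy="http://myproxy.example.com:8080")
subprocess.run(["./katago", "contribute", "-config", "contribute.cfg"], env=env)
```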

Engine Improvements

  • Friendly passing - when KataGo is set to named rules other than tromp-taylor, if the opponent has just passed, it will more often pass in response without attempting to do much cleanup of dead stones, so long as there are no gainful moves left.

    • When NOT specifying rules by name (like japanese or aga), but rather by individual options (koRule, scoringRule,...), the new individual option controlling this behavior is friendlyPassOk = true or friendlyPassOk = false (default).
  • KataGo now supports Fischer time controls, along with new GTP extensions kata-time_settings and kata-list_time_settings, documented here with hopefully rigorous-enough semantics to support future time-control additions and to be implementable by any other engine that wants to mimic the same spec (see the sketch after this list).

  • GTP can now change the number of search threads at runtime, via kata-set-param, documented here.

  • GTP final-score and related commands should also now behave in accordance with friendly passing, reporting the Tromp-Taylor score after two passes in tromp-taylor rules, and the estimated human-friendly score after dead stone removal in other rules.

  • Fixed GTP bug where certain commands relating to scoring or ending the game (e.g. final_status_list) might silently alter settings like playoutDoublingAdvantage for that run of GTP.

  • Various minor improved logging and error messages in multiple top-level commands and utilities.

  • Fixed an issue where using a large number of threads could sometimes make GTP final score estimation inaccurate; the number of threads used for that estimation is now capped.
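Here is a rough Python sketch tying the friendly-passing option and the new time-control features together by driving KataGo over GTP. The config and model paths are placeholders, and the exact argument order for kata-time_settings is my assumption from the release note - consult the linked GTP extensions documentation for the authoritative spec:

```python
import subprocess

# Paths and config names are placeholders. friendlyPassOk is set via
# -override-config since we are specifying rules by individual options.
proc = subprocess.Popen(
    ["./katago", "gtp", "-config", "gtp.cfg", "-model", "model.bin.gz",
     "-override-config", "friendlyPassOk=true"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def gtp(command):
    """Send one GTP command and read the response (ends with a blank line)."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    lines = []
    while True:
        line = proc.stdout.readline().rstrip("\n")
        if line == "":
            break
        lines.append(line)
    return "\n".join(lines)

print(gtp("kata-list_time_settings"))
# Fischer time control: 300s main time, 10s increment per move.
# (Argument order is an assumption -- check the GTP extensions doc.)
print(gtp("kata-time_settings fischer 300 10"))
# Change the number of search threads at runtime.
print(gtp("kata-set-param numSearchThreads 8"))
```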

Dev-facing Improvements and Internal Changes

  • The analysis engine now reports two new fields for the root info, thisHash and symHash - Zobrist hashes of the situation being analyzed that can be used to identify positions and to match up or distinguish their symmetrical partners (see the sketch after this list).

  • Fixed a bug in the vartime auxiliary training target where the loss wasn't being weighted correctly, causing major biases in the neural net's learning of this target.

  • Several internal cleanups and refactors of some dev tools for sampling and mining SGFs and running test positions, added some command-line arguments for filtering SGFs based on some criteria.

  • Added some logic to handle parsing of the oddly-formatted komi and ranks for Fox server SGFs.

  • Updated to a newer version of a json-parsing library to fix an issue where, on specific versions of MSVC, the older json library would cause compile errors.
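As a rough illustration of the new root-info fields, here is a minimal analysis engine query in Python that reads back thisHash and symHash. The paths and config names are placeholders, and the query fields follow the analysis engine documentation as I understand it:

```python
import json
import subprocess

# Launch the analysis engine; paths and config names are placeholders.
proc = subprocess.Popen(
    ["./katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

query = {
    "id": "q1",
    "moves": [["B", "Q16"], ["W", "D4"]],
    "rules": "tromp-taylor",
    "komi": 7.5,
    "boardXSize": 19,
    "boardYSize": 19,
    "maxVisits": 100,
}
proc.stdin.write(json.dumps(query) + "\n")
proc.stdin.flush()

response = json.loads(proc.stdout.readline())
root = response["rootInfo"]
# thisHash identifies this exact situation; symHash should be shared by
# all symmetric rotations/reflections of it.
print(root["thisHash"], root["symHash"])
```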

New Nets, Stronger Search, Distributed Client, Many Bugfixes

14 Jan 05:24
Compare
Choose a tag to compare

If you're a new user, don't forget to check out this section for getting started and basic usage!

KataGo has started a new distributed run at https://katagotraining.org/, and this release newly supports the latest and strongest neural nets from there! Also if you wish to contribute, the run will be open for full public contributions soon, but for now, you can already try the new nets. For nets from older runs, see https://d3dndmfyhecmj0.cloudfront.net/index.html.

If you don't know which version to choose (OpenCL, CUDA, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-eigen

Major Engine Changes and Fixes

  • Now supports the new neural nets at https://katagotraining.org/, which have an altered format and some new output heads that might be used to improve the search logic in future versions.

  • A lot of internal changes, hopefully including all the critical ones needed to support public contributions for the distributed run opening shortly, as well as many bugfixes and stronger search logic.

  • New subtree value bias correction method has been added to the search, which should be worth somewhere between 20 and 50 Elo for mid-thousands of playouts.

  • Fixed a bug in LCB move selection that prevented LCB from acting on the top-policy move. The fix is worth perhaps around 10 Elo (a generic sketch of LCB move selection appears after this list).

  • Time control logic has been greatly overhauled and reimplemented. Most of its features are not enabled by default due to uncertainty about the best parameters; they may be set to reasonable defaults after more testing in the future. (Anyone interested in running tests or collaborating on further logic tweaks would be welcome!)

  • Bugfix to Japanese-like rules that should allow for more accurate handling of double-ko-death situations. This will also require new nets to gradually adjust to these rules, which may take some more time with the ongoing new run.

  • Root symmetry sampling now samples without replacement instead of with replacement, and is capped at 8, the total number of possible symmetries, instead of 16.
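For readers unfamiliar with LCB move selection, here is a generic, simplified sketch of the idea (not KataGo's actual code - KataGo uses its own variance estimates and thresholds). The bug fixed above meant that, in effect, one candidate (the top-policy move) was excluded from this kind of comparison:

```python
import math

Z = 1.96  # confidence multiplier; real engines tune this

def select_move(moves):
    """Generic lower-confidence-bound (LCB) move selection over MCTS stats.

    Each move is a dict with 'name', 'visits', 'winrate', 'stdev'.
    Purely illustrative -- not KataGo's actual selection code.
    """
    most_visited = max(moves, key=lambda m: m["visits"])
    min_visits = max(1, most_visited["visits"] // 4)  # hypothetical cutoff

    def lcb(m):
        return m["winrate"] - Z * m["stdev"] / math.sqrt(m["visits"])

    # Every sufficiently-searched candidate competes on its LCB; the bug
    # described above effectively dropped one candidate from a
    # comparison like this one.
    candidates = [m for m in moves if m["visits"] >= min_visits]
    return max(candidates, key=lcb)

# A move with fewer visits but a higher lower bound wins here.
print(select_move([
    {"name": "D4", "visits": 800, "winrate": 0.52, "stdev": 0.30},
    {"name": "Q16", "visits": 400, "winrate": 0.55, "stdev": 0.25},
]))
```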

Minor Engine Changes and Fixes

  • Removed old no-longer-useful search parameter fpuUseParentAverage.

  • Built-in katago match tool's komiAuto feature now uses 100 visits per test instead of 20 by default to find a fair komi.

  • Built-in katago match tool now has some logic to avoid premature resignation, to be consistent with GTP.

  • Fixed a segfault that could happen during config generation in katago genconfig command.

  • Fixed bug where analysis engine could sometimes report the rootInfo with the wrong side's perspective.

  • Fixed bug where priorities outside [-2^31, 2^31-1] would not work properly in the analysis engine.

  • Fixed GTP command kata-raw-nn to also report the policy for passing.

Self-play and Training Changes

  • Neural net model version 10 is now the default version, which adds a few new training targets and rebalances all of the weights of the loss function. Training and loss function statistics may not be directly comparable to those of earlier versions.

  • Going forward, neural nets newly created with the KataGo python scripts will default to using a 3x3 conv instead of a 5x5 conv for the first layer. This may result in newly-trained neural nets being very slightly weaker and lower-capacity, but very slightly faster than old nets. It also greatly reduces memory usage on bigger nets with OpenCL. Existing nets are unaffected (even if v1.8 is used to train them).

  • Fixed bug where hintposes were not adjusted for the initial turn number of the position.

  • Improved some SGF startposes file handling so that deeply-branching files can be processed without running out of stack space.

  • Fixed bug where a stale root nn policy might suppress a hintpos from taking effect. Hintposes will also do more full searches instead of cheap searches in the few moves after the hint.

  • Improved logging of debug output from self-play training, improved SGF file comments for selfplay games, various internal cleanups.

  • Training script now has option to lock the ratio of train steps vs data samples.

  • Easier usage of initial weights for training - train script will look for any tensorflow checkpoints and meta files within a directory named "initial_weights" that is a subdirectory of that specific net's training directory.

  • Deleted some of the old unused model code.

  • Just for fun, added some pytorch genboard scripts that train a neural net to generate plausible board positions given some existing stones on that board.

CUDA 11, Analysis Engine Features, Prepare for Distributed

09 Nov 05:14

If you're a new user, don't forget to check out this section for getting started and basic usage!
The latest and strongest neural nets are still those from the former release: https://github.com/lightvector/KataGo/releases/tag/v1.4.5
If you don't know which version to choose (OpenCL, CUDA, Eigen, Eigen AVX2), read this: https://github.com/lightvector/KataGo#opencl-vs-cuda-vs-eigen

This release contains a variety of minor bugfixes and minor feature additions. It also incorporates a large number of internal changes to prepare for and support a distributed training run (yay), although distributed training support has deliberately not been enabled yet for the precompiled executables this release.

General Improvements and Features

  • Supports CUDA 11.1 now, which makes it possible to use KataGo's CUDA backend (instead of only OpenCL) with NVIDIA RTX 30** GPUs. Beware though that on other GPUs CUDA 11.1 might not actually be faster than 10.2 - in one test on a V100 cloud machine, CUDA 11.1 seemed slower than CUDA 10.2. OpenCL and CUDA speeds on RTX 30** GPUs are also unknown and seem to vary - some users have reported exciting results, others fairly disappointing ones.

  • Added new gtp config option "ignoreGTPAndForceKomi" that forces a particular komi even if the GTP controller tries to specify a different one (see the sketch after this list). KataGo is also now slightly smarter about guessing a default komi based on the other rules in the case where absolutely nothing tells it what the komi should be.

  • KataGo no longer requires boost libraries in order to be compiled.

  • OpenCL backend optimized to now require less GPU memory.

  • Benchmark command should now be more efficient about choosing search ranges for threads.
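As a quick illustration, a GTP frontend or wrapper script might pin the komi like this. This is a sketch only: paths are placeholders, and the exact value syntax for ignoreGTPAndForceKomi should be checked against the example configs:

```python
import subprocess

# Pin komi to 7.0 regardless of what the GTP controller sends. The
# option name is from the release note; paths and the exact value
# syntax are assumptions -- check the example gtp configs.
subprocess.run([
    "./katago", "gtp",
    "-config", "gtp.cfg",
    "-model", "model.bin.gz",
    "-override-config", "ignoreGTPAndForceKomi=7.0",
])
```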

Analysis Engine

There are several improvements to the json analysis engine; a combined example appears after this list.

  • Can now report the predicted ownership map for each individual move.

  • Can now report results from an ongoing query, making it possible to do the same things you would with kata-analyze or lz-analyze.

  • Can now cancel or terminate queries before they finish.

  • Can now specify differing per-turn priorities in a single query.

  • Supports priorities outside the range ±2^31, making it easier to base priorities on timestamps or externally-determined large id numbers, or to run very long-lived processes.
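A rough end-to-end sketch of the new capabilities in Python: start a long query that streams intermediate reports, then terminate it early. Field names (reportDuringSearchEvery, action/terminateId) follow the analysis engine documentation as I understand it; paths are placeholders:

```python
import json
import subprocess

# Paths and config names are placeholders.
proc = subprocess.Popen(
    ["./katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def send(obj):
    proc.stdin.write(json.dumps(obj) + "\n")
    proc.stdin.flush()

# A long-running query that streams intermediate reports every 2 seconds
# and carries a priority far outside the old 32-bit range.
send({
    "id": "long1",
    "moves": [],
    "rules": "tromp-taylor",
    "komi": 7.5,
    "boardXSize": 19,
    "boardYSize": 19,
    "maxVisits": 1000000,
    "reportDuringSearchEvery": 2.0,
    "priority": 10**15,
})

# Consume a couple of streamed reports, then cancel the query early.
for _ in range(2):
    print(json.loads(proc.stdout.readline()).get("id"))
send({"id": "cancel1", "action": "terminate", "terminateId": "long1"})
```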

Bugfixes

  • Fixes a coding error that would make it sometimes impossible for KataGo to select the optimal move near the end of a game with button Go rules. (Button Go is a ruleset that KataGo supports that has the rules-simplicity and elegance of area scoring, but with the sharper and fairer scoring granularity of territory scoring).

  • Fixed a minor parsing bug in some uses of -override-config.

  • Fixed some bugs on how the benchmark command behaved with threads for the Eigen backend.

Other Changes

  • The shuffle script for selfplay training, which long ago dropped support for shuffling training and validation data separately, now also uses a single filepath for all the data shuffled together.

  • A large number of internal refactors and changes have been made to support acting as a client for distributed training. The cmake option BUILD_DISTRIBUTED=1 will make KataGo compile with support for distributed training, although the official distributed run has not quite started yet.

Eigen Memory Bugfixes

25 Aug 22:54

If you're a new user, don't forget to check out this section for getting started and basic usage!
The latest and strongest neural nets are still those from the former release: https://github.com/lightvector/KataGo/releases/tag/v1.4.5

A lot of recent and interesting release notes can still be found at this prior release https://github.com/lightvector/KataGo/releases/tag/v1.6.0, including basic information about the new KataGo CPU backend (eigen vs eigenavx2), and a couple of significant new features for analysis tool developers!

Changes

This is a followup release that fixes some issues with the new Eigen (CPU) implementation in 1.6.0:

  • Fixed two issues that caused Eigen implementation to use massively more memory than it needed, particularly when run with many threads (which could exhaust all RAM on some systems).
  • Better default settings for the number of threads to use in Eigen, now overridable by a new separate config parameter if needed (but users should just stick to the default anyway).

And some minor other changes:

  • For the analysis engine, the number of positions to search in parallel is now controlled by numAnalysisThreads in the config instead of a command line argument (but the command line argument still works, for backwards compatibility).
  • The analysis engine config now allows specifying numSearchThreadsPerAnalysisThread as an alias for numSearchThreads. This is not new behavior; it is just an alias whose name hopefully conveys the effect of this parameter better.

Bigger board sizes

25 Aug 23:07
Pre-release

This is a non-regular side-release, just for fun, with precompiled executables for board sizes up to 29x29. However, they will use more RAM and may be a little slower even when playing on 19x19 or smaller boards. So for best performance, one should still prefer the normal release.

The actual latest release (mostly bugfixes) is here: https://github.com/lightvector/KataGo/releases/tag/v1.6.1
And the latest major release with many release notes is here: https://github.com/lightvector/KataGo/releases/tag/v1.6.0
And the latest and strongest neural nets are still those from here: https://github.com/lightvector/KataGo/releases/tag/v1.4.5

Eigen (CPU) version and Other Improvements

23 Aug 19:42

If you're a new user, don't forget to check out this section for getting started and basic usage!
The latest and strongest neural nets are still those from the former release: https://github.com/lightvector/KataGo/releases/tag/v1.4.5

KataGo has now improved its Eigen implementation, making for a reasonably well-optimized pure-CPU version! It will of course still be much slower than with a good GPU, but particularly for smaller nets (20 or 15 blocks) it should often get 5 to 20 playouts per second. All of these versions are available as pre-compiled executables in this release.

Versions available

  • OpenCL - Use this if you have a modern GPU.
    This continues to be the general GPU version of KataGo and should work on a wide variety of GPUs, although GPUs more than a few years old may not work, and AMD and minor vendors often have driver issues in their OpenCL implementations.

  • CUDA - Test this if you have a top-end NVIDIA GPU, are willing to do some more technical setup work, and care about getting every bit of performance.
    Requires an NVIDIA GPU and requires installing CUDA 10.2 (not CUDA 11 yet) and CUDNN from NVIDIA. For most users there is little reason to use this version; often the OpenCL version will be faster even on NVIDIA's own GPUs! The CUDA version may be a little faster on some very top-end GPUs that have FP16 tensor cores - but even then not always, so benchmark the difference to see how it plays out on your specific hardware.

  • Eigen AVX2 - Use this if you don't have a GPU or your GPU is too old to work, but you have an Intel or AMD CPU from the last several years.
    This is a pure CPU version of KataGo, but compiled to use AVX2 and FMA operations, which roughly double the speed compared to not using them. However, it will completely fail to run on older or weaker CPUs that don't support these operations.

  • Eigen - Use this if you don't have a GPU or your GPU is too old to work, and your CPU turns out not to support AVX2 or FMA (the sketch after this list shows one way to check).
    This is the pure CPU version of KataGo, with no special instructions, which should hopefully run just about anywhere.
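If you're unsure whether your CPU has the AVX2 and FMA support mentioned above, here is one quick (Linux-only) way to check from Python before picking a build:

```python
# Check /proc/cpuinfo for the AVX2 and FMA flags before choosing
# between the Eigen and Eigen AVX2 builds. On other operating systems,
# consult your CPU model's spec sheet instead.
def cpu_supports_avx2_fma():
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        return None  # not Linux (or unreadable); can't tell this way
    flags = set()
    for line in text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return "avx2" in flags and "fma" in flags

print(cpu_supports_avx2_fma())
```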

Major New Stuff This Release:

Performance

  • Massive optimizations for the Eigen implementation thanks to kaorahi, making it now usable.
  • Reduced OpenCL code overhead, which may allow it to run on a small number of older GPUs where it couldn't before.
  • Worked around major OpenCL issue with NVIDIA GPUs that prevented it from using more than one GPU effectively. Now it should scale on a multi-GPU machine, whereas previously it didn't at all.

For Analysis Tool Devs

  • Implemented allow and avoid options for both the json analysis engine and for GTP lz-analyze and kata-analyze, whose precise semantics should be documented in these links. These options allow restricting the search to only specific moves or specific regions of the board (an example query appears after this list). I'm not entirely sure they match Leela Zero's semantics, since I could not find any precise specification for them beyond the raw source code and scattered descriptions in github issues.

  • Added a pvVisits option for both the json analysis engine and for GTP lz-analyze and kata-analyze. This option causes KataGo to also report the number of visits for every move in any of the principal variations for different moves. These values might be useful for estimating, or informing users about, the reliability of moves as you get deeper into a variation.

  • Improved some of the logging options available for the analysis engine. The new options are in https://github.com/lightvector/KataGo/blob/master/cpp/configs/analysis_example.cfg, commented out with their default values.
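For example, an analysis frontend might restrict White's reply to one corner and request PV visit counts with a query like the following sketch. The allowMoves/untilDepth/includePVVisits field names are my reading of the linked docs - verify against them:

```python
import json

# Ask KataGo to consider only corner replies for White on the first
# searched turn, and to include per-move PV visit counts.
query = {
    "id": "restricted1",
    "moves": [["B", "Q16"]],
    "rules": "tromp-taylor",
    "komi": 7.5,
    "boardXSize": 19,
    "boardYSize": 19,
    "maxVisits": 500,
    "allowMoves": [
        {"player": "W", "moves": ["C3", "C4", "D3", "D4"], "untilDepth": 1}
    ],
    "includePVVisits": True,
}
print(json.dumps(query))  # write this line to the engine's stdin
```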

Other

  • Fixed some interface-related bugs and made a variety of changes in the build config to produce more friendly messages and hints in CMakeGUI.

OpenCL FP16 Tensor Core Support

02 Aug 21:02

If you're a new user, don't forget to check out this section for getting started and basic usage!
The latest and strongest neural nets are still those from the former release: https://github.com/lightvector/KataGo/releases/tag/v1.4.5

Changes in this release:

OpenCL FP16 Tensor Cores

New in this release is OpenCL support for GPUs with FP16 tensor cores, roughly doubling performance on them. Non-tensor-core GPUs that gain significant improvements from FP16 storage or compute may theoretically also see a benefit. If you are upgrading from an earlier version of KataGo, the OpenCL tuner will need to re-run to re-tune itself.

The OpenCL FP16 implementation is still a little slower than the CUDA implementation on an FP16 tensor core GPU, so if you've gone through the hassle of installing CUDA and getting it to work on such a GPU, there is no reason to switch to OpenCL. But for users who can get OpenCL but not CUDA+CUDNN to work, the gap should now be much smaller than before. Further optimization may be possible in the future; any GPU code experts are of course welcome to comment. :)

Other user-facing changes

  • New GTP extension command: set_position, which allows a GTP controller to directly set an arbitrary position on the board rather than hacking it together via a series of "play" commands, which might accidentally communicate an absurd move history (see the sketch after this list). See documentation for KataGo GTP extensions here as usual.
  • By default, if absolutely no limits or time settings are specified for KataGo, and the GUI or tournament controller running it does not specify a time control either, KataGo will choose a small default of several seconds rather than treating time as unbounded.
  • Added a minor bit of logic for handling mirror Go. Nothing particularly robust or special, won't solve extreme cases, but hopefully fun.
  • Minor adjustments for detecting handicap stones for the purpose of computing PDA and/or when to resign.
  • Benchmark auto-tuning for the number of threads is a little more efficient.
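A minimal sketch of the new command from a Python GTP driver. The paths are placeholders and the alternating player/vertex argument form is my assumption from the release note, so check the GTP extensions documentation for the exact syntax:

```python
import subprocess

# Paths are placeholders; the player/vertex pair syntax below is an
# assumption -- see the GTP extensions documentation.
proc = subprocess.Popen(
    ["./katago", "gtp", "-config", "gtp.cfg", "-model", "model.bin.gz"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
proc.stdin.write("set_position B D4 B Q16 W C3 W R4\n")
proc.stdin.flush()
print(proc.stdout.readline().strip())  # a line starting with "=" on success
```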

Self-play

  • Hash-like game ID is now written to selfplay-generated SGFs.
  • Fixes a very rare bug in self-play game forking and initialization that could cause incorrect resolution of move legality as well as apparent neural net hash collisions upon the transition to cleanup phase for Japanese-like territory scoring rules.

Internal

  • Symmetries are now computed on the CPU rather than the GPU, simplifying GPU code a little.
  • A few internal performance optimizations and cleanups, partly thanks to some contributors.

Pure CPU implementation

Also as of this release, there is a pure-CPU implementation, which can be compiled via -DUSE_BACKEND=EIGEN with cmake. There are no precompiled executables for it right now because the implementation is very basic and the performance is extremely poor - even worse than one would expect from a CPU. So practically speaking, it's not ready for use. However, it's a start, hopefully, and contributors who want to help optimize it would be welcome. :)

Final June 2020 Neural Nets, Minor Bugfixes

21 Jun 17:08

If you're a new user, don't forget to check out this section for getting started and basic usage!

KataGo's third major run is complete! Almost certainly we could keep going further and continue improving with no end in sight, but for now due to the cost of continuing the run, this seems like a good point to stop. In the future, there is a chance that KataGo will launch a crowdsourced community-distributed run to continue further with more research and improvement. But regardless, I hope you enjoy the run so far and these final networks.

This is both a release of the final networks, as well as an update to KataGo's code for a few minor bugfixes.

New/Final Neural Networks (for now!)

These are the final neural networks for the June 2020 ("g170") run, obtained after training for close to 2 weeks at reduced learning rates. This resulted in a huge strength boost, somewhere from 200 to 250 Elo for both the 30 and 40 block networks, and around 100 Elo for the 20 block network.

These gains were measured by play within a pool of older KataGo networks - it's unknown what proportion of them transfers to opponents other than KataGo, since gains from learning rate drops (presumably just reducing noise and improving overall accuracy) might be qualitatively different from gains that accrue over time from learning new shapes and moves. But hopefully much of it does transfer.

  • g170-b30c320x2-s4824661760-d1229536699 ("g170 30 block d1229M") - Final 30 block network!
  • g170-b40c256x2-s5095420928-d1229425124 ("g170 40 block d1229M") - Final 40 block network!
  • g170e-b20c256x2-s5303129600-d1228401921 ("g170e 20 block d1228M") - Final 20 block network!

Additionally, posted here is an extremely fat and heavy neural net, 40 blocks with 384 channels instead of 256 channels, which has never been tested (scroll to the bottom, download and unzip the file to find the .bin.gz file).

It is probably quite slow to run and likely weaker given equal compute time. But it would be very interesting to see how its per-playout strength compares, as well as its one-playout strength (pure raw policy), in case anyone wants to test it out!

Which Network Should I Use?

  • For weaker or mid-range GPUs, try the final 20-block network.
  • For top-tier GPUs and/or for the highest-quality analysis if you're going to use many thousands and thousands of playouts and long thinking times, try the final 40-block network, which is more costly to run but should be the strongest and best.
  • If you care a lot about theoretical purity - no outside data, bot learns strictly on its own - use the 20 or 40 block nets from this release, which are pure in this way and still much stronger than Leela Zero, but also not quite as strong as these final nets here.
  • If you want some nets that are much faster to run, and each with their own interesting style of play due to their unique stages of learning, try any of the "b10c128" or "b15c192" Extended Training Nets here which are 10 block and 15 block networks from earlier in the run that are much weaker but still pro-level-and-beyond.
  • And if you want to see how a super ultra large/slow network performs that nobody has tested until now, try the fat 40-block 384 channel network mentioned a little up above.

Bugfixes this Release

  • Fixed a bug in analysis_example.cfg where nnMaxBatchSize was duplicated, and added a safeguard so that KataGo will fail if fed any config with duplicate parameters in the future, instead of silently using one of them and ignoring the other (a toy sketch of the idea appears after this list).
    • If you have a config with a buggy duplicate parameter, you may find KataGo failing when switching to this release - just remove the duplicate, and if the two values were inconsistent or conflicting, set the parameter to what it should be.
  • Split up one of the OpenCL kernels into a few pieces to make compiling it faster, and made a minor tweak, so that on most systems the OpenCL tuner will take a little less time.
  • katago match will now size the neural net according to the largest board size involved in the match by default, instead of always 19. This should make it faster to run test games on small boards.
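The duplicate-parameter safeguard amounts to failing at parse time rather than silently picking a winner. A toy sketch of the idea in Python (not KataGo's actual config parser - its .cfg format has more features):

```python
# Reject a config that assigns the same key twice instead of silently
# keeping one value and ignoring the other.
def parse_cfg(text):
    params = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in params:
            raise ValueError(f"duplicate parameter {key!r} on line {lineno}")
        params[key] = value.strip()
    return params

parse_cfg("nnMaxBatchSize = 16\nnnMaxBatchSize = 32")  # raises ValueError
```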

Various bugfixes, CUDA 10.2

14 Jun 08:01

If you're a new user, don't forget to check out this section for getting started and basic usage!

This is a release that fixes a variety of issues and upgrades some things. See also the releases page for later releases with the latest neural nets.

This was originally released as version 1.4.3, but with a major bug in OpenCL due to a typo in the logic implementing the new multi-device handling. This version 1.4.4 should have that bug fixed.

Enjoy!

Changes

Minor UI changes and features

  • For GTP and other commands, you can now specify an empty string for logFile to disable logging. As with almost all other config parameters, it may also be specified on the command line with -override-config.
  • For GTP and other commands, you can now specify homeDataDir=<DIR> to override the directory where KataGo will cache some data (currently, the OpenCL tuner files on the OpenCL version). As with almost all other config parameters, it may also be specified on the command line with -override-config.
  • The benchmark command now runs in -tune mode by default.

Precompiled executables

  • The precompiled CUDA executables are now compiled against CUDA 10.2 instead of 10.1.
  • The precompiled linux executables are compiled against libzip5 instead of libzip4.

Bugfixes

  • Fixed a problem that was preventing KataGo from running on multiple OpenCL GPUs or other devices when those devices were from two distinct vendors/platforms at the same time.
  • Fixed a bug where the cputime GTP command cleared on every game rather than accumulating time persistently.
  • Fixed a bug in analysis engine where reportAnalysisWinratesAs was ignored in many cases.
  • Fixed a bug where in some ways of piping input to KataGo, closing the pipe or closing stdin would be incorrectly handled and/or duplicate the final command.
  • Fixed a bug in selfplay SGF starting position processing where board size in the SGF was mistakenly ignored.
  • Fixed a typo in selfplay scripts that caused bad/confusing behavior when an incorrect number of arguments is provided.

Other cleanups

  • Various code cleanups
  • Clarified that models included directly in the repo source itself are tiny testing neural nets. See some of the releases for actual neural nets, or here for all the latest nets.

Various bugfixes, CUDA 10.2 (edit: buggy)

14 Jun 02:12

Edit (2020-06-14): The initial posting of this release was broken for OpenCL due to a typo in the logic implementing the new multi-device handling. Reuploaded new executables with what is hopefully a quick fix, and rereleased as version 1.4.4.