Dream Go is an independent implementation of the algorithms and concepts presented by DeepMind in their Mastering the Game of Go without Human Knowledge paper, with a few modifications intended to (maybe) make it feasible to develop a strong player without access to a supercomputer on the scale of Sunway TaihuLight:
- Human games are used to bootstrap the network weights.
- Additional (synthetic) features inspired by AlphaGo and DeepForest are used during training and inference.
- A self-learning procedure inspired by Thinking Fast and Slow with Deep Learning and Tree Search is used.
To run Dream Go you will need the following:
- CUDA 11 and cuDNN 8 (or higher)
- NVIDIA GPU (Compute Capability 6.1 or higher)
If you want to run the supervised or reinforcement learning programs to improve the quality of the weights, or to help with development of the agent, then you will also need the following:
- Python 3.6 with TensorFlow
- Rust (nightly)
To bootstrap the network from pre-generated data you will need an SGF file where each line contains a full game tree, henceforth called a big SGF file. If you do not have access to such a file you can use the `tools/sgf2big.py` tool to merge all SGF files contained within a directory into a single big SGF file. You may also want to do some data cleaning and balancing (to avoid bias in the value network) by removing duplicate games and ensuring that there is an equal number of wins for black and white:
./tools/sgf2big.py data/kgs/ > kgs_big.sgf
cat kgs_big.sgf | sort | uniq | shuf | ./tools/sgf2balance.py > kgs_bal.sgf
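If you would rather do the cleaning step in Python, a minimal sketch of the same deduplicate-and-balance idea is shown below. It is not the actual `tools/sgf2balance.py` implementation; it only assumes one complete game per line and the standard SGF `RE[B+...]`/`RE[W+...]` result property.

```python
import random
import re

def dedup_and_balance(in_path, out_path, seed=1234):
    """Remove duplicate games and keep an equal number of black and white wins.

    Conceptual sketch only, not the actual tools/sgf2balance.py script.
    Assumes one complete SGF game per line and a standard RE[...] property.
    """
    black_wins, white_wins = [], []

    with open(in_path, 'r', errors='ignore') as f:
        for line in set(f.readlines()):   # `set` drops exact duplicate games
            winner = re.search(r'RE\[([BW])\+', line)
            if winner is None:
                continue                  # skip games without a recorded winner
            elif winner.group(1) == 'B':
                black_wins.append(line)
            else:
                white_wins.append(line)

    # keep the same number of wins for each color to avoid biasing the value head
    keep = min(len(black_wins), len(white_wins))
    random.seed(seed)
    games = random.sample(black_wins, keep) + random.sample(white_wins, keep)
    random.shuffle(games)

    with open(out_path, 'w') as f:
        f.writelines(games)

dedup_and_balance('kgs_big.sgf', 'kgs_bal.sgf')
```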
The resulting file can then be fed into the bootstrap script, which will tune the network weights to more accurately predict the moves played in the original SGF files. This script will automatically terminate on convergence. You can monitor the accuracy (and a bunch of other stuff) using TensorBoard, whose logs are stored in the `logs/` directory. The final output will also be stored in the `models/` directory.
cd contrib/trainer
python -m dream_tf --start kgs_big.sgf
tensorboard --logdir models/
When you are done training your network you need to transcode the weights from TensorFlow protobufs into a format that can be read by Dream Go. This can be accomplished using the `--dump` command of the bootstrap script:
python -m dream_tf --dump > dream-go.json
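The exact schema of `dream-go.json` is an internal detail of Dream Go, but a quick sanity check of the dump before handing it to the engine could look like the sketch below, which only assumes the file is a JSON object keyed by variable name:

```python
import json

# Illustrative sanity check: assumes (but does not require Dream Go to
# guarantee) that dream-go.json is a JSON object whose keys name the
# exported variables.
with open('dream-go.json') as f:
    weights = json.load(f)

print(f'{len(weights)} variables exported')
for name in sorted(weights)[:10]:   # peek at the first few entries
    print(' ', name)
```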
Two reinforcement learning algorithms are supported by Dream Go. They differ only marginally in implementation but have vastly different hardware requirements. Which of the two algorithms is better is currently unknown, but I would recommend Expert Iteration, because you most likely do not have the hardware required to run the AlphaZero algorithm:
If you want to use the AlphaZero algorithm then you need to start by generating self-play games. The self-play games generated by Dream Go are different from normal games played using the GTP interface in several ways; most notably they are more random (to encourage exploration and avoid duplicate games), and a summary of the Monte Carlo search tree is stored for each position. This Monte Carlo summary is then used during training to expose a richer structure to the neural network.
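As in AlphaZero, a search summary like this is typically consumed during training as a soft policy target: the visit counts of the candidate moves are normalized into a probability distribution for the policy head to match. The sketch below illustrates the idea only; the names are placeholders, not Dream Go's actual storage format.

```python
import numpy as np

def policy_target_from_visits(visit_counts, temperature=1.0):
    """Turn per-move MCTS visit counts into a policy training target.

    `visit_counts` has one entry per candidate move (including pass); the
    target is the normalized visit distribution, optionally sharpened or
    flattened by a temperature, as in the AlphaZero family of algorithms.
    """
    counts = np.asarray(visit_counts, dtype=np.float32) ** (1.0 / temperature)
    return counts / counts.sum()

# toy example: 362 candidate moves on a 19x19 board (361 points + pass)
target = policy_target_from_visits(np.random.randint(0, 100, size=362))
assert abs(target.sum() - 1.0) < 1e-4
```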
Generating these self-play games can be accomplished using the `--self-play` command-line option. I also recommend that you increase the `--num-threads` and `--batch-size` arguments for this, since the defaults are tuned for the GTP interface, which has different (real-time) requirements. The following command will generate 25,000 games (which should take around 14 days on modern hardware):
./dream_go --num-threads 32 --batch-size 32 --self-play 25000 > self_play.sgf
The network should now be re-trained using these self-play games. This is done in the same way as during the supervised training: first perform some basic data cleaning to avoid bias, convert the games to a binary representation, and then train the network using TensorFlow. You should have at least 150,000 games in total to get a good result:
sort < self_play.sgf | uniq | shuf | ./tools/sgf2balance.py > self_play_bal.sgf
cd contrib/trainer/ && python3 -m dream_tf --start self_play_bal.sgf
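Since a big SGF file holds one complete game per line, it is easy to verify that you actually have enough data, and that the balancing step evened out the results, before starting a long training run. A short sketch, again assuming the standard `RE[...]` result property:

```python
import re

# Quick sanity check on the balanced self-play data: one game per line,
# winner read from the standard RE[...] property.
with open('self_play_bal.sgf') as f:
    results = [re.search(r'RE\[([BW])\+', line) for line in f]

total = len(results)
black = sum(1 for r in results if r and r.group(1) == 'B')
white = sum(1 for r in results if r and r.group(1) == 'W')

print(f'{total} games ({black} black wins, {white} white wins)')
if total < 150_000:
    print('warning: fewer than the recommended 150,000 games')
```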
The training procedure for Expert Iteration is almost the same as for AlphaZero, with two exceptions:

- We generate games with `--num-rollout 1` and `--ex-it`. These are self-play games without any search, so they are about 800 to 1,600 times faster to generate, but of lower quality.
- We generate the Monte Carlo search tree during data extraction, using the `--ex-it` switch, only for the positions that actually end up as examples for the neural network.
./dream_go --num-games 32 --num-threads 32 --batch-size 32 --num-rollout 1 --ex-it --self-play 200000 > policy_play.sgf
sort < policy_play.sgf | uniq | shuf | ./tools/sgf2balance.py > policy_play_bal.sgf
cd contrib/trainer/ && python3 -m dream_tf --start policy_play_bal.sgf
For the values provided in this example, which generate 200,000 examples for the neural network (from 200,000 distinct games), it should take about one day to generate the required data.
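Conceptually, Expert Iteration splits the work between a fast "apprentice" (the raw policy network that plays the games) and a slow "expert" (the Monte Carlo tree search that is only run on the positions that become training examples). The sketch below is a high-level illustration of that data-extraction step; `run_mcts`, `policy_net`, and the other names are hypothetical placeholders, not Dream Go's internal API.

```python
import random

def expert_iteration_examples(games, run_mcts, policy_net, positions_per_game=8):
    """Illustrative Expert Iteration data extraction.

    `games` are fast self-play games generated directly from the policy
    network (no search); the expensive expert search is invoked only on
    the few positions per game that become training examples.
    All arguments are hypothetical placeholders for illustration.
    """
    examples = []
    for game in games:
        picked = random.sample(game, min(positions_per_game, len(game)))
        for position in picked:
            # run the expert search only here, seeded by the apprentice's prior
            summary = run_mcts(position, prior=policy_net(position))
            examples.append((position, summary.policy, summary.value))
    return examples
```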
- 1.0.0 - Public Release
- 0.7.0 - Acceptance
- First version with a network trained from self-play games
- 0.6.3 - Unravel
- The engine plays more enjoyably with `kgs-genmove_cleanup`
- Bug fixes
- 0.6.2 - Unfolded
- Improved training procedure.
- Changed the input features to include more liberties.
- Decreased memory use by 80%, and runtime performance by 25%.
- Improved performance with Tensor Cores.
- 0.6.1 - Emerged
- Improved neural network architecture
- Improved reinforcement training environment
- 0.6.0 - Emergent
- Time and tournament commands for the GTP interface
- Improved neural network training
- Improved performance with DP4A
- Multi GPU support
- 0.5.0 - Assessment
- Optimize the Monte Carlo tree search parameters against other engines
- Optimize neural network size for best performance vs speed ratio
- 0.4.0 - Awakening
- GTP interface
- 0.3.0 - Slow-wave sleep
- Monte Carlo tree search for self-play
- 0.2.0 - Light Sleep
- Self-play agent without Monte Carlo tree search
- Reinforcement learning using self-play games
- 0.1.0 - Napping
- Supervised learning using a pre-existing dataset