Dream Go - All day, every day

Dream Go is an independent implementation of the algorithms and concepts presented by DeepMind in their Mastering the Game of Go without Human Knowledge paper, with a few modifications to (maybe) make it feasible to develop a strong player without access to a supercomputer on the scale of Sunway TaihuLight.

Human games are used to bootstrap the network weights.
Additional (synthetic) features inspired by AlphaGo and DeepForest are used during training and inference.
A self-learning procedure inspired by Thinking Fast and Slow with Deep Learning and Tree Search is used.

Dependencies

CUDA 11 and cuDNN 8 (or higher)
NVIDIA GPU (Compute Capability 6.1 or higher)

Dev Dependencies

If you want to run the supervised or reinforcement learning programs to improve the quality of the weights, or to help with development of the agent, you will need the following:

Python 3.6 with TensorFlow
Rust (nightly)

Training

To bootstrap the network from pre-generated data you will need an SGF file where each line contains a full game tree, henceforth called a big SGF file. If you do not have access to such a file, you can use the tools/sgf2big.py tool to merge all SGF files contained within a directory into a single big SGF file. You may also want to do some data cleaning and balancing (to avoid bias in the value network) by removing duplicate games and ensuring there is the same number of wins for both black and white.

./tools/sgf2big.py data/kgs/ > kgs_big.sgf
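To illustrate what a big SGF file looks like, here is a minimal sketch of such a merge. It is not the code of tools/sgf2big.py: the function name merge_sgf_dir is hypothetical, and collapsing all whitespace (which also flattens line breaks inside comments) is an assumption made to keep each game on one line.

```python
import os


def merge_sgf_dir(root):
    """Yield one single-line game for every .sgf file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".sgf"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                # Collapse every run of whitespace so the game fits on one line.
                line = " ".join(handle.read().split())
            if line:
                yield line
```

Writing each yielded string followed by a newline produces a file with one full game per line.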

cat kgs_big.sgf | sort | uniq | shuf | ./tools/sgf2balance.py > kgs_bal.sgf
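The cleaning and balancing step can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of tools/sgf2balance.py: it assumes the winner is recorded in the standard SGF RE property (for example RE[B+3.5] or RE[W+Resign]), drops games without a decided result, and the names winner and dedupe_and_balance are hypothetical.

```python
import random
import re

# Matches the SGF result property, e.g. RE[B+3.5] or RE[W+Resign].
RESULT = re.compile(r"RE\[([BW])\+")


def winner(game):
    """Return 'B' or 'W' for a decided game, or None if no winner is recorded."""
    match = RESULT.search(game)
    return match.group(1) if match else None


def dedupe_and_balance(games, seed=0):
    """Drop duplicate games, then keep equally many black and white wins."""
    by_winner = {"B": [], "W": []}
    for game in dict.fromkeys(games):  # removes exact duplicates
        color = winner(game)
        if color is not None:
            by_winner[color].append(game)
    keep = min(len(by_winner["B"]), len(by_winner["W"]))
    rng = random.Random(seed)
    balanced = rng.sample(by_winner["B"], keep) + rng.sample(by_winner["W"], keep)
    rng.shuffle(balanced)  # avoid all black wins preceding all white wins
    return balanced
```

Each element of games is one line of a big SGF file. Balancing by discarding the surplus of the more frequent winner is one simple choice; other strategies are possible.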

The resulting file can then be fed into the bootstrap script, which will tune the network weights to more accurately predict the moves played in the original SGF files. This script will automatically terminate on convergence. You can monitor the accuracy (and a number of other metrics) using TensorBoard, whose logs are stored in the logs/ directory. The final output will also be stored in the models/ directory.

cd contrib/trainer
python -m dream_tf --start kgs_big.sgf

tensorboard --logdir models/

When you are done training your network, you need to transcode the weights from TensorFlow protobufs into a format that can be read by Dream Go. This can be accomplished using the --dump option of the bootstrap script:

python -m dream_tf --dump > dream-go.json

Reinforcement Learning

Two reinforcement learning algorithms are supported by Dream Go. They differ only marginally in implementation but have vastly different hardware requirements. Which of the two algorithms is better is currently unknown, but I would recommend Expert Iteration because you most likely do not have the hardware required to run the AlphaZero algorithm.

AlphaZero

If you want to use the AlphaZero algorithm then you need to start by generating self-play games. The self-play games generated by Dream Go differ from normal games played using the GTP interface in several ways. Most notably, they are more random (to encourage exploration and avoid duplicate games), and a summary of the Monte Carlo search tree is stored for each position. This Monte Carlo summary is then used during training to expose a richer structure to the neural network.
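One common way to turn a search summary into a training signal, described in the AlphaGo Zero paper, is to normalize the visit counts of the moves at the root into a probability distribution that the policy head is trained to predict. The sketch below assumes that approach. The function name visits_to_policy and the dictionary representation are hypothetical and say nothing about how Dream Go actually stores or consumes its summary.

```python
def visits_to_policy(visit_counts, temperature=1.0):
    """Normalize per-move visit counts into a probability distribution.

    Each count is raised to the power 1 / temperature before normalizing,
    so a lower temperature sharpens the distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {
        move: count ** (1.0 / temperature)
        for move, count in visit_counts.items()
    }
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("at least one move must have been visited")
    return {move: value / total for move, value in scaled.items()}
```

For example, visit counts of 600, 150 and 50 for three moves give target probabilities of 0.75, 0.1875 and 0.0625 at temperature 1.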

Self-play games are generated using the --self-play command-line option. I also recommend that you increase the --num-threads and --batch-size arguments for this, since the defaults are tuned for the GTP interface, which has different (real-time) requirements. The following command will generate 25,000 games (this should take around 14 days on modern hardware):

./dream_go --num-threads 32 --batch-size 32 --self-play 25000 > self_play.sgf

The network should now be retrained using these self-play games. This is done in the same way as during supervised training: first perform some basic data cleaning to avoid bias, then convert the games to a binary representation, and finally train the network using TensorFlow. You should have at least 150,000 games in total to get a good result:

sort < self_play.sgf | uniq | shuf | ./tools/sgf2balance.py > self_play_bal.sgf

cd contrib/trainer/ && python3 -m dream_tf --start self_play_bal.sgf
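To put the hardware requirement in perspective, the figures quoted above (about 14 days per 25,000 games, and at least 150,000 games for a good result) allow a rough estimate. The sketch assumes generation time scales linearly with the number of games on the same hardware, which is a simplification. The function name days_to_generate is hypothetical.

```python
def days_to_generate(target_games, games_per_run=25_000, days_per_run=14):
    """Estimate the days needed to generate target_games, assuming linear scaling."""
    return target_games / games_per_run * days_per_run


print(days_to_generate(150_000))  # prints 84.0
```

Under these assumptions, reaching 150,000 games takes roughly 84 days on a single machine.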

Expert Iteration

The training procedure for Expert Iteration is almost the same as for AlphaZero with two exceptions:

We generate games with --num-rollout 1 and --ex-it. These are self-play games without any search, so they are about 800 to 1,600 times faster to generate, but of lower quality.
We generate the Monte Carlo search tree during data extraction using the --ex-it switch, and only for positions that actually end up as examples for the neural network.

./dream_go --num-games 32 --num-threads 32 --batch-size 32 --num-rollout 1 --ex-it --self-play 200000 > policy_play.sgf

sort < policy_play.sgf | uniq | shuf | ./tools/sgf2balance.py > policy_play_bal.sgf

cd contrib/trainer/ && python3 -m dream_tf --start policy_play_bal.sgf

With the values provided in this example, which generate 200,000 examples for the neural network (from 200,000 distinct games), it should take about 1 day to generate the required data.

Roadmap

1.0.0 - Public Release
0.7.0 - Acceptance
- First version with a network trained from self-play games
0.6.3 - Unravel
- The engine plays more enjoyably with kgs-genmove_cleanup
- Bug fixes
0.6.2 - Unfolded
- Improved training procedure.
- Change the input features to include more liberties.
- Decrease memory use by 80% and runtime by 25%.
- Improved performance with Tensor Cores.
0.6.1 - Emerged
- Improved neural network architecture
- Improved reinforcement training environment
0.6.0 - Emergent
- Time and tournament commands for the GTP interface
- Improved neural network training
- Improved performance with DP4A
- Multi GPU support
0.5.0 - Assessment
- Optimize the Monte Carlo tree search parameters against other engines
- Optimize neural network size for best performance vs speed ratio
0.4.0 - Awakening
- GTP interface
0.3.0 - Slow-wave sleep
- Monte Carlo tree search for self-play
0.2.0 - Light Sleep
- Self-play agent without Monte Carlo tree search
- Reinforcement learning using self-play games
0.1.0 - Napping
- Supervised learning using a pre-existing dataset

License

Apache License 2.0
