feature(ekiefl): add pooltool env and related configs (#227)
* Add SumToThree pooltool env

* Woops

* Update datatypes and add single inference mode

* Move core into pooltool

* Add some speed and memory profiling for env debug

* Trying to get CNNs working

* Patch #172

* Setup first experiment

* Fix up sumtothreeimage

* Update obs space to be float

* Move image_representation into fork

- It was in pooltool ai-framework branch
- By moving it here, main branch of pooltool can be used

* Start a README

* Begin test suite for sum_to_three_env

* Add tests for datatypes

* Finish test suite for sum_to_three_env

* rename tests -> characterize

* Delete

* Increase to 300,000 replay buffer

* Finish README

* Fix image link

* Link the discussion page

* Update pooltool API calls to 0.3.0

* Switch to dataclasses

- attrs is not standard library, best not to impose my standards
- Also had some docs

* Progress on documentation and variable naming

* Finish docs for datatypes.py

* Data structure changes

- Additionally, move reward function into reward module and add options
  to select different rewards via cfg

* Parameterize action space bounds

- Remove clunky class methods

* Add a module docstring

* Finish docstrings for sum_to_three coordinate environment

* rm pooltool __init__.py

- LSP was getting confused with the `import pooltool` statement

* Add pytest

* Add pooltool-billiards

* Add docs for reward space

* Add tests for grayscale conversion, add docs

* Add module doc for reward.py

* Add docs for image_representation

* Fix image env

* Update info about px parameter

* Add serialize/deserialize methods for RenderConfig

* Three things:

- move px to RenderConfig
- serialize/deserialization methods for RenderConfig
- Mimic the refactor in cts env to the image env

* Use channels in renderconfig

* Buff image_representation visualization

- Add an animation

* Start consolidation

* More consolidation between observation types

* consolidate image and coordinate observation types

* Remove old file

* Add default config

* Single source state setting

* Add tests

* Unused

* Add default render config option

- Store as attribute

* Add speed test script

* Small changes

* Add sum to three to feature table

* Update pooltool README

* Move observation/ and reward.py into utils.py

* polish(pu): polish sum_to_three configs

* feature(pu): add sum_to_three_vector_obs_sac_config.py and polish related config names

* polish(pu): polish sum_to_three configs

* polish(pu): polish pooltool configs

---------

Co-authored-by: dyyoungg <yangdeyu@sensetime.com>
Co-authored-by: 蒲源 <2402552459@qq.com>
Co-authored-by: 蒲源 <48008469+puyuan1996@users.noreply.github.com>
4 people authored Jul 4, 2024
1 parent 540bdcb commit 39dfa3c
Showing 28 changed files with 2,375 additions and 4 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -1445,4 +1445,7 @@ events.*
 **/tb/*
 **/mcts/ctree/tests_cpp/*
 **/*tmp*
-lzero/mcts/ctree/ctree_alphazero/pybind11
+
+# pooltool-specific stuff
+!/assets/pooltool/**
+lzero/mcts/ctree/ctree_alphazero/pybind11
1 change: 1 addition & 0 deletions README.md
@@ -140,6 +140,7 @@ The environments and algorithms currently supported by LightZero are shown in th
 | MiniGrid | --- |||| 🔒 | 🔒 ||🔒 |
 | Bsuite | --- |||| 🔒 | 🔒 ||🔒 |
 | Memory | --- |||| 🔒 | 🔒 ||🔒 |
+| SumToThree (billiards) | --- | 🔒 | 🔒 || 🔒 | 🔒 |🔒|🔒 |


 <sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>
1 change: 1 addition & 0 deletions README.zh.md
@@ -127,6 +127,7 @@ LightZero 目前支持的环境及算法如下表所示:
 | MiniGrid | --- |||| 🔒 | 🔒 ||🔒 |
 | Bsuite | --- |||| 🔒 | 🔒 ||🔒 |
 | Memory | --- |||| 🔒 | 🔒 ||🔒 |
+| SumToThree (billiards) | --- | 🔒 | 🔒 || 🔒 | 🔒 |🔒|🔒 |

 <sup>(1): "✔" 表示对应的项目已经完成并经过良好的测试。</sup>

Binary file added assets/pooltool/3hits.gif
Binary file added assets/pooltool/4hits.gif
Binary file added assets/pooltool/discrete.png
Binary file added assets/pooltool/feature_planes.png
Binary file added assets/pooltool/largecut.gif
Binary file added assets/pooltool/nocut.gif
1 change: 1 addition & 0 deletions lzero/model/common.py
@@ -337,6 +337,7 @@ def __init__(

         self.sim_norm = SimNorm(simnorm_dim=group_size)

+
     def forward(self, x: torch.Tensor) -> torch.Tensor:
         """
         Shapes:
2 changes: 1 addition & 1 deletion lzero/policy/sampled_efficientzero.py
@@ -248,8 +248,8 @@ def _init_learn(self) -> None:
 init_w = self._cfg.init_w
 self._model.prediction_network.fc_policy_head.mu.weight.data.uniform_(-init_w, init_w)
 self._model.prediction_network.fc_policy_head.mu.bias.data.uniform_(-init_w, init_w)
-self._model.prediction_network.fc_policy_head.log_sigma_layer.weight.data.uniform_(-init_w, init_w)
 try:
+    self._model.prediction_network.fc_policy_head.log_sigma_layer.weight.data.uniform_(-init_w, init_w)
     self._model.prediction_network.fc_policy_head.log_sigma_layer.bias.data.uniform_(-init_w, init_w)
 except Exception as exception:
     logging.warning(exception)
4 changes: 3 additions & 1 deletion requirements.txt
@@ -6,4 +6,6 @@ bsuite
 minigrid
 moviepy
 pycolab
-line_profiler
+pytest
+pooltool-billiards>=0.3.1
+line_profiler
1 change: 0 additions & 1 deletion zoo/atari/config/atari_muzero_config.py
@@ -16,7 +16,6 @@
 batch_size = 256
 max_env_step = int(5e5)
 reanalyze_ratio = 0.
-eps_greedy_exploration_in_collect = True

 # =========== for debug ===========
 # collector_env_num = 1
123 changes: 123 additions & 0 deletions zoo/pooltool/README.md
@@ -0,0 +1,123 @@
# Billiards RL

Welcome to the documentation for billiards simulation within the LightZero framework. Billiards offers an intriguing learning environment for reinforcement learning due to its continuous action space, turn-based play, and the need for long-term planning and strategy formulation.

## Pooltool

Pooltool is a general purpose billiards simulator crafted specifically for science and engineering applications (learn more [here](https://github.com/ekiefl/pooltool)). It has been incorporated into LightZero to create diverse learning environments for billiards games.

## Testing your installation

Pooltool comes pre-installed with LightZero. If you are using a custom setup, follow the _pip_ install instructions [here](https://pooltool.readthedocs.io/en/latest/getting_started/install.html#install-option-1-pip).

Verify that pooltool can be imported from your Python environment:

```bash
python -c "import pooltool; print(pooltool.__version__)"
```
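
This commit pins `pooltool-billiards>=0.3.1` in `requirements.txt`, so it can also be useful to check the installed version against that minimum. The following is a minimal sketch using only the standard library. The helper names `parse_version`, `meets_minimum`, and `installed_pooltool_version` are illustrative and not part of pooltool, and the parser assumes a plain dotted numeric version string.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '0.3.1' into (0, 3, 1).

    Assumption: leading components are numeric. Parsing stops at the first
    non-numeric component (for example 'dev0').
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "0.3.1") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_pooltool_version():
    """Return the installed pooltool-billiards version, or None if absent."""
    try:
        return metadata.version("pooltool-billiards")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pooltool_version()
    if found is None:
        print("pooltool-billiards is not installed")
    else:
        print(found, "OK" if meets_minimum(found) else "too old")
```

For anything beyond simple release numbers (pre-releases, local versions), a dedicated version parser would be a safer choice than this hand-rolled comparison.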

Further test your installation by opening the interactive interface:

```bash
# Unix
run_pooltool

# Windows
run_pooltool.bat
```

(For instructions on how to play, check out the [Getting Started tutorial](https://pooltool.readthedocs.io/en/latest/getting_started/interface.html))

## Supported Games

The following games are supported or planned:

1. **Sum to Three**: A simplified billiards game designed to make learning easier for agents.
2. **Standard Billiards Games** (planned for future updates): Including 8-ball, 9-ball, and snooker.

The rest of the document provides details for each supported game.

## Game 1: Sum to Three

Standard billiards games like 8-ball, 9-ball, and snooker have complex rulesets which make learning more difficult.

In contrast, _sum to three_ is a fictitious billiards game with a simple ruleset.

### Rules

1. The game is played on a table with no pockets
1. There are 2 balls: a cue ball and an object ball
1. The player must hit the object ball with the cue ball
1. The player scores a point if the total number of ball-cushion collisions during the shot is exactly 3
1. The player takes 10 shots, and their final score is the number of points they achieve

For example, this is a successful shot because there are three ball-cushion collisions:

<img src="../../assets/pooltool/3hits.gif" width="600" />

This is an unsuccessful shot because there are four ball-cushion collisions:

<img src="../../assets/pooltool/4hits.gif" width="600" />
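
The scoring rule can be sketched in a few lines of Python. This is an illustration of the rules as stated above, not the environment's actual reward code: the `Event` class and its `kind` strings are hypothetical stand-ins for whatever collision record the simulator produces, and the sketch assumes that a shot only scores if the cue ball also contacts the object ball.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical record of one collision during a shot."""
    kind: str  # assumed values: "ball-ball" or "ball-cushion"


def count_cushion_hits(events: List[Event]) -> int:
    """Count every ball-cushion collision, regardless of which ball it was."""
    return sum(1 for e in events if e.kind == "ball-cushion")


def shot_scores(events: List[Event]) -> bool:
    """A shot scores when the balls touch and exactly 3 cushion hits occur.

    Assumption: a shot that never contacts the object ball scores nothing.
    """
    hit_object_ball = any(e.kind == "ball-ball" for e in events)
    return hit_object_ball and count_cushion_hits(events) == 3


def game_score(shots: List[List[Event]]) -> int:
    """Total points over a game; the rules above use 10 shots per game."""
    return sum(1 for events in shots if shot_scores(events))


# Example: one scoring shot (3 cushion hits) and one miss (4 cushion hits).
three_hits = [Event("ball-cushion"), Event("ball-ball"),
              Event("ball-cushion"), Event("ball-cushion")]
four_hits = three_hits + [Event("ball-cushion")]
print(game_score([three_hits, four_hits]))
```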

### Observation / Action Spaces

Continuous and discrete observation spaces are supported. The continuous observation space uses the coordinates of the two balls as the observation. The discrete observation space is based on configurable image-based feature planes.

In general, when an agent strikes a cue ball, the cue stick is described by 5 continuous parameters:

```
V0 : positive float
With what initial velocity does the cue strike the ball?
phi : float (degrees)
The direction you strike the ball
theta : float (degrees)
How elevated is the cue from the playing surface, in degrees?
a : float
How much side english should be put on? -1 being rightmost side of ball, +1 being
leftmost side of ball
b : float
How much vertical english should be put on? -1 being bottom-most side of ball, +1 being
topmost side of ball
```
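
One way to hold these five parameters in code is a small validated container. The sketch below is illustrative only: `CueStrike` is a hypothetical name, not pooltool's API, the zero defaults are an assumption, and the range checks simply encode the "positive float" and "-1 to +1" descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueStrike:
    """Hypothetical container for the five cue parameters described above."""
    V0: float           # initial cue velocity, must be positive
    phi: float          # strike direction, in degrees
    theta: float = 0.0  # cue elevation above the playing surface, in degrees
    a: float = 0.0      # side english, -1 (rightmost) to +1 (leftmost)
    b: float = 0.0      # vertical english, -1 (bottom-most) to +1 (topmost)

    def __post_init__(self) -> None:
        # Reject values that contradict the parameter descriptions.
        if self.V0 <= 0:
            raise ValueError("V0 must be a positive float")
        for name in ("a", "b"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1, 1], got {value}")


# Example: a level, center-ball strike.
strike = CueStrike(V0=1.5, phi=30.0)
print(strike)
```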

Since sum to three is a simple game, only a reduced action space with 2 parameters is supported:

1. V0: The speed of the cue stick. Increasing this means the cue ball travels further
1. cut angle: The angle at which the cue ball hits the object ball

For example, in this shot, the cut angle is -70 (hitting the left side of the object ball):

<img src="../../assets/pooltool/largecut.gif" width="600" />

In this shot, by contrast, the cut angle is 0 (head-on collision):

<img src="../../assets/pooltool/nocut.gif" width="600" />

Based on the game dimensions, the action parameters are bounded as follows: [0.3, 3] for speed and [-70, 70] degrees for cut angle.
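
Many continuous-control policies emit values in [-1, 1], which then need to be mapped onto these bounds. The sketch below shows one simple linear mapping. The [-1, 1] convention, the clipping behavior, and the names `rescale` and `to_action` are assumptions made for illustration, not a description of how the environment itself handles actions.

```python
from typing import Tuple

# Bounds quoted in the text above.
SPEED_BOUNDS = (0.3, 3.0)
CUT_ANGLE_BOUNDS = (-70.0, 70.0)


def rescale(x: float, low: float, high: float) -> float:
    """Map x from [-1, 1] linearly onto [low, high], clipping out-of-range x."""
    x = max(-1.0, min(1.0, x))
    return low + (x + 1.0) * 0.5 * (high - low)


def to_action(raw: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a raw (speed, cut angle) pair in [-1, 1]^2 to bounded values."""
    speed = rescale(raw[0], *SPEED_BOUNDS)
    cut_angle = rescale(raw[1], *CUT_ANGLE_BOUNDS)
    return speed, cut_angle


# Example: the midpoint of the raw range maps to the midpoint of each bound.
print(to_action((0.0, 0.0)))
```

Clipping before rescaling keeps every action inside the stated bounds even if the policy output drifts slightly out of range.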

### Experiments

You can conduct experiments using different observation spaces:

1. **Continuous Observation Space Experiment**:
- Run the experiment with:
```bash
python ./zoo/pooltool/sum_to_three/config/sum_to_three_config.py
```
- Results will be saved in `./data_pooltool_sampled_efficientzero/vector-obs`.

2. **Discrete Observation Space Experiment**:
- Run the experiment with:
```bash
python ./zoo/pooltool/sum_to_three/config/sum_to_three_image_config.py
```
- Modify the feature plane information by editing `./zoo/pooltool/sum_to_three/config/feature_plane_config.json`. View the usage example in `./zoo/pooltool/image_representation.py` for details about the feature plane content.
- Results will be saved in `./data_pooltool_sampled_efficientzero/image-obs`.

### Results

TODO(puyuan1996)

## Game 2: 8-ball / 9-ball / 3-cushion / snooker

What billiards game would you like to see next?