
Commit

Add link to Rich Sutton's website for the quote and use quote formatting to clarify
visually that it's a quote. Remove italics of the whole text so that it can be used
for the words "search" and "learning" that are emphasized in the original.

Use relative paths to code in the same repo. Also, minor formatting & grammar fixes.

PiperOrigin-RevId: 495103568
mbrukman authored and MctxDev committed Dec 13, 2022
1 parent 66a8d90 commit 6b827d2
Showing 1 changed file with 15 additions and 14 deletions.
29 changes: 15 additions & 14 deletions README.md
@@ -27,19 +27,20 @@ pip install git+https://github.com/deepmind/mctx.git
## Motivation

Learning and search have been important topics since the early days of AI
-research. In the words of Rich Sutton: *One thing that should be learned [...]
-is the great power of general purpose methods, of methods that continue to scale
-with increased computation even as the available computation becomes very great.
-The two methods that seem to scale arbitrarily in this way are search and
-learning.*
+research. In the [words of Rich Sutton](http://www.incompleteideas.net/IncIdeas/BitterLesson.html):
+
+> One thing that should be learned [...] is the great power of general purpose
+> methods, of methods that continue to scale with increased computation even as
+> the available computation becomes very great. The two methods that seem to
+> scale arbitrarily in this way are *search* and *learning*.

Recently, search algorithms have been successfully combined with learned models
parameterized by deep neural networks, resulting in some of the most powerful
and general reinforcement learning algorithms to date (e.g. MuZero).
However, using search algorithms in combination with deep neural networks
requires efficient implementations, typically written in fast compiled
languages; this can come at the expense of usability and hackability,
-especially for researchers that are not familiar with C++. In turn this limits
+especially for researchers that are not familiar with C++. In turn, this limits
adoption and further research on this critical topic.

Through this library, we hope to help researchers everywhere to contribute to
@@ -92,8 +93,8 @@ new_embedding)` with a `RecurrentFnOutput` and the embedding of the next state.
The `RecurrentFnOutput` contains the `reward` and `discount` for the transition,
and `prior_logits` and `value` for the new state.

-In [examples/visualization_demo.py](https://github.com/deepmind/mctx/blob/main/examples/visualization_demo.py)
-you can see calls to a policy:
+In [`examples/visualization_demo.py`](https://github.com/deepmind/mctx/blob/main/examples/visualization_demo.py), you can
+see calls to a policy:

```python
policy_output = mctx.gumbel_muzero_policy(params, rng_key, root, recurrent_fn,
@@ -109,22 +110,22 @@ We recommend to use the `gumbel_muzero_policy`.
[Gumbel MuZero](https://openreview.net/forum?id=bERaNdoegnO) guarantees a policy
improvement if the action values are correctly evaluated. The policy improvement
is demonstrated in
-[examples/policy_improvement_demo.py](https://github.com/deepmind/mctx/blob/main/examples/policy_improvement_demo.py).
+[`examples/policy_improvement_demo.py`](https://github.com/deepmind/mctx/blob/main/examples/policy_improvement_demo.py).

### Example projects
The following projects demonstrate the Mctx usage:

-- [Basic Learning Demo with Mctx](https://github.com/kenjyoung/mctx_learning_demo)
-... AlphaZero on random mazes.
-- [a0-jax](https://github.com/NTT123/a0-jax) ... AlphaZero on Connect Four,
-Gomoku, and Go.
+- [Basic Learning Demo with Mctx](https://github.com/kenjyoung/mctx_learning_demo)
+AlphaZero on random mazes.
+- [a0-jax](https://github.com/NTT123/a0-jax) AlphaZero on Connect Four,
+Gomoku, and Go.

Tell us about your project.

## Citing Mctx

This is not an officially supported Google product. Mctx is part of the
-[DeepMind JAX Ecosystem], to cite Mctx please use the [DeepMind JAX Ecosystem
+[DeepMind JAX Ecosystem]; to cite Mctx, please use the [DeepMind JAX Ecosystem
citation].

[DeepMind JAX Ecosystem]: https://deepmind.com/blog/article/using-jax-to-accelerate-our-research "DeepMind JAX Ecosystem"