The following classes are deprecated and just point to the classes above:
Trees and Forests
-----------------

TorchRL offers a set of classes and functions that can be used to represent trees and forests efficiently,
which is particularly useful for Monte Carlo Tree Search (MCTS) algorithms.

TensorDictMap
~~~~~~~~~~~~~

At its core, the MCTS API relies on the :class:`~torchrl.data.TensorDictMap`, which acts as a storage whose indices
can be any numerical object. In traditional storages (e.g., :class:`~torchrl.data.TensorStorage`), only integer
indices are allowed:

>>> storage = TensorStorage(...)
>>> data = storage[3]

:class:`~torchrl.data.TensorDictMap` allows us to make more advanced queries in the storage. The typical example is
when we have a storage containing a set of MDPs and we want to rebuild a trajectory given its initial
(observation, action) pair. In tensor terms, this could be written with the following pseudocode:

>>> next_state = storage[observation, action]
(if there is more than one next state associated with this pair, one could return a stack of ``next_states`` instead).
This API would make sense, but it would be restrictive: allowing observations or actions that are composed of
multiple tensors may be hard to implement. Instead, we provide a tensordict containing these values and let the storage
know which ``in_keys`` to look at to query the next state:

>>> td = TensorDict(observation=observation, action=action)
>>> next_td = storage[td]
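
To make the composite-indexing idea concrete, here is a minimal, self-contained sketch using plain Python
dictionaries and lists in place of tensordicts and tensors. ``PairMap`` is a hypothetical stand-in for
illustration only and does not reflect the actual :class:`~torchrl.data.TensorDictMap` implementation:

```python
# Illustrative sketch (NOT the real TensorDictMap API): a storage whose
# indices are composite (observation, action) queries rather than integers.
class PairMap:
    """Map composite keys, selected via ``in_keys``, to stored values."""

    def __init__(self, in_keys):
        self.in_keys = in_keys  # which entries of the query dict form the key
        self._storage = {}

    def _hash(self, query):
        # Hash only the configured in_keys, so extra entries are ignored.
        return tuple(tuple(query[k]) for k in self.in_keys)

    def __setitem__(self, query, value):
        self._storage[self._hash(query)] = value

    def __getitem__(self, query):
        return self._storage[self._hash(query)]


storage = PairMap(in_keys=("observation", "action"))
td = {"observation": [0.0, 1.0], "action": [1.0]}
storage[td] = {"next_observation": [1.0, 1.0]}
next_td = storage[td]  # query with the same composite key
```

Because only ``in_keys`` participate in the key, the same query dict can carry additional entries (rewards,
done flags, ...) without affecting the lookup.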

Of course, this class also allows us to extend the storage with new data:

>>> storage[td] = next_state

This comes in handy because it allows us to represent complex rollout structures where different actions are undertaken
at a given node (i.e., for a given observation). All ``(observation, action)`` pairs that have been observed may lead us
to a (set of) rollouts that we can use further.
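
The branching structure this enables can be illustrated with a small hypothetical example, where plain tuples
stand in for observation and action tensors:

```python
from collections import defaultdict

# Illustrative sketch: grouping stored transitions by observation, so that a
# single node (one observation) can branch into several (action, next state)
# pairs. The transitions below are made up for the example.
branches = defaultdict(list)

transitions = [
    ((0,), (1,), (0, 1)),      # (observation, action, next_observation)
    ((0,), (2,), (0, 2)),      # same observation, different action
    ((0, 1), (1,), (0, 1, 1)),
]
for obs, act, next_obs in transitions:
    branches[obs].append((act, next_obs))

# The root observation (0,) was seen with two actions -> two branches.
root_children = branches[(0,)]
```

Each observation that was paired with several actions naturally becomes a branching node of the tree.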

MCTSForest
~~~~~~~~~~

Building a tree from an initial observation then becomes just a matter of organizing data efficiently.
The :class:`~torchrl.data.MCTSForest` has two storages at its core: the first links observations to the hashes and
indices of actions encountered in the past in the dataset:

>>> data = TensorDict(observation=observation)
>>> metadata = forest.node_map[data]
>>> index = metadata["_index"]

where ``forest`` is a :class:`~torchrl.data.MCTSForest` instance.
Then, a second storage keeps track of the actions and results associated with the observation:

>>> next_data = forest.data_map[index]

The ``next_data`` entry can have any shape, but it will usually match the shape of ``index`` (since each index
corresponds to one action). Once ``next_data`` is obtained, it can be put together with ``data`` to form a set of
nodes, and the tree can be expanded for each of these. The following figure shows how this is done.
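
The two-storage lookup described above can be sketched with plain dictionaries standing in for ``node_map`` and
``data_map``. This is an illustration of the expansion loop only, not the actual
:class:`~torchrl.data.MCTSForest` implementation, and all the data below is made up:

```python
# Hypothetical stand-ins for the forest's two storages:
# node_map: observation -> indices of transitions starting at that observation
# data_map: index -> (action, next observation)
node_map = {
    (0,): [0, 1],
    (0, 1): [2],
}
data_map = {
    0: ("a", (0, 1)),
    1: ("b", (0, 2)),
    2: ("a", (0, 1, 1)),
}


def expand(root):
    """Breadth-first expansion: query node_map, fetch children from data_map,
    then repeat for every next observation until no node can be expanded."""
    tree = {}
    frontier = [root]
    while frontier:
        obs = frontier.pop(0)
        children = [data_map[i] for i in node_map.get(obs, [])]
        tree[obs] = children
        frontier.extend(next_obs for _, next_obs in children)
    return tree


tree = expand((0,))
```

Here ``expand`` plays the role of the tree-building step: each ``node_map`` query yields indices, each index
yields an ``(action, next observation)`` pair, and each next observation becomes a new node to expand.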

.. figure:: /_static/img/collector-copy.png

   Building a :class:`~torchrl.data.Tree` from a :class:`~torchrl.data.MCTSForest` object.
   The flowchart represents a tree being built from an initial observation ``o``. The
   :meth:`~torchrl.data.MCTSForest.get_tree` method passes the input data structure (the root node) to the
   ``node_map`` :class:`~torchrl.data.TensorDictMap` instance, which returns a set of hashes and indices. These
   indices are then used to query the corresponding tuples of actions, next observations, rewards, etc. that are
   associated with the root node.
   A vertex is created from each of them (possibly with a longer rollout when a compact representation is asked for).
   The stack of vertices is then used to build up the tree further; these vertices are stacked together and make
   up the branches of the tree at the root. This process is repeated to a given depth or until the tree cannot be
   expanded anymore.

.. currentmodule:: torchrl.data
