Description
Motivation
gym's environments can pass extra information through an info dictionary: obs, reward, done, info = env.step(action).
This dictionary can contain additional information about the state of the env: time spent doing the task, whether the task is solved, etc.
We wrap the obs, reward and done in the output TensorDict. We can also place the info there if its values are numerical (see #234):
>>> import gym
>>> from torchrl.envs.libs.gym import GymWrapper
>>> from torchrl.envs import default_info_dict_reader
>>> reader = default_info_dict_reader(["my_info_key"])
>>> # assuming "some_env-v0" returns a dict with a key "my_info_key"
>>> env = GymWrapper(gym.make("some_env-v0"))
>>> env.set_info_dict_reader(info_dict_reader=reader)
>>> tensordict = env.reset()
>>> tensordict = env.rand_step(tensordict)
>>> assert "my_info_key" in tensordict.keys()
The problem is that we also register the expected observations (domain, dtype, device, etc.) in an observation_spec attribute. It is important for us to know what to expect as output from the env. However, the info is not yet registered in observation_spec. This is a problem for parallel environments, where we want to pre-allocate the tensors placed in shared memory to pass information from one process to another: for that, we use the observation_spec, which is cheaper than resetting the env.
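To make the pre-allocation issue concrete, here is a minimal pure-Python sketch. ToySpec and preallocate are hypothetical stand-ins invented for illustration, not torchrl classes or functions, and plain lists stand in for shared-memory tensors. It only shows that a key without a registered spec cannot get a buffer before the env is stepped.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToySpec:
    # Hypothetical stand-in for a spec entry: only a shape and a dtype name.
    shape: Tuple[int, ...]
    dtype: str = "float32"


def preallocate(specs: Dict[str, ToySpec], num_envs: int) -> Dict[str, List[float]]:
    # One flat zero-filled buffer per registered key, sized for all workers.
    # Only keys present in `specs` get a buffer; nothing is read from the env.
    return {
        key: [0.0] * (num_envs * prod(spec.shape))
        for key, spec in specs.items()
    }


# Assumed toy setup: a single 4-dimensional observation and 3 workers.
observation_spec = {"observation": ToySpec(shape=(4,))}
buffers = preallocate(observation_spec, num_envs=3)

# "my_info_key" is produced by the env at step time but has no spec,
# so no buffer exists for it before the first step.
missing = "my_info_key" not in buffers
```

In this toy setting, the only way to size a buffer for my_info_key would be to reset or step an env first, which is what reading from the spec is meant to avoid.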
Solution
We should be able to provide the specs corresponding to the info keys in the default_info_dict_reader, and set_info_dict_reader should be able to read them. If no spec is provided, a default unidimensional, floating-point unbounded spec should be assumed.
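One possible shape for this proposal, sketched in plain Python under stated assumptions: ToyUnboundedSpec, ToyInfoDictReader and register_info_spec are hypothetical names made up for this sketch and do not reflect the actual torchrl signatures. The reader accepts optional per-key specs and falls back to a one-dimensional float spec, and the registration step merges those specs into the observation spec.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToyUnboundedSpec:
    # Hypothetical stand-in for an unbounded continuous spec.
    shape: Tuple[int, ...] = (1,)
    dtype: str = "float32"


class ToyInfoDictReader:
    """Hypothetical reader: copies selected info keys and carries their specs."""

    def __init__(
        self,
        keys: Sequence[str],
        spec: Optional[Dict[str, ToyUnboundedSpec]] = None,
    ):
        self.keys = list(keys)
        provided = spec or {}
        # Fall back to a one-dimensional, floating-point, unbounded spec.
        self.info_spec = {
            key: provided.get(key, ToyUnboundedSpec()) for key in self.keys
        }

    def __call__(self, info: dict, out: dict) -> dict:
        # Copy only the requested keys that the env actually returned.
        for key in self.keys:
            if key in info:
                out[key] = info[key]
        return out


def register_info_spec(observation_spec: dict, reader: ToyInfoDictReader) -> dict:
    # Mimics what set_info_dict_reader could do: return a new spec mapping
    # that also registers the reader's info specs.
    return {**observation_spec, **reader.info_spec}


# "position" is an assumed extra info key with an explicit spec;
# "my_info_key" has none and receives the default.
reader = ToyInfoDictReader(
    ["my_info_key", "position"],
    spec={"position": ToyUnboundedSpec(shape=(3,))},
)
specs = register_info_spec({"observation": ToyUnboundedSpec(shape=(4,))}, reader)
step_out = reader({"my_info_key": 1.0, "other": 2}, {})
```

With the info specs registered this way, a parallel environment could size its buffers for my_info_key and position from the spec alone, without resetting an env first.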