Description
Is it possible to have an alternative way of handling max_steps in continuing environments? Currently the terminal field is set to 'true' when the environment reaches max_steps, even though its state at that point is arbitrary and not a terminal one. In practice it probably doesn't matter much, but as things stand I think the value-function update is incorrect at the max_steps cutoff: flagging the transition as terminal stops the update from bootstrapping off the next state's value, as if no future reward existed. Agree?
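A minimal sketch of the difference, assuming a standard one-step TD target. The function `td_target` and the numbers below are hypothetical and not taken from this library. The idea is to keep "terminated" (a true terminal state) separate from "truncated" (the max_steps time limit), similar to the `terminated`/`truncated` split returned by Gymnasium's `step()`, and to cut off bootstrapping only for the former.

```python
def td_target(reward, next_value, gamma, terminated):
    """One-step TD target: r + gamma * V(s') * (1 - terminated).

    Bootstrapping is disabled only for true terminal states. A time-limit
    cutoff (truncation) is not passed in here, so it still bootstraps.
    """
    return reward + gamma * next_value * (0.0 if terminated else 1.0)


# Hypothetical transition that hits max_steps in a continuing task.
reward, next_value, gamma = 1.0, 10.0, 0.9

# Treating the cutoff as terminal throws away the value of the next state.
as_terminal = td_target(reward, next_value, gamma, terminated=True)

# Treating it as truncation (not terminal) keeps the bootstrapped value.
as_truncated = td_target(reward, next_value, gamma, terminated=False)

print(as_terminal, as_truncated)
```

The environment would still reset after a truncation; the flag only changes how the last transition's target is computed.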