Executing RLBase.plan! after end of experiment

Why do we execute RLBase.plan! **after the experiment is done**? Also we should test if the environment is terminated? 

For environments with FULL_ACTION_SET Action_style, this yields **an error as the legal_action_mask is empty**. Or the policies should be updated s.t. they do not return anything when environment is terminated for plan!? BUT this behavior would also be unexpected.

src\ReinforcementLearningCore\src\policies\agent\multi_agent.jl:133:140
```
if check_stop(stop_condition, policy, env)
       is_stop = true
       push!(multiagent_policy, PreActStage(), env)
       optimise!(multiagent_policy, PreActStage())
       push!(multiagent_hook, PreActStage(), policy, env)
       RLBase.plan!(multiagent_policy, env)  # let the policy see the last observation
       break
end
```

In my opinion, this should be completely omitted: Only do 
```
is_stop = true
break
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Executing RLBase.plan! after end of experiment #913

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Executing RLBase.plan! after end of experiment #913

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions