Closed
Description
Why do we execute RLBase.plan! after the experiment is done? Also we should test if the environment is terminated?
For environments with FULL_ACTION_SET Action_style, this yields an error as the legal_action_mask is empty. Or the policies should be updated s.t. they do not return anything when environment is terminated for plan!? BUT this behavior would also be unexpected.
src\ReinforcementLearningCore\src\policies\agent\multi_agent.jl:133:140
if check_stop(stop_condition, policy, env)
is_stop = true
push!(multiagent_policy, PreActStage(), env)
optimise!(multiagent_policy, PreActStage())
push!(multiagent_hook, PreActStage(), policy, env)
RLBase.plan!(multiagent_policy, env) # let the policy see the last observation
break
end
In my opinion, this should be completely omitted: Only do
is_stop = true
break
Metadata
Metadata
Assignees
Labels
No labels