Rework the run loop #921
Conversation
@Mytolo have a look. It's a WIP, but it gives an idea of what this could be. Feel free to review.
For the multi_agent_policy, the run loop also has to be adjusted (in the sequential environment dispatch). I am also not sure whether the experience is stored the way I suppose it is, and I would like to discuss it here:
Tuples in experience replay:
(s_0, a_0, r_1, s_1, d_1)
(s_1, a_1, r_2, s_2, d_2)
...
(s_t, a_t, r_t+1, s_t+1, d_t+1)
and so on. Is it like that? And for trajectories with next_action, should there also be an a_t+1 that is different from the action chosen in the next state? Or is it the same? (Or does it even matter whether it is or not?)
push!(agent.cache, reward(env), is_terminated(env))
function Base.push!(agent::Agent, ::PostActStage, env::AbstractEnv, action)
    next_state = state(env)
    push!(agent.trajectory, (state = next_state, action = action, reward = reward(env), terminal = is_terminated(env)))
Maybe I got something wrong, but shouldn't next_state be stored in the next_state field of the trajectory? next_state is the successor of the state before the action was taken in the environment, right?
It's the same. Both names point to the same Trace in the trajectory.
Right. It is the multiplex trace, right?
Yes
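For anyone following along, here is a minimal, self-contained toy sketch of the multiplexing idea (not the actual RLTrajectories implementation): a single backing array with two named, offset views.

```julia
# Toy illustration of a multiplexed trace: :state and :next_state are two views
# over the same backing vector, offset by one step.
struct ToyMultiplexTrace{T}
    data::Vector{T}
end
ToyMultiplexTrace{T}() where {T} = ToyMultiplexTrace{T}(T[])

Base.push!(t::ToyMultiplexTrace, x) = push!(t.data, x)

function Base.getindex(t::ToyMultiplexTrace, name::Symbol)
    n = length(t.data)
    name === :state      && return view(t.data, 1:n-1)
    name === :next_state && return view(t.data, 2:n)
    throw(KeyError(name))
end

t = ToyMultiplexTrace{Int}()
push!(t, 1); push!(t, 2); push!(t, 3)
t[:state]       # 2-element view: [1, 2]
t[:next_state]  # 2-element view: [2, 3]
```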
function Base.push!(agent::Agent, ::PostExperimentStage, env::E, player::Symbol) where {E<:AbstractEnv}
    RLBase.reset!(agent.cache)
function Base.push!(agent::Agent, ::PostEpisodeStage, env::E)
    if haskey(agent.trajectory, :next_action)
This is: if the episode finished (whether truncated or terminated), we query the policy to plan another step. Should we also check that the environment is not terminated? If it is, it just makes no sense to plan an action.
It wouldn't make sense indeed, but if your environment has terminal states at all, then you should not use a trajectory that has a next_action key. That's how I thought about it. If we add that check, it allows the user to have an incorrect trajectory without an error being thrown, and the buffer will accumulate mistakes.
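For concreteness, a hypothetical sketch of where such a guard would sit; the function body and the way the policy is queried are assumptions, not the PR's literal code:

```julia
# Hypothetical sketch: the PostEpisodeStage push for trajectories with a
# :next_action trace. The commented-out guard is the check discussed above,
# deliberately left out so that a misconfigured trajectory fails loudly
# instead of silently accumulating misaligned data.
function Base.push!(agent::Agent, ::PostEpisodeStage, env::AbstractEnv)
    if haskey(agent.trajectory, :next_action)
        # is_terminated(env) && return agent   # rejected: would hide an incorrect setup
        next_action = agent.policy(env)        # assumption: however the policy is queried
        push!(agent.trajectory, (action = next_action,))
    end
    return agent
end
```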
Yes, these are the tuples that indexing/sampling the buffer would return. They are not stored as tuples though; each trace is stored contiguously in a dedicated array, and state and next_state share that array.
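A purely illustrative toy example of that storage layout (made-up values and helper names, not the package internals):

```julia
# Per-trace contiguous storage; :state and :next_state read the same array,
# at offsets i and i + 1.
states    = [0, 1, 2, 3]            # length T+1, shared by :state and :next_state
actions   = [10, 11, 12]            # length T
rewards   = [0.0, 1.0, 0.5]
terminals = [false, false, true]

# Indexing/sampling assembles the tuple (s_t, a_t, r_{t+1}, s_{t+1}, d_{t+1})
# on the fly; nothing is stored as a tuple.
transition(i) = (
    state      = states[i],
    action     = actions[i],
    reward     = rewards[i],
    terminal   = terminals[i],
    next_state = states[i + 1],
)

transition(1)  # (state = 0, action = 10, reward = 0.0, terminal = false, next_state = 1)
```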
Haven't looked at that yet, but I should not forget. Thank you.
Base.push!(p::AbstractPolicy, ::AbstractStage, ::AbstractEnv, ::Symbol) = nothing
Base.push!(p::AbstractPolicy, ::PostActStage, ::AbstractEnv, action, ::Symbol) = nothing
This is begging for us to create an action type, but that's something for another PR. :)
@HenriDeh This looks great! I'll run a couple of type stability checks after this gets closer to ✅
This PR attempts to rework the run loop in order to fix data storage misalignment. This relies on the RLTrajectories 0.2 release that stores the data along with meta information about each step and keeps the traces aligned.
The loop is now as follows
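(Hedged reconstruction of the new loop from the description and snippets in this thread; plan!, act! and optimise! are stand-in names for however the policy is queried, the environment is stepped, and the update is run, and the exact ordering may differ from the PR.)

```julia
function _run(policy::AbstractPolicy, env::AbstractEnv, stop_condition)
    push!(policy, PreExperimentStage(), env)
    is_stop = false
    while !is_stop
        reset!(env)
        push!(policy, PreEpisodeStage(), env)
        while !is_terminated(env)
            push!(policy, PreActStage(), env)
            action = plan!(policy, env)        # assumed: query the policy for an action
            act!(env, action)                  # assumed: step the environment
            # reward(env), is_terminated(env) and state(env) are all current here,
            # so the whole transition is pushed at once; no agent cache is needed.
            push!(policy, PostActStage(), env, action)
            optimise!(policy)                  # assumed placement of the update step
            if stop_condition(policy, env)
                is_stop = true
                break
            end
        end
        push!(policy, PostEpisodeStage(), env)
    end
    push!(policy, PostExperimentStage(), env)
end
```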
You'll notice that the agent no longer has a cache. This is because reward, terminal and next_state can all be obtained at the PostActStage and immediately pushed to the trajectory. The action is given as an argument to push!.
I also renamed agent/base.jl to agent/agent_base.jl because it was bugging me to have a base.jl tab that was unrelated to RLBase.
PR Checklist