Description
hello! i'm currently working on implementing trpo (#134) from the master branch; however, i'm having trouble with the new pipeline.
right now, optimise! is called after supplying the environment with an action. this means algorithms that optimise after each episode can't do their update in optimise! itself; they need to implement (agent::Agent{PolicyType})(::PostEpisodeStage, env) instead, while leaving optimise! as a no-op.
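to make the workaround concrete, here's a rough self-contained sketch of the pattern i mean. everything in it is a stand-in so it runs without the package: the stage types, Agent, and optimise! are simplified copies and not the real ReinforcementLearning.jl definitions, and EpisodicPolicy, the rewards field, and run_episode! are made up purely for illustration.

```julia
# Stand-in types so the sketch runs without ReinforcementLearning.jl;
# the real package defines its own stage and agent types.
abstract type AbstractStage end
struct PostActStage <: AbstractStage end
struct PostEpisodeStage <: AbstractStage end

# Hypothetical per-episode policy: just counts how often it was updated.
mutable struct EpisodicPolicy
    n_updates::Int
end

struct Agent{P}
    policy::P
    rewards::Vector{Float64}   # stand-in for a real trajectory
end

# Called by the loop after every action; deliberately a no-op here.
optimise!(::Agent{EpisodicPolicy}) = nothing

# Default stage hook: do nothing.
(agent::Agent)(::AbstractStage, env) = nothing

# The episode-level update lives in the PostEpisodeStage hook instead.
function (agent::Agent{EpisodicPolicy})(::PostEpisodeStage, env)
    agent.policy.n_updates += 1   # a real update would use the whole episode
    empty!(agent.rewards)         # clear the buffer for the next episode
    return nothing
end

# Toy driver mimicking the call order described above (assumed, simplified).
function run_episode!(agent, env; steps = 3)
    for _ in 1:steps
        push!(agent.rewards, 1.0)  # pretend we acted and received a reward
        agent(PostActStage(), env)
        optimise!(agent)           # per-step call, no-op for this policy
    end
    agent(PostEpisodeStage(), env)
    return agent
end

agent = run_episode!(Agent(EpisodicPolicy(0), Float64[]), nothing)
```

with this layout the per-step optimise! call does nothing and the policy is updated exactly once per episode, which is the behaviour i'd like to get from the real pipeline without having to override the stage hook myself.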
additionally, i'm not sure how to use RLTrajectories.jl to get an entire episode. also, can Agent push episode info to Trajectory correctly? i think push!(agent.trajectory, (agent.cache..., action=action)) would only work if agent.trajectory is a Traces, not Episodes, right? in general, i don't know how to work with the for batch in trajectory paradigm for an entire episode.
if vpg.jl could be updated to the new pipeline it would be super helpful as an example!