new _run()

hello!  i'm currently working on implementing trpo (#134) from the master branch; however, i'm having trouble with the new pipeline.

right now, `optimise!` is called on every step, right after the action is supplied to the environment.  this means algorithms that optimise once per episode can't use `optimise!`; they need to implement `(agent::Agent{PolicyType})(::PostEpisodeState, env)` instead and leave `optimise!` as a no-op.
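
to make it concrete, here is a minimal self-contained sketch of the pattern i mean.  every type and function in it (`Agent`, `EpisodicPolicy`, `PostActState`, `run_episode!`, and this `optimise!`) is a stand-in i made up for illustration, not the actual ReinforcementLearning.jl definitions, so it only shows the shape of the workaround:

```julia
# stand-in stage types: hypothetical, NOT the real library definitions
struct PostActState end
struct PostEpisodeState end

# hypothetical policy that should only learn once per episode
struct EpisodicPolicy end

# hypothetical, heavily simplified agent
mutable struct Agent{P}
    policy::P
    rewards::Vector{Float64}   # rewards collected during the current episode
    last_return::Float64       # return of the most recently finished episode
    updates::Int               # number of per-episode updates performed
end

# called by the run loop on every step: deliberately a no-op for this policy
optimise!(::Agent{EpisodicPolicy}) = nothing

# per-step hook: only record the reward
(agent::Agent{EpisodicPolicy})(::PostActState, reward) = push!(agent.rewards, reward)

# per-episode hook: this is where the actual update would have to live
function (agent::Agent{EpisodicPolicy})(::PostEpisodeState, env)
    agent.last_return = sum(agent.rewards)  # placeholder for a real policy update
    agent.updates += 1
    empty!(agent.rewards)
    return nothing
end

# hypothetical run loop: one episode of `n` steps with a constant reward of 1.0
function run_episode!(agent, n)
    for _ in 1:n
        agent(PostActState(), 1.0)
        optimise!(agent)                    # runs every step, does nothing here
    end
    agent(PostEpisodeState(), nothing)
    return agent
end

agent = run_episode!(Agent(EpisodicPolicy(), Float64[], 0.0, 0), 5)
```

in this sketch the update runs exactly once per episode even though `optimise!` is called on every step, which is the behaviour i'd want from the real pipeline.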

additionally, i'm not sure how to use RLTrajectories.jl to get an entire episode.  also, can Agent push episode info to Trajectory correctly?  i think `push!(agent.trajectory, (agent.cache..., action=action))` would only work if `agent.trajectory` is a `Traces`, not `Episodes`, right?  in general, i don't know how to work with the `for batch in trajectory` paradigm for an entire episode.

if `vpg.jl` could be updated to the new pipeline it would be super helpful as an example!

new _run() #731