new _run() #731

Closed
@baedan

hello! i'm currently working on implementing trpo (#134) from the master branch; however, i'm having trouble with the new pipeline.

right now, `optimise!` is called after each action is passed to the environment. this means algorithms that optimise once per episode can't use `optimise!`; they have to implement `(agent::Agent{PolicyType})(::PostEpisodeStage, env)` instead and leave `optimise!` as a no-op.
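for reference, here is a minimal sketch of the pattern described above. it uses stand-in types only: `AbstractStage`, `PostActStage`, `PostEpisodeStage`, `Agent`, `EpisodicPolicy`, `optimise!` and `run_episodes!` are all defined locally for illustration and are assumptions, not the real ReinforcementLearning.jl definitions or its actual `_run` loop.

```julia
# stand-in types only; none of this is the real ReinforcementLearning.jl API
abstract type AbstractStage end
struct PostActStage <: AbstractStage end
struct PostEpisodeStage <: AbstractStage end

# hypothetical per-episode policy: just counts how often it was updated
mutable struct EpisodicPolicy
    updates::Int
end

mutable struct Agent{P}
    policy::P
    episode::Vector{NamedTuple}  # transitions of the current episode
end

# called by the loop after every step: a no-op for episodic algorithms
optimise!(::Agent{EpisodicPolicy}) = nothing

# default stage hook: do nothing
(agent::Agent)(::AbstractStage, env) = nothing

# per-episode update: consume the whole episode, then clear the buffer
function (agent::Agent{EpisodicPolicy})(::PostEpisodeStage, env)
    isempty(agent.episode) && return nothing
    agent.policy.updates += 1
    empty!(agent.episode)
    return nothing
end

# toy loop mirroring the call order described in this issue
function run_episodes!(agent, n_episodes; steps_per_episode = 3)
    env = nothing  # placeholder: this sketch has no real environment
    for _ in 1:n_episodes
        for t in 1:steps_per_episode
            push!(agent.episode, (state = t, action = 1, reward = 1.0))
            agent(PostActStage(), env)
            optimise!(agent)  # does nothing here
        end
        agent(PostEpisodeStage(), env)  # the actual update happens here
    end
    return agent
end

agent = run_episodes!(Agent(EpisodicPolicy(0), NamedTuple[]), 2)
```

in this toy version the real work sits in the `PostEpisodeStage` method, and the per-step `optimise!` call is dead weight for episodic algorithms, which is the awkwardness described above.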

additionally, i'm not sure how to use RLTrajectories.jl to get an entire episode. can Agent push episode info to a Trajectory correctly? i think `push!(agent.trajectory, (agent.cache..., action=action))` would only work if `agent.trajectory` is a Traces, not an Episodes, right? more generally, i don't know how to use the `for batch in trajectory` paradigm when each batch should be an entire episode.

if vpg.jl could be updated to the new pipeline it would be super helpful as an example!
