
PPO / TRPO fixes #7

Merged
51 commits merged into main on Nov 1, 2023

Conversation

josiahls
Owner

  • ppo fixed:
  • the actor model started with a bad std config; changed it to be simpler (see the sketch below)
  • we now skip the last trajectory step since it is terminated
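A minimal sketch of the two fixes, assuming a Gaussian policy; the class, field, and function names here are illustrative, not the repo's actual API:

```python
import torch
from torch import nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    "Sketch: a mean head plus one learnable log-std vector, no std config to tune."
    def __init__(self, state_sz: int, action_sz: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(state_sz, hidden), nn.Tanh(),
            nn.Linear(hidden, action_sz),
        )
        # Starting log-std at 0 gives std = 1 for every action dim.
        self.log_std = nn.Parameter(torch.zeros(action_sz))

    def forward(self, state: torch.Tensor) -> Normal:
        mu = self.mu(state)
        return Normal(mu, self.log_std.exp().expand_as(mu))

def drop_last_terminated(steps: list) -> list:
    "Sketch of the second fix: skip the final step of a trajectory if it terminated."
    # Assumes each step exposes a `.terminated` flag.
    if steps and steps[-1].terminated:
        return steps[:-1]
    return steps
```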

josiahls and others added 30 commits February 27, 2023 02:21
- ppo
Fixed:
- actor model started with a bad std config; changed to be simpler
- we skip the last trajectory step since it is terminated.
- dockerfile was failing
- settings crashing on extras_require
- deprecated jupyterlab in favor of vscode dev container
Changed:
- torch / torchdata deps to be newer
- blogging capabilities
Notes:
- trying to simplify the data pipeline. TBH not sure how I feel about "transforms"
when we can simply use the torchdata mapping (see the sketch below)
- a bunch of stuff changed about the dataloading also, hopefully for the better.
I want to simplify the async execution also. Take another shot at using "shared_memory".
If that doesn't work, see if there is a way to do shared memory a different way.
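For the transforms-vs-mapping note above, a minimal sketch of the mapping approach with torchdata 0.7; the `Step` tuple and `scale_reward` function are made up for illustration:

```python
from collections import namedtuple
from torchdata.datapipes.iter import IterableWrapper

Step = namedtuple('Step', ['state', 'reward'])

def scale_reward(step: Step) -> Step:
    # A plain function instead of a dedicated "transform" object.
    return step._replace(reward=step.reward * 0.1)

pipe = IterableWrapper([Step(0, 1.0), Step(1, 2.0)]).map(scale_reward)
print(list(pipe))  # rewards scaled by 0.1
```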
- minor changes
- core modules to use torchdata 0.7.*
Added:
- more unified google colabcheck / import script
Removed:
- DataBlock concept. Just use pipelines or functions that return pipelines.
It's just much easier to follow and read this way.
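A sketch of the "functions that return pipelines" idea that replaces DataBlock; `transition_pipeline` is a hypothetical name:

```python
from torchdata.datapipes.iter import IterableWrapper, IterDataPipe

def transition_pipeline(transitions, bs: int = 32) -> IterDataPipe:
    # No DataBlock indirection: just build and return the datapipe chain.
    pipe = IterableWrapper(transitions)
    pipe = pipe.shuffle()
    pipe = pipe.batch(bs)
    return pipe
```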
- lots of fixes and compat updates to the data pipelines
- pickleable cache holder so that envs get reset when pickled (sketched below)
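A minimal sketch of such a cache holder, assuming gymnasium envs; `PickleableEnvHolder` is a hypothetical name. The live env is dropped on pickle, so unpickling yields a fresh one:

```python
import gymnasium as gym

class PickleableEnvHolder:
    "Sketch: hold an env lazily and drop it when pickled."
    def __init__(self, env_id: str):
        self.env_id = env_id
        self._env = None

    @property
    def env(self):
        if self._env is None:
            self._env = gym.make(self.env_id)
        return self._env

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_env'] = None  # force the env to be re-created after unpickling
        return state
```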
Removed:
- usages of blocks. They just seem to overly obfuscate the code
- everything up to the dqn itself. I'm able to train the dqn, but
my gpu keeps overheating, I think
- image to 11.7.1
- learner pickling was not compatible with new torchdata
- logging, I think, can be simplified to not be so complex
- updates for logging. I'm rethinking how to set up the logging
pipeline to be less weird.
- record catcher and dumper. Maybe this will make the api less weird.
- logging to be simpler and flatter hierarchy-wise. Hopefully this doesn't
blow up in my face
- the learner to make iteration easier to augment, and moved 100% of the
epoch / batch tracking to the respective pipes because
obviously they should be primarily in charge of that (see the sketch after this list)
- a dataloaders-making function to clean up dataloader init, and updated
this in the learners
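A minimal sketch of a pipe owning its own counters, built on a torchdata-style IterDataPipe; `EpochTracker` is a made-up name:

```python
from torchdata.datapipes.iter import IterDataPipe

class EpochTracker(IterDataPipe):
    "Sketch: the pipe, not the learner, owns its epoch / batch counters."
    def __init__(self, source_dp: IterDataPipe):
        self.source_dp = source_dp
        self.epoch = 0
        self.batch = 0

    def __iter__(self):
        self.epoch += 1
        for batch in self.source_dp:
            self.batch += 1
            yield batch
```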
- conda references. I'd rather use docker tbh.
Added:
- pretty cool visualizer for memory
Fixed:
- bunch of warnings
Notes:
- target dqn still doesn't converge fast. Sometimes the progress bar
doesn't update either. I think I need to use the logging library a little
more heavily to figure out what is happening. From what I can tell, the dataloader
side works fine :/

Must be something wrong with the actual learning, need to figure out what.
- progress logger to use tqdm logger instead
- target learner now does validation correctly, and learner in general does it
better.
- agent execution when iterating through memory. Something is weird with
the action selection
- custom torchdata fork as a submodule
- requirements to handle local torchdata install
- docker yml to also clone submodules
josiahls and others added 21 commits October 9, 2023 17:24
- docker image again
- sudo password support
- github action to only build dev, and pull dev before attempting a build
- agent not being exhausted when new iterations occur.
- AgentBase gets copied by dataloaders, which breaks the model <-> learner connection.
We were training a separate model from the one actually being used in the agent :/

DQN target converges very fast
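A toy illustration of the bug class described above (not the repo's code): once a dataloader worker deep-copies the agent, updates to the learner's model never reach the copy doing the acting.

```python
import copy
import torch
from torch import nn

model = nn.Linear(4, 2)
worker_copy = copy.deepcopy(model)  # what a dataloader worker effectively does

# An update to `model` never reaches the copy the agent acts with:
with torch.no_grad():
    model.weight.add_(1.0)
print(torch.equal(model.weight, worker_copy.weight))  # False
```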
- all dqns converge as quickly as I would expect now!
- StepType to StepTypes and is now a registration object
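A rough sketch of what a registration object could look like; the real StepTypes implementation may well differ:

```python
class StepTypes:
    "Sketch: step-type classes register themselves instead of living in a fixed enum."
    types: list = []

    @classmethod
    def register(cls, step_cls):
        cls.types.append(step_cls)
        return step_cls

@StepTypes.register
class SimpleStep:
    pass
```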
- TRPO kind of converges but then fails. Might be converging too slowly
- updated ppo. It learns mountaincar, but very slowly.
Next steps:
- Separate the advantage buffer code to its own notebook with separate tests.
I need to know that it works, and has no bugs.
- separate ppo loss from critic loss. Right now I'm just logging the actor loss
(see the sketch below)
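A self-contained sketch of both pieces: standard GAE for the advantage buffer, and the clipped PPO actor loss kept separate from the critic loss. These are the textbook forms, not the repo's exact code:

```python
import torch

def gae(rewards, values, dones, gamma: float = 0.99, lam: float = 0.95):
    "Generalized advantage estimation over one trajectory (bootstrap value 0 at the end)."
    adv, last = torch.zeros_like(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        mask = 1.0 - dones[t]                      # zero out across episode ends
        delta = rewards[t] + gamma * next_v * mask - values[t]
        last = delta + gamma * lam * mask * last
        adv[t] = last
    return adv

def ppo_actor_loss(new_logp, old_logp, adv, clip: float = 0.2):
    "Clipped surrogate objective, logged separately from the critic's value loss."
    ratio = (new_logp - old_logp).exp()
    return -torch.min(ratio * adv, ratio.clamp(1 - clip, 1 + clip) * adv).mean()

def critic_loss(values, returns):
    "Plain MSE value loss."
    return (values - returns).pow(2).mean()
```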
@josiahls josiahls marked this pull request as ready for review November 1, 2023 15:02
@josiahls josiahls merged commit b66b8f6 into main Nov 1, 2023
1 of 3 checks passed