
PPO / TRPO fixes #7

Merged
51 commits merged into main on Nov 1, 2023

Conversation

josiahls
Owner

  • ppo fixed:
  • the actor model started with a bad std config; changed it to be simpler (see the sketch below)
  • we now skip the last trajectory step since it is terminated
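A minimal sketch of the two fixes, assuming a Gaussian policy; the class, field, and function names here are illustrative, not the repo's actual API:

```python
import torch
from torch import nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    "Sketch: a mean head plus one learnable log-std vector, no std config to tune."
    def __init__(self, state_sz: int, action_sz: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(state_sz, hidden), nn.Tanh(),
            nn.Linear(hidden, action_sz),
        )
        # Starting log-std at 0 gives std = 1 for every action dim.
        self.log_std = nn.Parameter(torch.zeros(action_sz))

    def forward(self, state: torch.Tensor) -> Normal:
        mu = self.mu(state)
        return Normal(mu, self.log_std.exp().expand_as(mu))

def drop_last_terminated(steps: list) -> list:
    "Sketch of the second fix: skip the final step of a trajectory if it terminated."
    # Assumes each step exposes a `.terminated` flag.
    if steps and steps[-1].terminated:
        return steps[:-1]
    return steps
```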

josiahls and others added 30 commits February 27, 2023 02:21
- ppo
Fixed:
- actor model started with a bad std config; changed to be simpler
- we skip the last trajectory step since it is terminated.
- dockerfile was failing
- settings crashing on extras_require
- deprecated jupyterlab in favor of vscode dev container
Changed:
- torch / torchdata deps to be newer
- blogging capabilities
Notes:
- trying to simplify the data pipeline. TBH not sure how I feel about "transforms"
when we can simply use the torchdata mapping (see the sketch below)
- a bunch of stuff changed about the dataloading also, hopefully for the better.
I want to simplify the async execution also. Take another shot at using "shared_memory".
If that doesn't work, see if there is a way to do shared memory a different way.
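For the transforms-vs-mapping note above, a minimal sketch of the mapping approach with torchdata 0.7; the `Step` tuple and `scale_reward` function are made up for illustration:

```python
from collections import namedtuple
from torchdata.datapipes.iter import IterableWrapper

Step = namedtuple('Step', ['state', 'reward'])

def scale_reward(step: Step) -> Step:
    # A plain function instead of a dedicated "transform" object.
    return step._replace(reward=step.reward * 0.1)

pipe = IterableWrapper([Step(0, 1.0), Step(1, 2.0)]).map(scale_reward)
print(list(pipe))  # rewards scaled by 0.1
```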
- minor changes
- core modules to use torchdata 0.7.*
Added:
- more unified google colabcheck / import script
Removed:
- DataBlock concept. Just use pipelines or functions that return pipelines.
It's just much easier to follow and read this way.
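A sketch of the "functions that return pipelines" idea that replaces DataBlock; `transition_pipeline` is a hypothetical name:

```python
from torchdata.datapipes.iter import IterableWrapper, IterDataPipe

def transition_pipeline(transitions, bs: int = 32) -> IterDataPipe:
    # No DataBlock indirection: just build and return the datapipe chain.
    pipe = IterableWrapper(transitions)
    pipe = pipe.shuffle()
    pipe = pipe.batch(bs)
    return pipe
```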
- lots of fixes and compat updates to the data pipelines
- pickleable cache holder so that envs get reset when pickled (sketched below)
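A minimal sketch of such a cache holder, assuming gymnasium envs; `PickleableEnvHolder` is a hypothetical name. The live env is dropped on pickle, so unpickling yields a fresh one:

```python
import gymnasium as gym

class PickleableEnvHolder:
    "Sketch: hold an env lazily and drop it when pickled."
    def __init__(self, env_id: str):
        self.env_id = env_id
        self._env = None

    @property
    def env(self):
        if self._env is None:
            self._env = gym.make(self.env_id)
        return self._env

    def __getstate__(self):
        state = self.__dict__.copy()
        state['_env'] = None  # force the env to be re-created after unpickling
        return state
```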
Removed:
- usages of blocks. They just seem to overly obfuscate the code
- everything up to the dqn itself. I'm able to train the dqn, but
my gpu keeps overheating, I think
- image to 11.7.1
- learner pickling was not compatible with new torchdata
- logging, I think, can be simplified to not be so complex
- updates for logging. I'm rethinking how to set up the logging
pipeline to be less weird.
- record catcher and dumper. Maybe this will make the api less weird.
- logging to be simpler and flatter hierarchy-wise. Hopefully this doesn't
blow up in my face
- the learner to make iteration easier to augment, and moved 100% of the
epoch / batch tracking to the respective pipes because
obviously they should be primarily in charge of that (see the sketch after this list)
- a dataloaders-making function to clean up dataloader init, and updated
this in the learners
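A minimal sketch of a pipe owning its own counters, built on a torchdata-style IterDataPipe; `EpochTracker` is a made-up name:

```python
from torchdata.datapipes.iter import IterDataPipe

class EpochTracker(IterDataPipe):
    "Sketch: the pipe, not the learner, owns its epoch / batch counters."
    def __init__(self, source_dp: IterDataPipe):
        self.source_dp = source_dp
        self.epoch = 0
        self.batch = 0

    def __iter__(self):
        self.epoch += 1
        for batch in self.source_dp:
            self.batch += 1
            yield batch
```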
- conda references. I'd rather use docker tbh.
Added:
- pretty cool visualizer for memory
Fixed:
- bunch of warnings
Notes:
- target dqn still doesn't converge fast. Sometimes the progress bar
doesn't update either. I think I need to use the logging library a little
more heavily to figure out what is happening. From what I can tell, the dataloader
side works fine :/

Must be something wrong with the actual learning, need to figure out what.
- progress logger to use tqdm logger instead
- target learner now does validation correctly, and learner in general does it
better.
- agent execution when iterating through memory. Something is weird with
the action selection
- custom torchdata fork as a submodule
- requirements to handle local torchdata install
- docker yml to also clone submodules
josiahls and others added 21 commits October 9, 2023 17:24
- docker image again
- sudo password support
- github action to only build dev, and pull dev before attempting a build
- agent not being exhausted when new iterations occur.
- AgentBase gets copied by dataloaders, which breaks the model <-> learner connection.
We were training a separate model from the one actually being used in the agent :/

DQN target converges very fast
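A toy illustration of the bug class described above (not the repo's code): once a dataloader worker deep-copies the agent, updates to the learner's model never reach the copy doing the acting.

```python
import copy
import torch
from torch import nn

model = nn.Linear(4, 2)
worker_copy = copy.deepcopy(model)  # what a dataloader worker effectively does

# An update to `model` never reaches the copy the agent acts with:
with torch.no_grad():
    model.weight.add_(1.0)
print(torch.equal(model.weight, worker_copy.weight))  # False
```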
- all dqns converge as quickly as I would expect now!
- StepType to StepTypes and is now a registration object
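A rough sketch of what a registration object could look like; the real StepTypes implementation may well differ:

```python
class StepTypes:
    "Sketch: step-type classes register themselves instead of living in a fixed enum."
    types: list = []

    @classmethod
    def register(cls, step_cls):
        cls.types.append(step_cls)
        return step_cls

@StepTypes.register
class SimpleStep:
    pass
```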
- TRPO kind of converges but then fails. Might be converging too slowly
- updated ppo. It learns mountaincar, but very slowly.
Next steps:
- Separate the advantage buffer code to its own notebook with separate tests.
I need to know that it works, and has no bugs.
- separate ppo loss from critic loss. Right now I'm just logging the actor loss
(see the sketch below)
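A self-contained sketch of both pieces: standard GAE for the advantage buffer, and the clipped PPO actor loss kept separate from the critic loss. These are the textbook forms, not the repo's exact code:

```python
import torch

def gae(rewards, values, dones, gamma: float = 0.99, lam: float = 0.95):
    "Generalized advantage estimation over one trajectory (bootstrap value 0 at the end)."
    adv, last = torch.zeros_like(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        mask = 1.0 - dones[t]                      # zero out across episode ends
        delta = rewards[t] + gamma * next_v * mask - values[t]
        last = delta + gamma * lam * mask * last
        adv[t] = last
    return adv

def ppo_actor_loss(new_logp, old_logp, adv, clip: float = 0.2):
    "Clipped surrogate objective, logged separately from the critic's value loss."
    ratio = (new_logp - old_logp).exp()
    return -torch.min(ratio * adv, ratio.clamp(1 - clip, 1 + clip) * adv).mean()

def critic_loss(values, returns):
    "Plain MSE value loss."
    return (values - returns).pow(2).mean()
```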
@josiahls josiahls marked this pull request as ready for review November 1, 2023 15:02
@josiahls josiahls merged commit b66b8f6 into main Nov 1, 2023
1 of 3 checks passed