
Impala CNN + ProcGen #298

Open · burgalon wants to merge 9 commits into master
Conversation

burgalon commented Oct 12, 2020

As someone starting out in RL, I thought it would be a nice exercise to add the IMPALA CNN + ProcGen to SpinningUp.
I've copied the PPO folder and created a "PPO shared" variant that trains a shared policy + value network and also supports training on a GPU.
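
For context, here is roughly the shape of the shared network I mean (a minimal sketch with hypothetical names, not the exact code in this PR; the trunk would be the IMPALA CNN, with separate linear heads):

```python
import torch.nn as nn
from torch.distributions import Categorical

class SharedActorCritic(nn.Module):
    """One shared encoder (the IMPALA CNN), with separate linear
    heads for the policy logits and the value estimate."""
    def __init__(self, encoder, feat_dim, n_actions):
        super().__init__()
        self.encoder = encoder
        self.pi_head = nn.Linear(feat_dim, n_actions)
        self.v_head = nn.Linear(feat_dim, 1)

    def forward(self, obs):
        feats = self.encoder(obs)
        pi = Categorical(logits=self.pi_head(feats))
        v = self.v_head(feats).squeeze(-1)
        return pi, v
```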

Example https://colab.research.google.com/drive/1c64C3DNSriYzVHc2ESEwB7Un81tY96Ln?usp=sharing

Currently it's hard for me to tell whether the code is training at all.
Here are a few of the places I think are buggy:

  1. The update() phase used to stop early based on KL divergence, but that check is currently commented out (see the first sketch after this list).
  2. Maybe I'm missing something in how the joint loss is computed when doing (loss_v + loss_pi).backward() (second sketch below).
  3. The mlp() example seems agnostic to how many batch dimensions there are during the experience-collection phase vs. the update() phase, which I'm not sure I solved well using ac.step(torch.as_tensor(o[None], dtype=torch.float32, device=device)) (third sketch below).
  4. Is it reasonable to expect the agent to train without frame-stacking? P.S.: I'm wondering if there's a benchmark of frame-stacking vs. a recurrent model, and whether a comparison exists anywhere.
  5. Currently I do not apply an activation after the last layer of ImpalaCNN, and I'm wondering if I should add a Softmax. More generally, I'm not sure the code in ImpalaCNNActorCritic is right (fourth sketch below).
  6. I brewed some of my own logic for sampling mini-batches in order to fit into memory/GPU, which I'm not 100% sure about: for i in tqdm(range(buf.ptr * train_iters // batch_size)): (fifth sketch below).
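
On point 1: the reference SpinningUp ppo.py stops the policy update early once the approximate KL gets too large, so re-enabling something like this would be my first step (a minimal sketch in the style of the original, assuming pi_info['kl'] holds the mean approximate KL):

```python
# KL-based early stopping, roughly as in spinup's ppo.py
# (target_kl is a hyperparameter, e.g. 0.01).
for i in range(train_pi_iters):
    pi_optimizer.zero_grad()
    loss_pi, pi_info = compute_loss_pi(data)
    if pi_info['kl'] > 1.5 * target_kl:
        print(f'Early stopping at step {i} due to reaching max KL.')
        break
    loss_pi.backward()
    pi_optimizer.step()
```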
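
On point 2: summing the two losses and calling backward() once is the standard recipe for a shared network, but a value-loss coefficient is usually needed to keep one term from swamping the other (a sketch with a hypothetical vf_coef, e.g. 0.5 as in many PPO implementations):

```python
# One joint gradient step on the shared network.
optimizer.zero_grad()
loss_pi, pi_info = compute_loss_pi(data)
loss_v = compute_loss_v(data)
(loss_pi + vf_coef * loss_v).backward()
optimizer.step()
```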
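
On point 3: adding the leading batch dimension with o[None] seems fine; the subtle part is squeezing it back out so the buffer stores unbatched values. A sketch of keeping the two phases consistent (assuming ac.step returns tensors shaped (1, ...) during collection):

```python
# Collection phase: the network always sees a batch, even of size 1.
obs_t = torch.as_tensor(o[None], dtype=torch.float32, device=device)  # (1, C, H, W)
a, v, logp = ac.step(obs_t)
a, v, logp = a.squeeze(0), v.squeeze(0), logp.squeeze(0)  # store unbatched

# Update phase: the buffer yields already-batched tensors (B, C, H, W),
# so the same forward pass works without any reshaping.
```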
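
On point 5: if the last layer of ImpalaCNN produces logits that feed a torch.distributions.Categorical, no Softmax is needed, since Categorical(logits=...) normalizes internally (via log-softmax); a raw linear head is the usual pattern:

```python
from torch.distributions import Categorical

logits = pi_head(features)       # raw linear output, no activation
pi = Categorical(logits=logits)  # normalized internally via log-softmax
a = pi.sample()
logp_a = pi.log_prob(a)
```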
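
On point 6: a simpler pattern than the hand-rolled index math might be to shuffle the buffer indices once per pass and slice them into minibatches (a sketch assuming data is a dict of tensors with leading dimension buf.ptr, and update_on is a hypothetical helper doing one gradient step):

```python
import numpy as np

for _ in range(train_iters):
    idx = np.random.permutation(buf.ptr)          # one pass over the buffer
    for start in range(0, buf.ptr, batch_size):
        mb = idx[start:start + batch_size]
        update_on({k: v[mb] for k, v in data.items()})
```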

Would love to get some pointers on how to approach this, or to hear from anybody who is also up for the exercise and feels like pair-programming over Zoom.
