
Impala CNN + ProcGen #298

Open · burgalon wants to merge 9 commits into master
Conversation

burgalon commented Oct 12, 2020

As someone starting out in RL, I thought it would be a nice exercise to add the IMPALA CNN + ProcGen to SpinningUp.
I've copied the PPO folder and created a "PPO shared" variant that trains a shared policy + value network and also supports training on a GPU.
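
For context, here is roughly the shape of the shared network I mean (a minimal sketch with hypothetical names, not the exact code in this PR; the trunk would be the IMPALA CNN, with separate linear heads):

```python
import torch.nn as nn
from torch.distributions import Categorical

class SharedActorCritic(nn.Module):
    """One shared encoder (the IMPALA CNN), with separate linear
    heads for the policy logits and the value estimate."""
    def __init__(self, encoder, feat_dim, n_actions):
        super().__init__()
        self.encoder = encoder
        self.pi_head = nn.Linear(feat_dim, n_actions)
        self.v_head = nn.Linear(feat_dim, 1)

    def forward(self, obs):
        feats = self.encoder(obs)
        pi = Categorical(logits=self.pi_head(feats))
        v = self.v_head(feats).squeeze(-1)
        return pi, v
```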

Example https://colab.research.google.com/drive/1c64C3DNSriYzVHc2ESEwB7Un81tY96Ln?usp=sharing

Currently it's hard for me to tell whether the code is training at all.
Here are a few of the places I think are buggy:

  1. The update() phase used to stop early based on KL divergence, but that check is currently commented out (see the first sketch after this list).
  2. Maybe I'm missing something in how the joint loss is computed when doing (loss_v + loss_pi).backward() (second sketch below).
  3. The mlp() example seems agnostic to how many batch dimensions there are during the experience-collection phase vs. the update() phase, which I'm not sure I solved well using ac.step(torch.as_tensor(o[None], dtype=torch.float32, device=device)) (third sketch below).
  4. Is it reasonable to expect the agent to train without frame-stacking? P.S.: I'm wondering if there's a benchmark of frame-stacking vs. a recurrent model, and whether a comparison exists anywhere.
  5. Currently I do not apply an activation after the last layer of ImpalaCNN, and I'm wondering if I should add a Softmax. More generally, I'm not sure the code in ImpalaCNNActorCritic is right (fourth sketch below).
  6. I brewed some of my own logic for sampling mini-batches in order to fit into memory/GPU, which I'm not 100% sure about: for i in tqdm(range(buf.ptr * train_iters // batch_size)): (fifth sketch below).
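
On point 1: the reference SpinningUp ppo.py stops the policy update early once the approximate KL gets too large, so re-enabling something like this would be my first step (a minimal sketch in the style of the original, assuming pi_info['kl'] holds the mean approximate KL):

```python
# KL-based early stopping, roughly as in spinup's ppo.py
# (target_kl is a hyperparameter, e.g. 0.01).
for i in range(train_pi_iters):
    pi_optimizer.zero_grad()
    loss_pi, pi_info = compute_loss_pi(data)
    if pi_info['kl'] > 1.5 * target_kl:
        print(f'Early stopping at step {i} due to reaching max KL.')
        break
    loss_pi.backward()
    pi_optimizer.step()
```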
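
On point 2: summing the two losses and calling backward() once is the standard recipe for a shared network, but a value-loss coefficient is usually needed to keep one term from swamping the other (a sketch with a hypothetical vf_coef, e.g. 0.5 as in many PPO implementations):

```python
# One joint gradient step on the shared network.
optimizer.zero_grad()
loss_pi, pi_info = compute_loss_pi(data)
loss_v = compute_loss_v(data)
(loss_pi + vf_coef * loss_v).backward()
optimizer.step()
```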
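
On point 3: adding the leading batch dimension with o[None] seems fine; the subtle part is squeezing it back out so the buffer stores unbatched values. A sketch of keeping the two phases consistent (assuming ac.step returns tensors shaped (1, ...) during collection):

```python
# Collection phase: the network always sees a batch, even of size 1.
obs_t = torch.as_tensor(o[None], dtype=torch.float32, device=device)  # (1, C, H, W)
a, v, logp = ac.step(obs_t)
a, v, logp = a.squeeze(0), v.squeeze(0), logp.squeeze(0)  # store unbatched

# Update phase: the buffer yields already-batched tensors (B, C, H, W),
# so the same forward pass works without any reshaping.
```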
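
On point 5: if the last layer of ImpalaCNN produces logits that feed a torch.distributions.Categorical, no Softmax is needed, since Categorical(logits=...) normalizes internally (via log-softmax); a raw linear head is the usual pattern:

```python
from torch.distributions import Categorical

logits = pi_head(features)       # raw linear output, no activation
pi = Categorical(logits=logits)  # normalized internally via log-softmax
a = pi.sample()
logp_a = pi.log_prob(a)
```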
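
On point 6: a simpler pattern than the hand-rolled index math might be to shuffle the buffer indices once per pass and slice them into minibatches (a sketch assuming data is a dict of tensors with leading dimension buf.ptr, and update_on is a hypothetical helper doing one gradient step):

```python
import numpy as np

for _ in range(train_iters):
    idx = np.random.permutation(buf.ptr)          # one pass over the buffer
    for start in range(0, buf.ptr, batch_size):
        mb = idx[start:start + batch_size]
        update_on({k: v[mb] for k, v in data.items()})
```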

Would love to get some pointers on how to approach this, or to hear from anybody who is also up for the exercise and feels like pair-programming over Zoom.
