Do both BC and DT fit the training data well? #2

w-hc · 2021-06-04T18:34:29Z

Hi, thanks for the interesting work!
A question: how well do Behavior Cloning and Decision Transformer fit the training data, especially when the data comes from a mixture of policies (such as the replay datasets or medium + expert)? This doesn't seem to be reported in the paper. Do they fit the data (roughly) equally well?

kzl · 2021-06-05T05:26:10Z

Thanks for the question! I've attached some of the L2 losses for both. In short, Decision Transformer fits the training data better across all datasets, likely due to a combination of return conditioning and longer context length.
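For readers who want to run a similar comparison themselves, here is a minimal sketch of a training-set L2 (mean squared error) action loss. It is not the code used for the numbers above: `action_l2_loss`, the toy data, and the two stand-in policies are all hypothetical and only illustrate the metric.

```python
import numpy as np

def action_l2_loss(predict_fn, states, actions):
    """Mean squared error between predicted actions and dataset actions."""
    preds = predict_fn(states)
    return float(np.mean((preds - actions) ** 2))

# Toy data standing in for an offline dataset (hypothetical, not from the paper).
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))
true_weights = rng.normal(size=(4, 2))
actions = states @ true_weights

# Two stand-in "policies": one that reproduces the data, one that outputs zeros.
fitted_policy = lambda s: s @ true_weights
zero_policy = lambda s: np.zeros((s.shape[0], 2))

loss_fitted = action_l2_loss(fitted_policy, states, actions)
loss_zero = action_l2_loss(zero_policy, states, actions)
print(f"fitted: {loss_fitted:.4f}  zero: {loss_zero:.4f}")
```

In a real comparison, `predict_fn` would wrap each trained model (BC or Decision Transformer), and the loss would be evaluated on the same training trajectories for both, so that a lower value indicates a closer fit to the data.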

kzl closed this as completed Jun 5, 2021

ivo-1 mentioned this issue Mar 31, 2022

More Training Information on Reacher #40

Open
