Comparing against the original paper, Learning Off-Policy with Online Planning (LOOP), I found some implementation differences.
-
Apologies for the delayed response, and thank you for your suggestions.
To unify the different planning algorithms, we simplified the code and converted the original numpy implementation to a torch implementation for better GPU efficiency. This approach is inspired by the implementation in TDMPC.
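As an illustration of that kind of conversion, a numpy planning loop that scores candidate action sequences one at a time can be replaced by a single batched torch update, so all samples are evaluated in one tensor operation (and can run on GPU). The sketch below shows a minimal CEM-style refit step; the function names, sizes, and toy reward are illustrative assumptions, not the repository's actual API:

```python
import torch

torch.manual_seed(0)

def cem_step(mean, std, reward_fn, num_samples=64, num_elites=8):
    # mean, std: (horizon, action_dim) parameters of the sampling distribution.
    # Sample all candidate action sequences at once: (num_samples, horizon, action_dim).
    samples = mean + std * torch.randn(num_samples, *mean.shape)
    # Score every candidate in a single batched call instead of a python loop.
    rewards = reward_fn(samples)  # (num_samples,)
    # Keep the top-scoring elites and refit the distribution to them.
    elites = samples[rewards.topk(num_elites).indices]
    return elites.mean(dim=0), elites.std(dim=0)

horizon, action_dim = 5, 2
mean = torch.zeros(horizon, action_dim)
std = torch.ones(horizon, action_dim)
# Toy reward (assumption for the sketch): prefer small-magnitude actions.
reward_fn = lambda a: -(a ** 2).sum(dim=(1, 2))
new_mean, new_std = cem_step(mean, std, reward_fn)
print(new_mean.shape)  # torch.Size([5, 2])
```

The same structure applies to the other planners: each one differs only in how the sampling distribution is proposed and refit, which is what makes a unified torch implementation practical.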
The action sequences generated by the actors can be found here: ARC Actors Action Sequence
The action sequences generated by the historical distributions are available here: ARC Historical Distributions Action Sequence