Closed
Description
Currently there is no documentation on how to implement a new algorithm. Is there a draft of this somewhere? Or could you quickly outline the main steps involved? I'm considering implementing something like AlphaZero, that is, MCTS combined with learned models.
So would I also frame this as a `QBasedPolicy`, which seems to be used in most of the currently available algorithms in ReinforcementLearningZoo, or implement something else, like a `TreeBasedPolicy`?