How would you implement a minimax q-learner with coax?
Hi there! I love the package and how accessible it is to relative newbies. The tutorials are pretty great and the accompanying videos are very helpful!
I was wondering what the best way to implement a minimax algorithm would be. Would you recommend using two policies, pi1 and pi2, or is there something better suited for this?
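To make the question concrete, a tabular minimax Q update could look roughly like the sketch below. It is plain NumPy, not coax API. It assumes a table Q[s, a, o] indexed by state, my action a, and the opponent's action o, and it uses the pure-strategy maximin value (full minimax-Q would solve a small linear program for a mixed strategy instead). The names `minimax_value` and `minimax_q_update` are made up for illustration.

```python
import numpy as np


def minimax_value(q_s):
    """Pure-strategy maximin value of a payoff matrix q_s[a, o].

    For each of my actions a, assume the opponent picks the worst-case o,
    then pick the action with the best worst case.
    """
    return q_s.min(axis=1).max()


def minimax_q_update(Q, s, a, o, r, s_next, done, alpha=0.1, gamma=0.9):
    """One in-place tabular update of Q[s, a, o] towards r + gamma * V(s_next)."""
    target = r if done else r + gamma * minimax_value(Q[s_next])
    Q[s, a, o] += alpha * (target - Q[s, a, o])
    return Q


# Toy example (hypothetical numbers): 2 states, 2 actions per player.
Q = np.zeros((2, 2, 2))
Q[1] = [[1.0, -1.0], [0.5, 0.2]]
minimax_q_update(Q, s=0, a=0, o=1, r=0.0, s_next=1, done=False)
```

What I'm unsure about is how to map the `min` over the opponent's action onto coax's building blocks, hence the question about two policies.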
I'd like to re-implement something like this old blog post of mine in coax to get a better feel for the library.
Any help would be greatly appreciated :)