MiniMax Algorithm? #30

How would you implement a minimax Q-learner with coax?

Hi there! I love the package and how accessible it is to relative newbies. The tutorials are pretty great and the accompanying videos are very helpful!

I was wondering what the best way to implement a minimax algorithm would be. Would you recommend using two policies, pi1 and pi2, or is there something better suited for this?
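For concreteness, one alternative to two separate policies might be a single shared Q-table for a turn-based, zero-sum game, scored from player 1's perspective. The sketch below is plain Python, not coax: `minimax_q_update`, the toy `transitions` list, and every parameter name are placeholders made up for illustration, and the two-move game is an assumption chosen only because its minimax value is easy to check by hand.

```python
from collections import defaultdict


def minimax_q_update(q, state, action, reward, next_state, next_actions,
                     next_player, alpha=0.5, gamma=1.0):
    """One tabular update where the bootstrap depends on who moves next.

    Rewards and Q-values are from player +1's point of view, so player +1
    maximizes and player -1 minimizes. (Hypothetical helper, not coax API.)
    """
    if next_actions:
        values = [q[(next_state, a)] for a in next_actions]
        bootstrap = max(values) if next_player == 1 else min(values)
    else:
        bootstrap = 0.0  # terminal transition: nothing to bootstrap from
    target = reward + gamma * bootstrap
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Toy game (assumed): player +1 moves at "root", then player -1 replies.
# Each entry: (state, action, reward, next_state, next_actions, next_player)
transitions = [
    ("L", "l", 3.0, None, [], 1),
    ("L", "r", 1.0, None, [], 1),
    ("R", "l", 0.0, None, [], 1),
    ("R", "r", 5.0, None, [], 1),
    ("root", "L", 0.0, "L", ["l", "r"], -1),
    ("root", "R", 0.0, "R", ["l", "r"], -1),
]

q = defaultdict(float)
for _ in range(2):  # a couple of sweeps is enough for this tiny game
    for t in transitions:
        minimax_q_update(q, *t, alpha=1.0)

# Player +1 picks the root action with the highest minimax value.
best_root_action = max(["L", "R"], key=lambda a: q[("root", a)])
```

In this toy game the minimizing reply makes "L" worth min(3, 1) = 1 and "R" worth min(0, 5) = 0, so the maximizing player prefers "L". Whether the same idea maps cleanly onto coax's Q-function and updater objects is exactly the open question.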

I'd like to re-implement something like [this old blogpost of mine](https://blog.flaport.net/reinforcement-learning-part-2.html) in coax to get a better feel of the library.

Any help would be greatly appreciated :)
