classic_examples/
contains reproductions of the multi-armed bandit examples from "Reinforcement Learning: An Introduction" by Sutton and Barto (Chapter 2.3)
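
For orientation, here is a minimal sketch of the kind of experiment that chapter describes: an epsilon-greedy agent with sample-average value estimates on a stationary Gaussian bandit. It is not the code in classic_examples/. The function name `run_bandit` and its parameters are hypothetical, and ties between equal estimates go to the lowest arm index rather than being broken at random.

```python
import random


def run_bandit(epsilon, steps=1000, arms=10, seed=0):
    """Run one epsilon-greedy agent on a stationary Gaussian bandit.

    Hypothetical helper for illustration; not taken from the repository.
    Returns (average reward, value estimates, pull counts).
    """
    rng = random.Random(seed)
    # True action values are drawn once from N(0, 1) and stay fixed.
    true_values = [rng.gauss(0.0, 1.0) for _ in range(arms)]
    estimates = [0.0] * arms  # Q(a): sample-average estimate per arm
    counts = [0] * arms       # N(a): number of pulls per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(arms)  # explore: uniform random arm
        else:
            # exploit: arm with the highest estimate (first index on ties)
            action = max(range(arms), key=lambda a: estimates[a])

        # Reward is the arm's true value plus unit-variance Gaussian noise.
        reward = rng.gauss(true_values[action], 1.0)
        counts[action] += 1
        # Incremental form of the sample average: Q += (R - Q) / N
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return total_reward / steps, estimates, counts
```

Calling `run_bandit(0.1)` and `run_bandit(0.0)` with the same seed gives a rough feel for how exploration changes the average reward on a single problem; the book's plots average over many independently generated problems.
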
gumbel/
contains an implementation of Algorithm 2 from the paper "Policy Improvement by Planning with Gumbel"
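
As a rough illustration of the idea, assuming Algorithm 2 refers to the paper's Sequential Halving with Gumbel procedure, the sketch below samples Gumbel noise, keeps the m actions with the largest g(a) + logits(a), and then repeatedly halves the candidate set using estimated action values. It is not the code in gumbel/. The function name, the `evaluate` callback (standing in for a simulation that returns a value estimate), the even budget split, and the simple transform `c_scale * q` are all assumptions made for brevity.

```python
import math
import random


def gumbel_sequential_halving(logits, evaluate, m, n, c_scale=1.0, seed=0):
    """Pick one action index with Gumbel noise plus sequential halving.

    Hypothetical sketch. `evaluate(a)` is an assumed callback returning
    one (possibly noisy) value estimate for action index `a`.
    Requires 1 <= m <= len(logits).
    """
    rng = random.Random(seed)
    k = len(logits)
    # Gumbel(0, 1) samples via -log(-log(U)); clamp U away from zero.
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in range(k)]
    scores = [g + l for g, l in zip(gumbels, logits)]

    # Gumbel-Top-k: keep the m actions with the largest g(a) + logits(a).
    candidates = sorted(range(k), key=lambda a: scores[a], reverse=True)[:m]

    q_sum = [0.0] * k
    visits = [0] * k
    phases = max(1, math.ceil(math.log2(m)))

    while len(candidates) > 1:
        # Split the budget evenly over phases and surviving candidates.
        # At least one evaluation each, so the total can slightly exceed n.
        per_action = max(1, n // (phases * len(candidates)))
        for a in candidates:
            for _ in range(per_action):
                q_sum[a] += evaluate(a)
                visits[a] += 1
        # Rank by g(a) + logits(a) + c_scale * mean value, keep the top half.
        candidates.sort(
            key=lambda a: scores[a] + c_scale * q_sum[a] / visits[a],
            reverse=True,
        )
        candidates = candidates[: math.ceil(len(candidates) / 2)]

    return candidates[0]
```

With `m=1` no evaluations are run and the function returns the action with the largest g(a) + logits(a), which is a sample from the softmax policy defined by the logits.
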