Implementation of the Multi-Armed Bandit problem where each arm returns a continuous numerical reward. Covers Epsilon-Greedy, UCB1, and Thompson Sampling with detailed explanations.
thompson-sampling epsilon-greedy ucb upper-confidence-bounds contextual-bandits mab linucb linearucb multiarmed-bandits
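As a rough illustration of the kind of algorithm covered, below is a minimal Epsilon-Greedy sketch for arms with continuous (Gaussian) rewards. The arm means, noise level, and function name are illustrative assumptions, not taken from the notebooks themselves.

```python
import numpy as np

def epsilon_greedy(true_means, n_rounds=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy on arms whose rewards are continuous (Gaussian) values.
    Hypothetical example; parameter names are not from the repository."""
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)      # number of pulls per arm
    estimates = np.zeros(n_arms)   # running mean reward per arm
    total_reward = 0.0

    for _ in range(n_rounds):
        # Explore a random arm with probability epsilon, otherwise exploit
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))
        else:
            arm = int(np.argmax(estimates))

        reward = rng.normal(true_means[arm], 1.0)  # continuous reward draw
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, total_reward

est, total = epsilon_greedy(true_means=[1.0, 1.5, 2.0])
print("estimated means:", np.round(est, 2))
```

UCB1 and Thompson Sampling follow the same loop structure, differing only in how the next arm is selected (confidence bounds vs. posterior sampling).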