Ronen Huang
November 2021 to December 2021 (modified for Python usage), January 2025 to Present (added deep Q-learning).
The Farkle Simulation can be installed with pip:
pip install farkle-simulation
The rules of Farkle are summarized in the In a Nutshell section of https://farkle.games/official-rules/. In this simulation there are two players: Player 1 rolls first and Player 2 rolls second.
The scoring system is shown in the table below.
| Dice to Keep | Score |
|---|---|
| Three 1s or Straight | 1000 |
| Three Different Pairs | 750 |
| Three 6s | 600 |
| Three 5s | 500 |
| Three 4s | 400 |
| Three 3s | 300 |
| Three 2s | 200 |
| One 1 | 100 |
| One 5 | 50 |
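For illustration, a scoring helper for a kept combination might follow the table above like the sketch below; the function name and the handling of more than three of a kind are assumptions, not the package's implementation.
from collections import Counter

def combination_score(dice):
    # Illustrative scoring of a kept combination per the table above (assumed helper).
    counts = Counter(dice)
    if sorted(dice) == [1, 2, 3, 4, 5, 6]:
        return 1000                                    # straight
    if len(dice) == 6 and all(c == 2 for c in counts.values()):
        return 750                                     # three different pairs
    score = 0
    for face, count in counts.items():
        if count >= 3:
            score += 1000 if face == 1 else face * 100
            count -= 3
        if face == 1:
            score += count * 100                       # leftover single 1s
        elif face == 5:
            score += count * 50                        # leftover single 5s
    return score

combination_score((1, 1, 1, 5))                        # 1050 under this sketch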
A player begins their turn with all six dice.
- If the roll contains a scoring combination, the player keeps any scoring dice and
- Either keeps rolling the remaining dice or stops and banks the turn score
- When the number of remaining dice reaches 0, it resets to 6
- Otherwise, if the roll contains no scoring combination, the player has "farkled" and the turn score is 0
Once a player reaches 10,000 points, they have won the game.
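The sketch below walks through one turn under these rules; to keep it short it only keeps single 1s and 5s and stops at a fixed threshold, which is a simplification rather than any of the package's strategies.
import random

def play_turn(stop_at=300):
    # One turn: roll, keep scoring dice, reroll or stop; a farkle ends the turn with 0.
    num_dice, turn_score = 6, 0
    while True:
        roll = [random.randint(1, 6) for _ in range(num_dice)]
        scoring = [d for d in roll if d in (1, 5)]     # simplified: single 1s and 5s only
        if not scoring:
            return 0                                   # farkled
        turn_score += sum(100 if d == 1 else 50 for d in scoring)
        num_dice -= len(scoring)
        if num_dice == 0:
            num_dice = 6                               # all six dice scored: reset to 6
        if turn_score >= stop_at:
            return turn_score                          # stop and bank the turn score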
The implementation of the strategies is in strategy.py.
The strategies that Player 1 and Player 2 can use are naive strategy, simple RL strategy, and custom strategy (manual). The turn function goes through one turn.
from farkle_simulation.components.strategy import (
    NaiveStrategy, SimpleRLStrategy, CustomStrategy
)
naive_strategy = NaiveStrategy()
simple_rl_strategy = SimpleRLStrategy()
custom_strategy = CustomStrategy()
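# current_score is the player's total score; advantage is the player's score minus the opponent's.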
naive_turn_score = naive_strategy.turn(current_score, advantage)
simple_rl_turn_score = simple_rl_strategy.turn(current_score, advantage)
custom_turn_score = custom_strategy.turn(current_score, advantage)
With the custom strategy, the player chooses the action for each roll by typing 1 to 7 on the command line.
With the naive strategy (sketched below), for each roll the player:
- Chooses the action that maximizes the roll score
- Stops when 2 or fewer dice remain and the disadvantage is less than 1,000
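A rough sketch of this rule follows; the (roll_score, dice_used) option format is a hypothetical representation, not NaiveStrategy's internals.
def naive_decision(options, num_dice_left, advantage):
    # options: list of (roll_score, dice_used) pairs for the current roll (hypothetical format).
    score, dice_used = max(options, key=lambda opt: opt[0])    # maximize the roll score
    remaining = num_dice_left - dice_used
    disadvantage = max(-advantage, 0)
    stop = remaining <= 2 and disadvantage < 1000              # stop with few dice unless far behind
    return score, dice_used, not stop

naive_decision([(100, 1), (150, 2)], 6, -500)                  # (150, 2, True): keep rolling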
With the simple RL strategy (a sketch combining the factors follows this list), for each roll the player:
- Chooses the action that maximizes a reward based on
  - Number of Rolls
    # gamma discounts the reward as the number of rolls in the turn grows.
    actual_reward = gamma ** np.log2(num_rolls + 1) * roll_score
  - Distance to 10,000 and Current Score
    # The distance factor increases as the current score approaches 10,000.
    distance_factor = distance_scale ** (current_score / 10000)
  - Advantage
    # The advantage factor increases the further the player is behind.
    advantage_factor = 1
    if advantage < 0:
        advantage_factor *= max(np.emath.logn(advantage_scale, -advantage), 1)
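The sketch below combines these three factors; the multiplicative combination and the gamma, distance_scale, and advantage_scale values are assumptions for illustration, not the exact formula in simple_farkle_rl.py.
import numpy as np

def shaped_reward(roll_score, num_rolls, current_score, advantage,
                  gamma=0.9, distance_scale=2.0, advantage_scale=10.0):
    # Reward decays as the number of rolls in the turn grows.
    actual_reward = gamma ** np.log2(num_rolls + 1) * roll_score
    # The distance factor grows as the current score approaches 10,000.
    distance_factor = distance_scale ** (current_score / 10000)
    # The advantage factor grows the further the player is behind.
    advantage_factor = 1.0
    if advantage < 0:
        advantage_factor *= max(np.emath.logn(advantage_scale, -advantage), 1)
    return actual_reward * distance_factor * advantage_factor

shaped_reward(roll_score=300, num_rolls=2, current_score=4000, advantage=-1500)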
The deep Q-learning network takes an 11-dimensional input state
- Distance
- Advantage
- Current Turn Score
- Roll Maxes
and predicts a reward for each of the 7 possible actions
- Keep 1 to 6 dice
- Stop
This can be seen in architecture.py.
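A network of that shape could be a small fully connected model like the sketch below; the hidden sizes and activations are assumptions, not the exact layers in architecture.py.
import torch
import torch.nn as nn

class FarkleQNet(nn.Module):
    # Maps the 11-dimensional state to a predicted reward for each of the 7 actions.
    def __init__(self, state_dim=11, num_actions=7, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state):
        return self.net(state)

FarkleQNet()(torch.zeros(1, 11)).shape    # torch.Size([1, 7]): one predicted reward per action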
The training process can be seen in simple_farkle_rl.py. The best model state dictionary is saved as simple_action_reward_state_dict.pt.
from farkle_simulation.components.simple_farkle_rl import train
train()
The plot of turn score by current score can be seen in training_simple_rl.jpg and the table in training_simple_rl.csv.
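As a rough picture of what one update in such a training loop does, a single step might look like the following; the loss, batching, and use of the shaped reward as the regression target are assumptions, not the code in simple_farkle_rl.py.
import torch
import torch.nn as nn

def training_step(model, optimizer, states, actions, rewards):
    # Regress the predicted reward of the taken action toward the observed (shaped) reward.
    predicted = model(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(predicted, rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The best weights can then be saved as a state dictionary:
# torch.save(model.state_dict(), "simple_action_reward_state_dict.pt")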

The implementation of the simulation is in simulation.py.
The convergence_plot function plots the expected probability that Player 1 wins using one strategy while Player 2 uses another.
from farkle_simulation.components.simulation import convergence_plot
convergence_plot(naive_strategy, simple_rl_strategy)
convergence_plot(simple_rl_strategy, naive_strategy)
The convergence plots can be seen in convergence_naive_simple_rl.jpg and convergence_simple_rl_naive.jpg.
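To make the idea of a convergence plot concrete, the sketch below plots a running win-rate estimate; it uses stand-in Bernoulli outcomes rather than simulated Farkle games and is not the package's convergence_plot.
import numpy as np
import matplotlib.pyplot as plt

def running_win_rate(wins):
    # Estimated probability Player 1 wins after each additional game.
    wins = np.asarray(wins, dtype=float)
    return np.cumsum(wins) / np.arange(1, len(wins) + 1)

rng = np.random.default_rng(0)
outcomes = rng.random(2000) < 0.55        # stand-in for simulated game outcomes
plt.plot(running_win_rate(outcomes))
plt.xlabel("Number of games")
plt.ylabel("Estimated probability Player 1 wins")
plt.show()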
The histogram function plots the average probability that Player 1 wins using one strategy while Player 2 uses another.
from farkle_simulation.components.simulation import histogram
histogram(naive_strategy, simple_rl_strategy)
histogram(simple_rl_strategy, naive_strategy)
The histograms can be seen in histogram_naive_simple_rl.jpg and histogram_simple_rl_naive.jpg.
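Similarly, the histogram can be pictured as below with stand-in outcomes: the per-batch average win rates are approximately normal by the central limit theorem (Le Cam, 1986). This sketch is not the package's histogram function.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Stand-in: each row is one batch of games; real data would come from simulated Farkle games.
batch_wins = rng.random((500, 200)) < 0.55
plt.hist(batch_wins.mean(axis=1), bins=30)
plt.xlabel("Average probability Player 1 wins per batch")
plt.ylabel("Count")
plt.show()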
To train the simple RL agent and compare the strategies through the convergence plots and histograms:
from farkle_simulation.pipeline import train_simulate
train_simulate()
To play a game against the simple RL agent:
from farkle_simulation.pipeline import play_game
play_game()
Input 1 to play as the first player or 2 to play as the second player.
Player 1 (1) or Player (2)? 1
Input the action to choose for the dice combination.
Dice Combination - (1, 1, 1, 1, 1, 4)
Legal Moves - (1) Keep 1 - max 100 remaining dice 5, (2) Keep 2 - max 200 remaining dice 4, (3) Keep 3 - max 1000 remaining dice 3, (4) Keep 4 - max 1100 remaining dice 2, (5) Keep 5 - max 1200 remaining dice 1, (7) No Roll - max 1200 remaining dice 0
Action - 7
The output of the action is:
Current Turn Score - 1200
Current Score - 1200
Turn Score - 1200
Player 1 Score - 1200 Player 2 Score - 0
The output of a "farkle" is:
Dice Combination - (2, 6)
Farkled
Turn Score - 0

Farkle Official Rules. (2025). Retrieved from https://farkle.games/official-rules/
Le Cam, L. (1986). The Central Limit Theorem Around 1935. Statistical Science, 1(1), 78–91. Retrieved from http://www.jstor.org/stable/2245503
Mnih, V., Kavukcuoglu, K., Silver, D. et al. (2015). Human-level control through deep reinforcement learning. Nature 518, 529–533. Retrieved from https://doi.org/10.1038/nature14236