One Brain, 10 Games | AlphaZero-Inspired Multi-Game AI
A universal reinforcement learning agent trained across 10 diverse board games with interactive gameplay
Features • Quick Start • Games • Architecture • Play Online
Universal RL Arena is an interactive platform showcasing a single AI agent that masters 10 different board games through Q-learning with minimax and MCTS enhancements. Unlike traditional game-specific AI, this universal agent learns transferable strategic patterns across games, from simple Tic-Tac-Toe to complex Ultimate Tic-Tac-Toe.
- Universal Agent: Single Q-table architecture for all 10 games
- Interactive Gameplay: Human vs AI, AI vs AI battle mode
- In-App Training: Train custom agents with adjustable hyperparameters
- Real-Time Visualization: Dynamic game state rendering
- Performance Analytics: Training stats and win-rate tracking
- Model Persistence: Save/load trained agents as .zip archives
```bash
# Clone the repository
git clone https://github.com/Devanik21/universal-rl-arena.git
cd universal-rl-arena

# Install dependencies
pip install -r requirements.txt
```

requirements.txt:

```text
streamlit>=1.28.0
numpy>=1.21.0
matplotlib>=3.5.0
pandas>=1.5.0
```

Run the app:

```bash
streamlit run aGI.py
```

The app will open at http://localhost:8501.
| Game | Complexity | State Space | Strategy Type |
|---|---|---|---|
| Tic-Tac-Toe | Simple | 3⁹ | Tactical |
| Connect-4 | Medium | ≤3⁴² | Positional |
| Nim | Simple | Exponential | Mathematical |
| Hexapawn | Simple | 3⁹ | Tactical |
| Chomp | Medium | 4×6 grid | Strategic |
| Sim | Medium | C(6,2) = 15 edges | Graph Theory |
| Dots & Boxes | Medium | 3×3 grid | Territory Control |
| Breakthrough | Complex | 6×6 board | Positional |
| Gomoku | Complex | 7×7 board | Pattern Recognition |
| Ultimate Tic-Tac-Toe | Very Complex | 9×3⁹ | Multi-level Strategy |
- Tic-Tac-Toe: First to get 3 in a row wins
- Connect-4: First to connect 4 discs vertically/horizontally/diagonally wins
- Nim: Player forced to take the last object loses
- Hexapawn: Win by reaching the opponent's back row, capturing all enemy pawns, or leaving the opponent no legal move
- Chomp: Avoid eating the poison square (bottom-left)
- Sim: First to form a triangle in their color loses
- Dots & Boxes: Claim the most boxes by completing squares
- Breakthrough: First to reach opponent's back row wins
- Gomoku: Get exactly 5 stones in a row
- Ultimate Tic-Tac-Toe: Win small boards to claim meta-board positions
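As an illustration of the misère rule above, a minimal Nim environment might be sketched as follows. This is illustrative only; the project's actual class, method names, and reward convention may differ.

```python
class Nim:
    """Misere Nim: the player forced to take the last object loses."""

    def __init__(self, piles=(3, 5, 7)):
        self.initial = tuple(piles)
        self.reset()

    def reset(self):
        self.piles = list(self.initial)
        self.current_player = 1

    def get_state(self):
        # Hashable key suitable for a shared Q-table
        return ("nim", *self.piles)

    def get_available_actions(self):
        # (pile_index, count) pairs for every legal removal
        return [(i, n) for i, p in enumerate(self.piles)
                for n in range(1, p + 1)]

    def make_move(self, action):
        i, n = action
        self.piles[i] -= n
        if sum(self.piles) == 0:
            return -1  # misere rule: the mover who empties the board loses
        self.current_player = 3 - self.current_player
        return 0
```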
```python
class UniversalAgent:
    def __init__(self, player_id, lr=0.01, gamma=0.99,
                 epsilon=1.0, mcts_sims=50, minimax_depth=2):
        self.q_table = {}    # Shared across all games
        self.game_stats = {}
```

Core Components:
- State Representation: `(game_name, *flattened_board_state)`
- Q-Table: `{(state, action): value}` mapping
- Action Selection: Epsilon-greedy with tactical checks
- Learning: Temporal Difference (TD) updates
Q-Learning Update Rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

Where:
- $\alpha$ = learning rate (default: 0.01)
- $\gamma$ = discount factor (default: 0.99)
- $r$ = immediate reward
- $s'$ = next state
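The update rule translates into a few lines of tabular code. The sketch below uses an illustrative `td_update` helper (not the project's exact API), with the default hyperparameters quoted in the text:

```python
def td_update(q_table, state, action, reward, next_state,
              next_actions, lr=0.01, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + lr * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    # Terminal states pass an empty next_actions list, so the bootstrap term is 0
    best_next = max((q_table.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + lr * (reward + gamma * best_next - old)
```

Unvisited state-action pairs default to 0.0, which keeps the Q-table sparse: only entries that were actually updated are stored.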
Tactical Enhancements:

```python
# 1. Immediate win detection
for action in available_actions:
    if sim_move(action).winner == self.player_id:
        return action

# 2. Block opponent wins
for action in available_actions:
    if sim_move(action, opponent).winner == opponent:
        return action

# 3. Q-value maximization
return argmax_a Q(state, action)
```

| Parameter | Default | Range | Purpose |
|---|---|---|---|
| `lr` | 0.01 | 0.001-0.5 | Learning speed |
| `gamma` | 0.99 | 0.8-0.999 | Future reward weight |
| `epsilon` | 1.0 → 0.01 | - | Exploration rate (decays) |
| `epsilon_decay` | 0.998 | 0.95-0.999 | Exploration reduction |
| `minimax_depth` | 2 | 1-6 | Search tree depth |
| `mcts_simulations` | 50 | 10-500 | Monte Carlo rollouts |
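Epsilon-greedy selection with the decaying exploration rate can be sketched as below. The function name and signature are illustrative, not the project's exact API:

```python
import math
import random

def select_action(q_table, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, else pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

# With epsilon_decay = 0.998 and a floor of 0.01, epsilon falls from 1.0 to
# the floor in roughly ln(0.01) / ln(0.998) episodes (about 2300).
episodes_to_floor = math.log(0.01) / math.log(0.998)
```

The decay schedule matters: with 200 episodes per game, epsilon only decays to about 0.67 per game, so substantial exploration persists throughout training.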
```python
# Initialize agents
agent1 = UniversalAgent(player_id=1, lr=0.01, gamma=0.99)
agent2 = UniversalAgent(player_id=2, lr=0.01, gamma=0.99)

# Games to train
games = [TicTacToe(), Nim(), Connect4(), Hexapawn(),
         Chomp(), Sim(), DotsAndBoxes(), Breakthrough(),
         Gomoku(), UltimateTicTacToe()]

# Self-play training
for game in games:
    for episode in range(episodes):
        play_game(game, agent1, agent2, training=True)
        agent1.decay_epsilon()
        agent2.decay_epsilon()
```

Typical convergence after 200 episodes per game:
| Metric | Value |
|---|---|
| Total Q-States | ~50,000-100,000 |
| Training Time (10 games, 200 eps) | ~2-5 minutes |
| Final Epsilon | 0.01 |
| Win Rate (vs random) | >85% |
All games feature custom matplotlib renderers:
- Tic-Tac-Toe: X/O symbols with grid
- Connect-4: Colored discs with gravity
- Nim: Stacked token pyramids
- Hexapawn: Chess pawn symbols
- Chomp: Chocolate grid with poison marker
- Sim: Graph with 6 vertices
- Dots & Boxes: Grid with edge highlighting
- Breakthrough: Chess-like board
- Gomoku: Go-style board
- Ultimate TTT: 3×3 meta-board with active board highlighting
Example rendering code:

```python
def visualize_game(env):
    if env.name == "tictactoe":
        return visualize_tictactoe(env.board)
    # ... routing for all 10 games
```

Agents are serialized to .zip archives containing:

```text
universal_agent.zip
├── agent1.json   # Player 1 Q-table & config
├── agent2.json   # Player 2 Q-table & config
└── config.json   # Game list & metadata
```
JSON Structure:

```json
{
  "q_table": {
    "[['tictactoe', 0, 0, 0, ...], '(0, 0)']": 0.85
  },
  "player_id": 1,
  "epsilon": 0.01,
  "game_stats": {
    "tictactoe": {"wins": 120, "losses": 75, "draws": 5}
  },
  "lr": 0.01,
  "gamma": 0.99
}
```

```python
# Save trained agents
zip_buffer = create_universal_zip(agent1, agent2)
with open("my_agent.zip", "wb") as f:
    f.write(zip_buffer.getvalue())

# Load agents
agent1, agent2, config = load_universal_agents("my_agent.zip")
```

Sidebar → Upload Universal Agent → Select .zip file → Load
Select Game → Watch Battle → Auto-play/Step Mode
Human vs AI → Choose Agent → Click board positions
Training Lab → Set Hyperparameters → Start Multi-Game Training
Sidebar → AI Difficulty → Minimax Depth (1-6) & MCTS Sims (10-500)
- Canonical Forms: Rotations/reflections mapped to single state
- Pruning: Invalid actions filtered before Q-lookup
- Sparse Storage: Only visited states stored in Q-table
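Canonical-form reduction for a 3×3 board can be sketched as follows: generate all 8 symmetries (4 rotations × optional reflection) and keep the lexicographically smallest. The function name is illustrative; the project's implementation may differ.

```python
import numpy as np

def canonical_state(board):
    """Map a 3x3 board to the smallest of its 8 rotations/reflections,
    so symmetric positions share a single Q-table entry."""
    b = np.asarray(board)
    variants = []
    for k in range(4):
        rot = np.rot90(b, k)
        variants.append(tuple(int(x) for x in rot.flatten()))
        variants.append(tuple(int(x) for x in np.fliplr(rot).flatten()))
    return min(variants)
```

For Tic-Tac-Toe this cuts the reachable state count by roughly a factor of 8, which is why the Q-table stays far below the naive 3⁹ bound.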
```python
# Fast win detection (vectorized)
def _check_win(self, player):
    board = self.board
    # Row and column checks
    for i in range(3):
        if (board[i, :] == player).all() or (board[:, i] == player).all():
            return True
    # Main and anti-diagonal checks
    return (np.diag(board) == player).all() or \
           (np.diag(np.fliplr(board)) == player).all()
```

- States stored as tuples (immutable, hashable)
- Actions converted to strings for Q-table keys
- Numpy arrays for board representations
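The three conventions above combine into a simple key scheme. The helper names below are illustrative, assuming the `(game_name, *flattened_board_state)` representation described earlier:

```python
import numpy as np

def state_key(game_name, board):
    """Flatten a numpy board into a hashable (game, *cells) tuple."""
    return (game_name, *(int(x) for x in np.asarray(board).flatten()))

def q_key(state, action):
    """Pair the state tuple with the action rendered as a string."""
    return (state, str(action))
```

Tuples are hashable (unlike numpy arrays), so they can serve directly as dictionary keys, and stringified actions survive JSON round-trips when agents are saved and reloaded.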
- Neural network policy (DQN/A3C)
- Transfer learning metrics
- Multi-agent tournament mode
- Online multiplayer (WebRTC)
- Performance benchmarking suite
- Additional games (Chess variants, Go)
| Component | Technology |
|---|---|
| Framework | Streamlit 1.28+ |
| ML/RL | Custom Q-Learning |
| Visualization | Matplotlib |
| State Management | Streamlit Session State |
| Serialization | JSON + ZIP |
| Data | NumPy, Pandas |
```bash
# Push to GitHub
git push origin main

# Deploy via Streamlit Cloud
# 1. Visit share.streamlit.io
# 2. Connect repository: Devanik21/universal-rl-arena
# 3. Set main file: aGI.py
# 4. Deploy
```

Dockerfile:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "aGI.py"]
```

```bash
docker build -t universal-rl .
docker run -p 8501:8501 universal-rl
```

Perfect for teaching:
- Reinforcement Learning: Q-learning, exploration/exploitation
- Game Theory: Minimax, Nash equilibria
- Algorithm Design: State representation, search strategies
- Python Programming: OOP, numpy, visualization
Example classroom exercise:

```python
# Students implement a new game with the standard interface
class MyGame:
    def __init__(self): ...
    def reset(self): ...
    def get_state(self): ...
    def get_available_actions(self): ...
    def make_move(self, action): ...
```

Contributions welcome! Areas for improvement:
- New Games: Add games with the `get_state()` interface
- Visualizations: Enhance rendering quality
- Algorithms: Implement A3C/PPO/DQN variants
- UI/UX: Improve Streamlit interface
- Documentation: Add tutorials/videos
See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE
Devanik
- GitHub: @Devanik21
- LinkedIn: linkedin.com/in/devanik
- Twitter/X: @devanik2005
Inspired by:
- AlphaZero (DeepMind) - Universal game-playing architecture
- DQN (Mnih et al., 2015) - Deep Q-learning foundations
- OpenAI Gym - Environment interface design
Built with ❤️ using Streamlit
Made for Genius-Level Play 🎮
One Brain. Ten Games. Infinite Possibilities.