Devanik21/general-gamer-ai-lite


🎮 Universal RL Game Arena


One Brain, 10 Games | AlphaZero-Inspired Multi-Game AI

A universal reinforcement learning agent trained across 10 diverse board games with interactive gameplay

Features • Quick Start • Games • Architecture • Play Online


🎯 Overview

Universal RL Arena is an interactive platform showcasing a single AI agent that masters 10 different board games through Q-learning with minimax and MCTS enhancements. Unlike traditional game-specific AI, this universal agent learns transferable strategic patterns across games, from simple Tic-Tac-Toe to complex Ultimate Tic-Tac-Toe.

๐Ÿ† Key Features

  • Universal Agent: Single Q-table architecture for all 10 games
  • Interactive Gameplay: Human vs AI, AI vs AI battle mode
  • In-App Training: Train custom agents with adjustable hyperparameters
  • Real-Time Visualization: Dynamic game state rendering
  • Performance Analytics: Training stats and win-rate tracking
  • Model Persistence: Save/load trained agents as .zip archives

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/Devanik21/universal-rl-arena.git
cd universal-rl-arena

# Install dependencies
pip install -r requirements.txt

requirements.txt:

streamlit>=1.28.0
numpy>=1.21.0
matplotlib>=3.5.0
pandas>=1.5.0

Launch Application

streamlit run aGI.py

The app will open at http://localhost:8501


🎮 Supported Games

| Game | Complexity | State Space | Strategy Type |
| --- | --- | --- | --- |
| Tic-Tac-Toe | Simple | 3³ | Tactical |
| Connect-4 | Medium | 7⁶ | Positional |
| Nim | Simple | Exponential | Mathematical |
| Hexapawn | Simple | 3³ | Tactical |
| Chomp | Medium | 4×6 | Strategic |
| Sim | Medium | C(6,2) edges | Graph Theory |
| Dots & Boxes | Medium | 3×3 grid | Territory Control |
| Breakthrough | Complex | 6×6 board | Positional |
| Gomoku | Complex | 7×7 board | Pattern Recognition |
| Ultimate Tic-Tac-Toe | Very Complex | 9×3³ | Multi-level Strategy |

Game Rules Summary

  • Tic-Tac-Toe: First to get 3 in a row wins
  • Connect-4: First to connect 4 discs vertically/horizontally/diagonally wins
  • Nim: Player forced to take the last object loses
  • Hexapawn: Reach opponent's back row or block all enemy moves
  • Chomp: Avoid eating the poison square (bottom-left)
  • Sim: First to form a triangle in their color loses
  • Dots & Boxes: Claim the most boxes by completing squares
  • Breakthrough: First to reach opponent's back row wins
  • Gomoku: Get exactly 5 stones in a row
  • Ultimate Tic-Tac-Toe: Win small boards to claim meta-board positions

🧠 Architecture

Universal Agent Design

class UniversalAgent:
    def __init__(self, player_id, lr=0.01, gamma=0.99, 
                 epsilon=1.0, mcts_sims=50, minimax_depth=2):
        self.q_table = {}  # Shared across all games
        self.game_stats = {}

Core Components:

  1. State Representation: (game_name, *flattened_board_state)
  2. Q-Table: {(state, action): value} mapping
  3. Action Selection: Epsilon-greedy with tactical checks
  4. Learning: Temporal Difference (TD) updates
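As a minimal sketch of component 1 and 2 (the helper and variable names here are illustrative, not the repo's actual API), a shared Q-table key can be built by prefixing the flattened board with the game name:

```python
def make_state_key(game_name, board):
    # Prefix the flattened board with the game name so one Q-table can
    # hold entries for all ten games without key collisions.
    return (game_name, *[cell for row in board for cell in row])

q_table = {}
state = make_state_key("tictactoe", [[0, 0, 0], [0, 1, 0], [0, 0, 2]])
q_table[(state, (0, 0))] = 0.25  # Q-value for playing the top-left cell
```

Because the key is a tuple of immutables it is hashable, so it works directly as a dict key.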

Learning Algorithm

Q-Learning Update Rule:

$$Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$$

Where:

  • $\alpha$ = learning rate (default: 0.01)
  • $\gamma$ = discount factor (default: 0.99)
  • $r$ = immediate reward
  • $s'$ = next state
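
Translated into code, the update is a single in-place assignment. This is a generic sketch assuming a dict-based Q-table keyed by (state, action) pairs, not necessarily the repo's exact method:

```python
def td_update(q_table, state, action, reward, next_state, next_actions,
              lr=0.01, gamma=0.99):
    """One Q-learning (TD) update: Q(s,a) += lr * (r + gamma*max_a' Q(s',a') - Q(s,a))."""
    # Terminal states have no available actions, so the bootstrap term is 0.
    best_next = max((q_table.get((next_state, a), 0.0) for a in next_actions),
                    default=0.0)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + lr * (reward + gamma * best_next - old)
```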

Tactical Enhancements:

# 1. Immediate win detection: take a winning move if one exists
for action in available_actions:
    if sim_move(action, self.player_id).winner == self.player_id:
        return action

# 2. Block an opponent's immediate win
for action in available_actions:
    if sim_move(action, opponent).winner == opponent:
        return action

# 3. Otherwise maximize the learned Q-value
return max(available_actions,
           key=lambda a: self.q_table.get((state, a), 0.0))

Hyperparameter Configuration

| Parameter | Default | Range | Purpose |
| --- | --- | --- | --- |
| lr | 0.01 | 0.001-0.5 | Learning speed |
| gamma | 0.99 | 0.8-0.999 | Future reward weight |
| epsilon | 1.0 → 0.01 | - | Exploration rate (decays) |
| epsilon_decay | 0.998 | 0.95-0.999 | Exploration reduction |
| minimax_depth | 2 | 1-6 | Search tree depth |
| mcts_simulations | 50 | 10-500 | Monte Carlo rollouts |
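
The decay schedule works multiplicatively with a floor; a small sketch of how epsilon evolves under the defaults above:

```python
epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.998

for episode in range(2000):
    epsilon = max(epsilon_min, epsilon * epsilon_decay)

# With decay 0.998 the agent is still mildly exploratory after 2000
# episodes; the 0.01 floor is reached after roughly 2300 episodes.
```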

📊 Training Pipeline

Multi-Game Training Loop

# Initialize agents
agent1 = UniversalAgent(player_id=1, lr=0.01, gamma=0.99)
agent2 = UniversalAgent(player_id=2, lr=0.01, gamma=0.99)

# Games to train
games = [TicTacToe(), Nim(), Connect4(), Hexapawn(), 
         Chomp(), Sim(), DotsAndBoxes(), Breakthrough(),
         Gomoku(), UltimateTicTacToe()]

# Self-play training
for game in games:
    for episode in range(episodes):
        play_game(game, agent1, agent2, training=True)
        agent1.decay_epsilon()
        agent2.decay_epsilon()

Training Results

Typical convergence after 200 episodes per game:

| Metric | Value |
| --- | --- |
| Total Q-States | ~50,000-100,000 |
| Training Time (10 games, 200 episodes) | ~2-5 minutes |
| Final Epsilon | 0.01 |
| Win Rate (vs random) | >85% |

🎨 Visualization System

All games feature custom matplotlib renderers:

  • Tic-Tac-Toe: X/O symbols with grid
  • Connect-4: Colored discs with gravity
  • Nim: Stacked token pyramids
  • Hexapawn: Chess pawn symbols
  • Chomp: Chocolate grid with poison marker
  • Sim: Graph with 6 vertices
  • Dots & Boxes: Grid with edge highlighting
  • Breakthrough: Chess-like board
  • Gomoku: Go-style board
  • Ultimate TTT: 3×3 meta-board with active board highlighting

Example rendering code:

def visualize_game(env):
    if env.name == "tictactoe":
        return visualize_tictactoe(env.board)
    # ... routing for all 10 games

💾 Model Persistence

Save/Load Format

Agents are serialized to .zip archives containing:

universal_agent.zip
├── agent1.json       # Player 1 Q-table & config
├── agent2.json       # Player 2 Q-table & config
└── config.json       # Game list & metadata

JSON Structure:

{
  "q_table": {
    "[['tictactoe', 0, 0, 0, ...], '(0, 0)']": 0.85
  },
  "player_id": 1,
  "epsilon": 0.01,
  "game_stats": {
    "tictactoe": {"wins": 120, "losses": 75, "draws": 5}
  },
  "lr": 0.01,
  "gamma": 0.99
}
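
Producing that layout takes only the standard library. The sketch below is illustrative; the repo's actual `create_universal_zip` helper may be shaped differently:

```python
import io
import json
import zipfile

def create_universal_zip(agent1_data, agent2_data, config):
    """Write two agent dicts and shared metadata into an in-memory zip
    matching the archive layout above (argument shapes are assumed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("agent1.json", json.dumps(agent1_data))
        zf.writestr("agent2.json", json.dumps(agent2_data))
        zf.writestr("config.json", json.dumps(config))
    buf.seek(0)
    return buf
```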

Usage

# Save trained agents
zip_buffer = create_universal_zip(agent1, agent2)
with open("my_agent.zip", "wb") as f:
    f.write(zip_buffer.getvalue())

# Load agents
agent1, agent2, config = load_universal_agents("my_agent.zip")

🎯 Usage Guide

1. Upload Pre-Trained Agent

Sidebar → Upload Universal Agent → Select .zip file → Load

2. Watch AI Battle

Select Game → Watch Battle → Auto-play/Step Mode

3. Play Against AI

Human vs AI → Choose Agent → Click board positions

4. Train New Agent

Training Lab → Set Hyperparameters → Start Multi-Game Training

5. Adjust AI Difficulty

Sidebar → AI Difficulty → Minimax Depth (1-6) & MCTS Sims (10-500)

🔬 Performance Optimization

State Space Reduction

  • Canonical Forms: Rotations/reflections mapped to single state
  • Pruning: Invalid actions filtered before Q-lookup
  • Sparse Storage: Only visited states stored in Q-table
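
For illustration, mapping a 3×3 board to a canonical form under the eight symmetries of the square can look like this (a sketch, not the repo's exact implementation):

```python
def canonical_state(board):
    """Return the lexicographically smallest of a 3x3 board's 8 symmetries,
    so rotated/reflected positions share a single Q-table entry."""
    def rotate(b):
        # 90-degree clockwise rotation: new[r][c] = old[2-c][r]
        return tuple(tuple(b[2 - c][r] for c in range(3)) for r in range(3))

    def reflect(b):
        # Horizontal mirror
        return tuple(tuple(reversed(row)) for row in b)

    b = tuple(tuple(row) for row in board)
    variants = []
    for _ in range(4):
        variants.append(b)
        variants.append(reflect(b))
        b = rotate(b)
    return min(variants)
```

Every position in a symmetry orbit maps to one key, shrinking the stored state space by up to a factor of 8.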

Computational Tricks

import numpy as np

# Fast win detection on a 3x3 board (vectorized)
def _check_win(self, player):
    board = self.board
    # Row and column checks
    for i in range(3):
        if np.all(board[i, :] == player) or np.all(board[:, i] == player):
            return True
    # Both diagonal checks
    if np.all(np.diag(board) == player) or np.all(np.diag(np.fliplr(board)) == player):
        return True
    return False

Memory Efficiency

  • States stored as tuples (immutable, hashable)
  • Actions converted to strings for Q-table keys
  • Numpy arrays for board representations

📈 Future Enhancements

  • Neural network policy (DQN/A3C)
  • Transfer learning metrics
  • Multi-agent tournament mode
  • Online multiplayer (WebRTC)
  • Performance benchmarking suite
  • Additional games (Chess variants, Go)

๐Ÿ› ๏ธ Technical Stack

| Component | Technology |
| --- | --- |
| Framework | Streamlit 1.28+ |
| ML/RL | Custom Q-Learning |
| Visualization | Matplotlib |
| State Management | Streamlit Session State |
| Serialization | JSON + ZIP |
| Data | NumPy, Pandas |

๐ŸŒ Deployment

Streamlit Cloud

# Push to GitHub
git push origin main

# Deploy via Streamlit Cloud
# 1. Visit share.streamlit.io
# 2. Connect repository: Devanik21/universal-rl-arena
# 3. Set main file: aGI.py
# 4. Deploy

Local Docker

Dockerfile:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "aGI.py"]

Build and run:

docker build -t universal-rl .
docker run -p 8501:8501 universal-rl

📚 Educational Use

Perfect for teaching:

  • Reinforcement Learning: Q-learning, exploration/exploitation
  • Game Theory: Minimax, Nash equilibria
  • Algorithm Design: State representation, search strategies
  • Python Programming: OOP, numpy, visualization

Example classroom exercise:

# Students implement a new game
class MyGame:
    def __init__(self): ...
    def reset(self): ...
    def get_state(self): ...
    def get_available_actions(self): ...
    def make_move(self, action): ...
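
A complete toy game following that interface might look like the sketch below (a single-pile Nim variant, purely illustrative; only the method names come from the skeleton above):

```python
class SimpleNim:
    """Single-pile toy game: take 1 or 2 tokens per turn; whoever takes
    the last token wins. Implements the minimal interface shown above."""
    name = "simple_nim"

    def __init__(self, tokens=5):
        self.start_tokens = tokens
        self.reset()

    def reset(self):
        self.tokens = self.start_tokens
        self.current_player = 1
        self.winner = None

    def get_state(self):
        # Hashable tuple, prefixed with the game name
        return (self.name, self.tokens, self.current_player)

    def get_available_actions(self):
        return [n for n in (1, 2) if n <= self.tokens]

    def make_move(self, action):
        self.tokens -= action
        if self.tokens == 0:
            self.winner = self.current_player
        self.current_player = 3 - self.current_player  # swap 1 <-> 2
```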

๐Ÿค Contributing

Contributions welcome! Areas for improvement:

  1. New Games: Add games with get_state() interface
  2. Visualizations: Enhance rendering quality
  3. Algorithms: Implement A3C/PPO/DQN variants
  4. UI/UX: Improve Streamlit interface
  5. Documentation: Add tutorials/videos

See CONTRIBUTING.md for guidelines.


📜 License

MIT License - see LICENSE


👤 Author

Devanik


๐Ÿ™ Acknowledgments

Inspired by:

  • AlphaZero (DeepMind) - Universal game-playing architecture
  • DQN (Mnih et al., 2015) - Deep Q-learning foundations
  • OpenAI Gym - Environment interface design

Built with โค๏ธ using Streamlit




Made for Genius-Level Play 🎮

One Brain. Ten Games. Infinite Possibilities.
