

AI_Tag — Reinforcement Learning Tag Game

Overview

AI_Tag is a small multi-agent environment where Seekers try to catch Hiders. The project uses Pygame for the environment and PyTorch for simple Q-learning agents. Agents receive radar-like observations and learn using a small feed-forward network.

Features

  • Two agent types: Seeker and Hider
  • Simple Q-learning update (neural network approximator)
  • Headless training support for servers
  • Automatic model save/load (state_dict) into ./model/

Repository structure

  • agent_hider.py, agent_seeker.py — trainer/controller logic for each agent
  • model_hider.py, model_seeker.py — model (Linear_QNet) and trainer (QTrainer: MSE loss, Adam optimizer); a sketch follows after this list
  • game.py — environment, reward logic, collision detection, rendering
  • raycast.py — agent class (radars, drawing, rotation)
  • play.py — main training loop
  • settings.py — configuration and hyperparameters
  • assets/ — images and map files (map1.png, seeker.png, hider.png)
  • model/ — saved model weights (created at runtime)
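
For orientation, here is a minimal sketch of what a Linear_QNet of this kind typically looks like; the layer sizes and exact structure are illustrative, not the project's actual values (see model_hider.py / model_seeker.py for those):

import torch.nn as nn
import torch.nn.functional as F

class Linear_QNet(nn.Module):
    # Small feed-forward Q-network: radar observation in, one Q-value per action out.
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))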

Requirements

Python 3.8+ and the dependencies listed in requirements.txt.

Create an environment and install dependencies:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Running

Graphical (opens a window):

python3 ./play.py

Headless (no display), logging output to a file (useful for long training runs on servers):

SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 python3 ./play.py | tee train_2000.log

Explanation:

  • SDL_VIDEODRIVER=dummy runs Pygame without opening a window (useful on servers).
  • MAX_EPISODES (optional env var) limits the number of training episodes (see the sketch below).
  • tee writes output to both the terminal and train_2000.log.
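
How MAX_EPISODES is consumed may vary; a minimal sketch of reading it in play.py, assuming a plain episode counter (names below are illustrative):

import os

# Optional cap on training episodes; unset or 0 means "run until interrupted".
MAX_EPISODES = int(os.environ.get("MAX_EPISODES", "0"))

def reached_episode_limit(episode: int) -> bool:
    # Called once per episode by the training loop to decide whether to stop.
    return MAX_EPISODES > 0 and episode >= MAX_EPISODES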

Run in background:

SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 nohup python3 ./play.py > train_2000.log 2>&1 &

Watch logs:

tail -f train_2000.log

Configuration

Edit settings.py to configure:

  • Display and scale: SCREEN_WIDTH, SCREEN_HEIGHT, AGENT_SCALE
  • Spawn points: HIDERS_POS, SEEKERS_POS (each can be (x, y) or (x, y, angle)).
  • Exploration schedule:
    • EPS_START (default 1.0)
    • EPS_MIN (default 0.05)
    • EPS_TARGET_EPISODES (default 5000): number of episodes over which epsilon decays from EPS_START to EPS_MIN; the per-episode decay is computed automatically (see the sketch below)
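
As an illustration of how these settings fit together, here is a hedged sketch assuming a linear epsilon schedule; the actual decay formula and spawn coordinates live in settings.py and the agent code:

# Illustrative values only; the project's real numbers are in settings.py.
HIDERS_POS = [(120, 300), (640, 420, 90)]   # (x, y) or (x, y, angle)
SEEKERS_POS = [(900, 150)]

EPS_START = 1.0
EPS_MIN = 0.05
EPS_TARGET_EPISODES = 5000

def epsilon_for(episode: int) -> float:
    # Assumed linear decay from EPS_START to EPS_MIN over EPS_TARGET_EPISODES
    # episodes, then held at EPS_MIN; the project's schedule may differ.
    step = (EPS_START - EPS_MIN) / EPS_TARGET_EPISODES
    return max(EPS_MIN, EPS_START - step * episode)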

Model saving & loading

  • Models are saved as PyTorch state_dict files in ./model/ via model.save().
  • Default filenames used by the trainers: model_seeker.pth and model_hider.pth (the code also accepts common alternates).
  • Current saving policy in play.py: the seeker model is saved whenever the seeker reaches a new record score; the hider model is saved every 10 episodes.
  • On trainer initialization the code attempts to load saved weights automatically and prints a Loaded ... message if successful.
  • To persist the full training state (optimizer, episode counters), extend the saving/loading in model_*.py and play.py to store a checkpoint dict with 'model_state' and 'optimizer_state'; a sketch follows below.
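
A minimal sketch of such a checkpoint, assuming a PyTorch model plus an Adam optimizer as described above (the path and the extra episode field are illustrative):

import torch

def save_checkpoint(model, optimizer, episode, path="model/checkpoint.pth"):
    # Store weights plus optimizer state so training can resume exactly where it stopped.
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "episode": episode,
    }, path)

def load_checkpoint(model, optimizer, path="model/checkpoint.pth"):
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint.get("episode", 0)   # resume from the saved episode counter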

Rewards summary

Implemented (see game.py):

  • Seekers:
    • Wall collision: -10 and done
    • Catch (dist < 30 px): +100 and done (increments score)
    • Near a hider (dist < 200 px): +0.1 (small positive reward)
    • Otherwise: 0
  • Hiders:
    • Wall collision: -10 and done
    • Caught (dist < 30 px): -10 and done
    • Near a seeker (dist < 200 px): -1 (penalty to discourage approaching)
    • Otherwise: +0.1 (small positive reward for staying far away)

Training uses a Q-target: Q_new = reward if done else reward + gamma * max_a Q(next_state,a); loss is MSE.
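
A minimal sketch of that update for a single transition, in the spirit of a simple QTrainer; the gamma value and tensor handling are assumptions, not the project's exact code:

import torch
import torch.nn.functional as F

def q_update(model, optimizer, state, action, reward, next_state, done, gamma=0.9):
    # state / next_state: 1-D float tensors; action: index of the action taken.
    pred = model(state)                       # predicted Q-values for the current state
    target = pred.detach().clone()
    if done:
        q_new = reward
    else:
        q_new = reward + gamma * torch.max(model(next_state)).item()
    target[action] = q_new                    # only the taken action's target changes
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()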

Tips & suggestions

  • If agents repeatedly die on walls, check spawn positions in settings.py and assets/map1.png for overlaps.
  • For longer exploration use larger EPS_TARGET_EPISODES or higher EPS_MIN.
  • Use headless mode for faster, unattended training.
  • Consider curriculum training: start on empty maps and progressively add obstacles.

Monitoring & metrics

Track moving averages (e.g. over 100 episodes) for:

  • Seeker score per episode
  • Hider average survival time
  • Average reward per step
  • Percentage of episodes with at least one capture

Add logging or save intermediate metrics to CSV for plotting.
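
For example, a small CSV logger with a 100-episode moving average could look like this (file name and columns are illustrative):

import csv
from collections import deque

class MetricsLogger:
    def __init__(self, path="metrics.csv", window=100):
        self.scores = deque(maxlen=window)           # sliding window for the moving average
        self.file = open(path, "w", newline="")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["episode", "seeker_score", "score_avg_100"])

    def log(self, episode, seeker_score):
        self.scores.append(seeker_score)
        avg = sum(self.scores) / len(self.scores)
        self.writer.writerow([episode, seeker_score, round(avg, 3)])
        self.file.flush()                            # keep the file readable during long runs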

Extending the project

  • Add checkpointing with optimizer state for exact resume.
  • Implement multiple maps and domain randomization.
  • Improve agent networks, replay strategy, or use actor-critic algorithms for smoother training.

Common commands

# GUI run
python3 ./play.py

# Headless 2000 episodes, save log
SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 python3 ./play.py | tee train_2000.log

# Background
SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 nohup python3 ./play.py > train_2000.log 2>&1 &

# Tail log
tail -f train_2000.log

Possible next steps include lowering EPS_TARGET_EPISODES (for example to 2000) for a faster exploration decay, adding automatic periodic checkpoints, or implementing curriculum training.
