AI_Tag is a small multi-agent "tag" environment implemented with Pygame and PyTorch. Two types of agents interact: Seekers (which try to catch) and Hiders (which try to avoid being caught). Agents are controlled by small neural networks trained with Q-learning inside the play.py training loop.
- agent_hider.py, agent_seeker.py — trainer/controller classes for each agent (policy, memory, training loop calls)
- model_hider.py, model_seeker.py — simple feed-forward Linear_QNet and QTrainer (MSE loss, Adam); a minimal sketch of the network appears after this list
- game.py — environment (map, agents, rewards, collision detection, drawing)
- raycast.py — agent class that builds radars and draws/rotates sprites
- play.py — top-level training loop
- settings.py — constants and hyperparameters
- assets/ — images and map files (map1.png, seeker.png, hider.png)
- model/ — saved model weights (created at runtime)
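The exact layer sizes of Linear_QNet live in model_hider.py / model_seeker.py and are not reproduced here; the following is only a sketch of that kind of network, with a save() helper that writes the state_dict into ./model/ as described later. Layer sizes and the default filename are illustrative assumptions.

```python
import os
import torch
import torch.nn as nn

class Linear_QNet(nn.Module):
    """Minimal sketch of a feed-forward Q-network (layer sizes are illustrative)."""

    def __init__(self, input_size: int, hidden_size: int, output_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),  # one Q-value per discrete action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

    def save(self, file_name: str = "model_sketch.pth"):
        # Save only the weights (state_dict) into ./model/, creating the folder if needed.
        os.makedirs("./model", exist_ok=True)
        torch.save(self.state_dict(), os.path.join("./model", file_name))
```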
Use Python 3 (3.8+ tested). Create a virtual environment and install dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Running

Graphical (open a window):
python3 ./play.py

Headless (no display) and log output to a file (good for long training runs on servers):
SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 python3 ./play.py | tee train_2000.log

Explanation of that command:
- SDL_VIDEODRIVER=dummy: run Pygame without opening a window
- MAX_EPISODES=2000: stop after 2000 episodes (an optional environment variable read by play.py; see the sketch after this list)
- | tee train_2000.log: write terminal output to train_2000.log while also displaying it
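The exact way play.py consumes MAX_EPISODES is not shown in this README; the snippet below is only a plausible sketch of that pattern (the fallback behaviour is an assumption):

```python
import os

# Hypothetical sketch of an env-var episode cap: unset means "train until interrupted".
_raw = os.environ.get("MAX_EPISODES")
MAX_EPISODES = int(_raw) if _raw else None

episode = 0
while MAX_EPISODES is None or episode < MAX_EPISODES:
    # ... run one episode of the tag environment here ...
    episode += 1
```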
Useful variants
- Run in background and save stdout/stderr to file:
SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 nohup python3 ./play.py > train_2000.log 2>&1 &

- Watch logs in real time:
tail -f train_2000.log

Configuration

Most runtime hyperparameters live in settings.py:
- Screen / visuals: SCREEN_WIDTH, SCREEN_HEIGHT, AGENT_SCALE
- Agent spawn: HIDERS_POS, SEEKERS_POS (each entry is (x, y) or (x, y, angle))
- Epsilon schedule for exploration: EPS_START (default 1.0), EPS_MIN (default 0.05), EPS_TARGET_EPISODES (default 5000) — the number of episodes over which epsilon decays from EPS_START to EPS_MIN (the per-episode decay is computed automatically; see the sketch after this list)
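As an illustration of how these values fit together, here is a settings.py-style sketch with a linear epsilon schedule. The variable names match the list above, but the concrete values and the exact decay formula used by the project are assumptions:

```python
# Illustrative values only — the real settings.py may differ.
SCREEN_WIDTH, SCREEN_HEIGHT = 800, 600
AGENT_SCALE = 0.5

# Spawn points: (x, y) or (x, y, angle)
SEEKERS_POS = [(100, 100), (700, 100, 180)]
HIDERS_POS = [(400, 500)]

# Exploration schedule
EPS_START = 1.0
EPS_MIN = 0.05
EPS_TARGET_EPISODES = 5000

# One way to compute the decay automatically: a linear schedule that
# reaches EPS_MIN after EPS_TARGET_EPISODES episodes.
EPS_DECAY = (EPS_START - EPS_MIN) / EPS_TARGET_EPISODES

def epsilon_for_episode(episode: int) -> float:
    """Exploration rate used for the given episode, clipped at EPS_MIN."""
    return max(EPS_MIN, EPS_START - episode * EPS_DECAY)
```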
Model saving/loading
- Models are saved into ./model/ by calling model.save() (this saves the PyTorch state_dict).
- Current saving policy (in play.py):
  - Seeker model is saved when the seeker achieves a new record score (filename model_seeker.pth by default).
  - Hider model is saved every 10 episodes (filename model_hider.pth by default).
- At trainer initialization the code attempts to load existing weights automatically from these filenames (or common alternates). If a model file exists, you'll see a Loaded ... message in the console.
- If you prefer full checkpoints (model + optimizer + metadata), modify the saving/loading in model_*.py and play.py to store a dict with 'model_state' and 'optimizer_state'; a minimal sketch is shown below.
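A minimal sketch of such a checkpoint, assuming standard PyTorch model and optimizer objects. The filename and the extra 'episode' field are illustrative; only the 'model_state' / 'optimizer_state' keys come from the note above:

```python
import torch

CKPT_PATH = "./model/checkpoint_seeker.pth"  # hypothetical filename

def save_checkpoint(model, optimizer, episode, path=CKPT_PATH):
    """Store weights plus optimizer state (and a counter) so training can resume exactly."""
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "episode": episode,
    }, path)

def load_checkpoint(model, optimizer, path=CKPT_PATH):
    """Restore weights and optimizer state; returns the stored episode counter."""
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt.get("episode", 0)
```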
Reward & training logic (what agents are rewarded/penalized for)
Key reward rules are implemented in game.py (summary of current setup):
- Seekers:
  - Wall collision: -10 and done
  - Catch (dist < 30 px): +100 and done (increments score)
  - Near a hider (dist < 200 px): +0.1 (small positive)
  - Else: 0
- Hiders:
  - Wall collision: -10 and done
  - Caught (dist < 30 px): -10 and done
  - Near a seeker (dist < 200 px): -1 (penalty to discourage approaching)
  - Else: +0.1 (small positive for staying far away)
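For reference, here is a minimal sketch of the seeker side of these rules. This is not the actual game.py code; the function, constants, and argument names are assumptions, only the thresholds and reward values come from the list above:

```python
import math

CATCH_DIST = 30    # px — "catch" radius from the rules above
NEAR_DIST = 200    # px — "near a hider" radius from the rules above

def seeker_step_reward(seeker_pos, hider_pos, hit_wall):
    """Return (reward, done) for one seeker step, following the rules listed above."""
    if hit_wall:
        return -10.0, True
    dist = math.dist(seeker_pos, hider_pos)
    if dist < CATCH_DIST:
        return 100.0, True    # catch: episode ends (score is incremented elsewhere)
    if dist < NEAR_DIST:
        return 0.1, False     # small shaping reward for getting close
    return 0.0, False
```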
Training uses the standard Q-learning target: Q_new = reward if the episode is done, otherwise reward + gamma * max_a Q(next_state, a); the loss is the MSE between the predicted Q-value and Q_new (sketched below).
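A sketch of what that update looks like in PyTorch for a single transition; the real QTrainer in model_*.py may batch transitions and choose gamma differently (the value below is illustrative):

```python
import torch
import torch.nn as nn

def q_update(model, optimizer, state, action, reward, next_state, done, gamma=0.9):
    """One Q-learning step: move Q(state, action) toward the target with an MSE loss."""
    state = torch.as_tensor(state, dtype=torch.float32)
    next_state = torch.as_tensor(next_state, dtype=torch.float32)

    pred = model(state)                 # Q-values for every action in this state
    target = pred.detach().clone()

    q_new = reward
    if not done:
        q_new = reward + gamma * torch.max(model(next_state)).item()
    target[action] = q_new              # only the taken action's target changes

    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```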
Tips

- If agents repeatedly die on walls, check the spawn positions in settings.py and assets/map1.png for overlaps.
- For longer exploration, use a larger EPS_TARGET_EPISODES or a higher EPS_MIN.
- Use headless mode for faster, unattended training.
- Consider curriculum training: start on empty maps and progressively add obstacles.
Track moving averages (e.g. over 100 episodes) for:
- Seeker score per episode
- Hider average survival time
- Average reward per step
- Percentage of episodes with at least one capture
Add logging or save intermediate metrics to CSV for plotting.
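A small sketch of that kind of CSV logging with a 100-episode moving average of the seeker score; the filename and column set are assumptions, not something play.py writes today:

```python
import csv
from collections import deque

class MetricsLogger:
    """Append per-episode metrics to a CSV and track a moving average of the seeker score."""

    def __init__(self, path="metrics.csv", window=100):
        self.path = path
        self.recent_scores = deque(maxlen=window)
        with open(self.path, "w", newline="") as f:
            csv.writer(f).writerow(
                ["episode", "seeker_score", "hider_survival_steps", "avg_reward", "score_ma"]
            )

    def log(self, episode, seeker_score, hider_survival_steps, avg_reward):
        self.recent_scores.append(seeker_score)
        score_ma = sum(self.recent_scores) / len(self.recent_scores)
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow(
                [episode, seeker_score, hider_survival_steps, avg_reward, round(score_ma, 3)]
            )
```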
Possible improvements

- Add checkpointing with optimizer state for exact resume.
- Implement multiple maps and domain randomization.
- Improve agent networks, replay strategy, or use actor-critic algorithms for smoother training.
Quick commands

# GUI run
python3 ./play.py
# Headless 2000 episodes, save log
SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 python3 ./play.py | tee train_2000.log
# Background
SDL_VIDEODRIVER=dummy MAX_EPISODES=2000 nohup python3 ./play.py > train_2000.log 2>&1 &
# Tail log
tail -f train_2000.log