Chihuahua Game 🐕

A 2D Chihuahua-themed reinforcement learning environment built with Pygame and Gymnasium.


🚀 Overview

A 2D Pygame game and reinforcement learning environment. It includes:

  • A custom RL environment compatible with Gymnasium
  • Modular game structure using Pygame (clean architecture)
  • Object-Oriented Programming (OOP) for organizing game logic

Check out another of my work-in-progress projects: Car Game.


📋 Table of Contents

  1. Features
  2. Prerequisites
  3. Installation
  4. Usage
  5. RL Setup
  6. Creating Custom Levels
  7. Contributing

🎮 Features

  • Discrete Reinforcement Learning environment
  • Collision-based rewards: hitting a target tile
  • Partial rewards: moving the ball closer to the target tile
  • Frame-based animation, sprites, basic physics and level loading
  • Easy to integrate with Gym-based RL pipelines
  • Playable characters with simple animations (render states)
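Because the environment follows the Gymnasium reset/step contract, plugging it into an RL pipeline is the usual interaction loop. Here is a minimal sketch using a stand-in environment; the real class lives in gym_ext/env.py, and all names in this stub are illustrative, not the project's actual API:

```python
import random

class StubChihuahuaEnv:
    """Stand-in with the Gymnasium-style reset/step contract.

    The real environment is defined in gym_ext/env.py; this stub only
    illustrates the interface an RL pipeline relies on.
    """

    def reset(self, seed=None):
        random.seed(seed)
        obs = [0.0] * 10           # the 10-value observation vector
        return obs, {}             # (observation, info)

    def step(self, action):
        assert action in range(5)  # Discrete(5) action space
        obs = [random.random() for _ in range(10)]
        reward = -0.01             # per-step penalty
        terminated = random.random() < 0.01  # stand-in for the win condition
        truncated = False
        return obs, reward, terminated, truncated, {}

env = StubChihuahuaEnv()
obs, info = env.reset(seed=0)
total = 0.0
for _ in range(100):
    action = random.randrange(5)   # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward
    if terminated or truncated:
        break
```

Any agent that speaks this five-tuple step interface (Stable-Baselines3 included) can drive the real environment the same way.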


Project structure

The project is organized into modular components, separating game logic, assets, environment definitions, and training scripts:

├── assets/              # Game assets (sprites, tiles)
│   └── images/          # Character and tile images
│       └── tiles/       # Grass, wall, and ring textures
├── engine/              # Core game engine and state manager
│   └── game_engine.py   # Game loop and scene handling
├── entities/            # Game entities like the character, ball, and target
├── gym_ext/             # Gym-compatible environment and training script
│   ├── env.py           # Gymnasium environment wrapper
│   └── train.py         # PPO or other RL agent training logic
├── logic/               # Game physics and collision handling
├── logs/                # TensorBoard logs for training visualization
├── map/                 # Level definitions (JSON) and level loader
├── models/              # (Optional) Trained models can be saved here
├── screens/             # UI screens (menu, gameplay, win state)
└── utils/               # Constants, game states, shared helpers

🔧 Prerequisites

  • Python 3.7+
  • uv (recommended)
  • modules: pygame, gymnasium, tensorboard (see requirements.txt)

โš™๏ธ Installation

  1. Clone the repo:

    git clone https://github.com/uma-dev/chihuahua-game.git
    cd chihuahua-game
  2. (Optional) Create a virtual environment:

    uv venv 
    source .venv/bin/activate  # Linux/macOS
    .\.venv\Scripts\activate  # Windows
  3. Install dependencies:

    uv pip install -r requirements.txt

โ–ถ๏ธ Usage

Play the game with the command below (use the arrow keys ← ↑ → ↓ to move your chihuahua, and enjoy!):

python game.py

Train the model and observe the training process with:

python main.py train_render 

Train the model without rendering:

python main.py train_no_render 

Evaluate the policy with:

python main.py eval

Watch training in real time with:

tensorboard --logdir logs/ppo_tensorboard/

🧠 RL Setup

Description and goal

A 2D chihuahua videogame where the agent controls a chihuahua character that must hit and push a ball toward a target tile. The environment is built with Pygame and wrapped with the Gymnasium API, so it is compatible with standard RL algorithms.

The agent's objective is to learn to move and act efficiently to guide the ball to the target, maximizing reward while minimizing the number of steps (each step is penalized).


Problem Formulation

  • Algorithm: PPO (Proximal Policy Optimization)
  • Policy: MLP (Multilayer Perceptron)
  • Library: Stable-Baselines3
  • Logging: TensorBoard (logs/ppo_tensorboard/)

State Space (Observation): A vector of 10 values:

[char_x, char_y, char_vx, char_vy, ball_x, ball_y, ball_vx, ball_vy, char_jumping, char_sprinting]

Action Space: Discrete(5):

0 = No-op, 1 = Left, 2 = Right, 3 = Jump, 4 = Sprint
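The two spaces above can be written down concretely. In this sketch the action names and the observation field order mirror the vectors described above, but the function names themselves are illustrative, not the environment's real helpers:

```python
# Illustrative encoding of the Discrete(5) action space and the
# 10-value observation vector described in the README.
ACTIONS = {
    0: "noop",
    1: "left",
    2: "right",
    3: "jump",
    4: "sprint",
}

OBS_FIELDS = [
    "char_x", "char_y", "char_vx", "char_vy",
    "ball_x", "ball_y", "ball_vx", "ball_vy",
    "char_jumping", "char_sprinting",
]

def decode_action(action: int) -> str:
    """Map a Discrete(5) integer to its named action."""
    return ACTIONS[action]

def make_observation(state: dict) -> list:
    """Flatten a game-state dict into the 10-value observation vector."""
    return [float(state[name]) for name in OBS_FIELDS]
```

Keeping the field order in one place like `OBS_FIELDS` makes it harder for the environment and the policy to silently disagree about which index means what.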

Reward Function: (work in progress, as it directly changes the learning curve)

if ball_hits_target(self.ball, self.target):  # win condition
  reward += 10.0
  # the episode only ends when the target is hit
  done = True
else:
  reward -= 0.01  # small penalty for each step
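Pulled out as a pure function, the same reward logic is easy to reason about and test in isolation. This is a sketch matching the snippet above; the default magnitudes come from the README, and the function name is an assumption:

```python
def step_reward(ball_hits_target: bool,
                win_bonus: float = 10.0,
                step_penalty: float = 0.01):
    """Per-step reward matching the snippet above.

    Returns (reward_delta, done): a big bonus and episode end on a win,
    otherwise a small penalty that pushes the agent to finish quickly.
    """
    if ball_hits_target:           # win condition
        return win_bonus, True     # episode ends only when the target is hit
    return -step_penalty, False    # every other step costs a little
```

Under this shaping, an episode that wins after N steps collects 10.0 minus 0.01 per non-winning step, so shorter solutions score strictly higher.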

Training metrics are logged with TensorBoard, so you can monitor the agent's behavior over time:

tensorboard --logdir logs/

Total Reward per Episode

This shows the sum of all rewards per episode, as training progresses:


  • Initial plateau: The agent explores randomly without reward shaping.
  • Reward drop: movement/jump penalties and the progress bonus in the reward likely discourage this behavior.
  • Late improvement: Reward increases as the agent learns to interact better.

โš ๏ธ Still in progress: refining the state representation (possibly using CNNs) and improving the reward function.


Mean Episode Length

This plot shows how long episodes last (in steps), averaged over evaluations:


  • A strange behavior is observed at first, possibly due to the agent exploring or stalling.
  • Eventually, the episode length drops, which aligns with increasing reward: the agent finishes the task more efficiently.
  • The minimum mean length reached is ~1727 steps.

โš ๏ธ When evaluating the policy, consider this number as a typical episode length.


🕒 Training stats

  • Duration: ~2.5 hours
  • Steps: 500,000
  • Eval Interval: Every 5,000 steps
    (Note: Since episodes last ~1.7k steps, more than one episode might occur per eval.)
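The note above can be checked with quick arithmetic, using the numbers reported in this README:

```python
eval_interval = 5_000   # steps between evaluations
mean_episode = 1_727    # minimum mean episode length observed

# Roughly 2.9 episodes fit into one evaluation interval, so each
# evaluation point can average over more than one full episode.
episodes_per_eval = eval_interval / mean_episode
```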

🙊 Bloopers

Reward pitfall

Partial rewards can introduce unintended behaviors during training. In this blooper, the reward setup was wrong: aiming to improve the learning curve, I gave the agent partial rewards either for moving the ball closer to the target or for hitting the ball. However, these per-step partial rewards were greater than the per-step penalty, so the total reward per step was positive!
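The pitfall is simple arithmetic: if the partial bonus exceeds the step penalty, idling near the ball pays every step. A sketch with assumed magnitudes (the real values in the buggy run differed, but the sign of the sum is what matters):

```python
def shaped_step_reward(touched_ball: bool,
                       partial_bonus: float = 0.05,  # assumed magnitude
                       step_penalty: float = 0.01):
    """Net per-step reward under the buggy shaping described above."""
    reward = -step_penalty
    if touched_ball:
        reward += partial_bonus  # partial reward for nudging the ball
    return reward

# Nudging the ball forever earns about +0.04 per step, so never
# finishing the episode maximizes return: the exploit the agent found.
per_step = shaped_step_reward(True)
```

A common fix is to keep any per-step shaping strictly smaller than the step penalty, or to reward only *net progress* toward the target so repeated back-and-forth nudging nets zero.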

As a result, the agent learned to exploit this by chasing partial rewards instead of finishing the episode, so the mean episode length also grows, as shown in the results:

Reward pitfall

โš ๏ธ Looks ok, but take a loot to the episode lenght.

Episode length pitfall

โš ๏ธ The result of the reward pitfall. The agent learned to look for partial rewards.



๐Ÿ› ๏ธ Custom Levels

  • Define JSON layout files in the levels/ folder
  • Add custom tilemaps, special blocks, or enemies
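A level file might look like the following. The key names and tile codes here are assumptions for illustration; match whatever the loader in map/ actually expects:

```python
import json

# Hypothetical level layout: 0 = empty, 1 = grass, 2 = wall, 3 = target ring.
level = json.loads("""
{
  "name": "level_1",
  "tiles": [
    [2, 2, 2, 2, 2],
    [2, 0, 0, 3, 2],
    [2, 1, 1, 1, 2],
    [2, 2, 2, 2, 2]
  ],
  "spawn": {"char": [1, 1], "ball": [2, 1]}
}
""")
```

Keeping levels as plain JSON grids like this means new layouts need no code changes, only a new file.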


๐Ÿค Contributing

Contributions are welcome! Please:

  1. Fork this repository.
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m "Add my feature"
  4. Push to the branch: git push origin feature/my-feature
  5. Open a pull request.
