A 2D Chihuahua-themed reinforcement learning environment built with Pygame and Gymnasium.
2D Pygame game and environment for reinforcement learning. It includes:
- A custom RL environment compatible with Gymnasium
- Modular game structure using Pygame (clean architecture)
- Object-Oriented Programming (OOP) for organizing game logic
Check out my other work-in-progress project: Car Game.
- Discrete Reinforcement Learning environment
- Collision-based rewards: hitting a target tile
- Partial rewards: getting the ball close to the target tile
- Frame-based animation, sprites, basic physics and level loading
- Easy to integrate with Gym-based RL pipelines
- Playable characters with simple animations (render states)
The project is organized into modular components, separating game logic, assets, environment definitions, and training scripts:
```
├── assets/              # Game assets (sprites, tiles)
│   ├── images/          # Character and tile images
│   └── tiles/           # Grass, wall, and ring textures
├── engine/              # Core game engine and state manager
│   └── game_engine.py   # Game loop and scene handling
├── entities/            # Game entities like the character, ball, and target
├── gym_ext/             # Gym-compatible environment and training script
│   ├── env.py           # Gymnasium environment wrapper
│   └── train.py         # PPO or other RL agent training logic
├── logic/               # Game physics and collision handling
├── logs/                # TensorBoard logs for training visualization
├── map/                 # Level definitions (JSON) and level loader
├── models/              # (Optional) Trained models can be saved here
├── screens/             # UI screens (menu, gameplay, win state)
└── utils/               # Constants, game states, shared helpers
```

- Python 3.7+
- uv (recommended)
- modules: pygame, gymnasium, tensorboard (see requirements.txt)
Clone the repo:

```shell
git clone https://github.com/uma-dev/chihuahua-game.git
cd chihuahua-game
```

(Optional) Create a virtual environment:

```shell
uv venv
source .venv/bin/activate    # Linux/macOS
.\.venv\Scripts\activate     # Windows
```

Install dependencies:

```shell
uv pip install -r requirements.txt
```
Play the game with:

```shell
python game.py
```

Use the arrow keys (← ↑ → ↓) to move your chihuahua. Enjoy!

Train the model and observe the training process with:

```shell
python main.py train_render
```

Train the model without rendering:

```shell
python main.py train_no_render
```

Evaluate the policy with:

```shell
python main.py eval
```

Watch training in real time with:

```shell
tensorboard --logdir .\logs\ppo_tensorboard\
```

Chihuahua is a 2D videogame where the agent controls a chihuahua character that must hit and push a ball toward a target tile. The environment is built with Pygame and wrapped with the Gymnasium API, so it's compatible with standard RL algorithms.
The agent's objective is to learn to move and act efficiently so that it guides the ball to the target, maximizing the cumulative reward while minimizing the number of steps (each step incurs a penalty).
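Assuming the wrapper in `gym_ext/env.py` follows the standard Gymnasium API (`reset()` returning `(obs, info)` and `step()` returning `(obs, reward, terminated, truncated, info)`), the interaction loop looks roughly like this. `StubEnv` is a hypothetical stand-in with the same API shape, so the sketch runs without the game installed:

```python
import random

class StubEnv:
    """Hypothetical stand-in for the Gymnasium wrapper in gym_ext/env.py."""
    def reset(self, seed=None):
        random.seed(seed)
        self.steps = 0
        return [0.0] * 10, {}                  # 10-value observation, empty info

    def step(self, action):
        self.steps += 1
        terminated = random.random() < 0.01    # stand-in for hitting the target
        reward = 10.0 if terminated else -0.01
        truncated = self.steps >= 2000         # safety cap on episode length
        return [0.0] * 10, reward, terminated, truncated, {}

env = StubEnv()
obs, info = env.reset(seed=42)
total_reward = 0.0
done = False
while not done:
    action = random.randrange(5)               # random policy over Discrete(5)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode finished after {env.steps} steps, return = {total_reward:.2f}")
```

The same loop works unchanged with the real environment, since it only relies on the standard `reset`/`step` contract.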
- Algorithm: PPO (Proximal Policy Optimization)
- Policy: MLP (Multilayer Perceptron)
- Library: Stable-Baselines3
- Logging: TensorBoard (logs/ppo_tensorboard/)
State Space (Observation): A vector of 10 values:

```
[char_x, char_y, char_vx, char_vy, ball_x, ball_y, ball_vx, ball_vy, char_jumping, char_sprinting]
```

Action Space: Discrete(5):

```
0 = No-op, 1 = Left, 2 = Right, 3 = Jump, 4 = Sprint
```

Reward Function (work in progress, as it directly changes the learning curve):
```python
if ball_hits_target(self.ball, self.target):  # win condition
    reward += 10.0
    # An episode only ends when the target is hit
    done = True
else:
    # penalty for each step
    reward -= 0.01
```

Training metrics are logged with TensorBoard, so you can monitor the agent's behavior over time:
```shell
tensorboard --logdir logs/
```

This shows the sum of all rewards per episode, as training progresses:
- Initial plateau: The agent explores randomly without reward shaping.
- Reward drop: Movement/jump penalties and a progress bonus in the reward are intended to discourage this behavior (still being verified).
- Late improvement: Reward increases as the agent learns to interact better.
⚠️ Still in progress: refining the state representation (possibly using CNNs) and improving the reward function.
This plot shows how long episodes last (in steps), averaged over evaluations:
- A strange behavior is observed at first, possibly due to the agent exploring or stalling.
- Eventually, the episode length drops, which aligns with the increasing reward: the agent finishes the task more efficiently.
- The minimum mean length reached is ~1727 steps.
⚠️ When evaluating the policy, consider this number as a typical episode length.
- Duration: ~2.5 hours
- Steps: 500,000
- Eval Interval: Every 5,000 steps
(Note: Since episodes last ~1.7k steps, more than one episode might occur per eval.)
Partial rewards can introduce unintended behaviors during training. In this blooper, the reward setup was wrong. Aiming to improve the learning curve, I gave partial rewards to the agent: either for moving the ball closer to the target or for hitting the ball. However, these per-step partial rewards were greater than the per-step penalty, so the total reward per step was positive!
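The pitfall is easy to see with a quick calculation (the numbers below are illustrative, not the actual values used): if a per-step shaping bonus outweighs the per-step penalty, the net reward per step is positive, so stalling earns more than finishing quickly:

```python
STEP_PENALTY = 0.01     # per-step penalty (illustrative)
PARTIAL_REWARD = 0.05   # per-step shaping bonus (illustrative)
WIN_REWARD = 10.0       # terminal bonus for hitting the target

def episode_return(steps, farms_partial_reward):
    """Total return for an episode of `steps` steps ending in a win."""
    per_step = -STEP_PENALTY + (PARTIAL_REWARD if farms_partial_reward else 0.0)
    return round(steps * per_step + WIN_REWARD, 6)

# Finishing fast without the shaping bonus:
print(episode_return(500, False))    # 10 - 500*0.01 = 5.0
# Stalling to farm partial rewards for 5000 steps:
print(episode_return(5000, True))    # 10 + 5000*0.04 = 210.0
```

Since the return grows with episode length, the optimal policy under this reward is to never finish, which is exactly the behavior observed below.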
As a result, the agent learned to exploit this by chasing only the partial rewards, so the episode mean reward also grows, as shown in the results:
⚠️ Looks OK, but take a look at the episode length.
⚠️ The result of the reward pitfall. The agent learned to seek out partial rewards.
- Define JSON layout files in the levels/ folder
- Add custom tilemaps, special blocks, or enemies
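The exact level schema isn't documented here, so the layout below is a hypothetical example of what a JSON level file might look like, with a tiny loader using only the standard library:

```python
import json

# Hypothetical level definition: a grid of tile codes plus a legend.
LEVEL_JSON = """
{
  "name": "level_1",
  "tiles": [
    "WWWWWWWW",
    "W......W",
    "W.C..B.W",
    "WGGGGGTW",
    "WWWWWWWW"
  ],
  "legend": {"W": "wall", "G": "grass", "T": "target",
             "C": "char_spawn", "B": "ball_spawn", ".": "empty"}
}
"""

def load_level(raw):
    """Parse a level definition and locate the spawn and target tiles."""
    level = json.loads(raw)
    positions = {}
    for y, row in enumerate(level["tiles"]):
        for x, code in enumerate(row):
            if code in ("C", "B", "T"):
                positions[level["legend"][code]] = (x, y)
    return level["name"], positions

name, positions = load_level(LEVEL_JSON)
print(name, positions)   # positions of the chihuahua, ball, and target tile
```

Encoding the map as strings of one-character tile codes keeps level files compact and easy to edit by hand; the actual loader in map/ may use a different schema.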
Contributions are welcome! Please:
- Fork this repository.
- Create a feature branch:

```shell
git checkout -b feature/my-feature
```

- Commit your changes:

```shell
git commit -m "Add my feature"
```

- Push to the branch:

```shell
git push origin feature/my-feature
```

- Open a pull request.