PyPatch 🐛

title	PyPatch
emoji	🐛
colorFrom	red
colorTo	yellow
sdk	docker
pinned	false
license	mit

An OpenEnv-compliant RL environment for AI-powered Python code debugging.

PyPatch challenges AI agents to identify and fix bugs in Python code across three difficulty levels. The reward is the fraction of hidden test cases that the agent's fix passes, which gives continuous partial-credit feedback for reinforcement learning.

Live links:

Hugging Face Space: PyPatch Space
GitHub Repository: Anurag-M1/PyPatch
Judge UI: /ui

Why PyPatch

Clean difficulty ladder from syntax repair to algorithmic debugging
Dense partial-credit reward rather than brittle pass/fail scoring
Deterministic task graders with explicit metadata
FastAPI environment with clear OpenEnv-style interaction flow
Baseline inference that emits structured logs and uses the evaluator proxy

Environment Overview

| Property | Value |
| --- | --- |
| Tasks | 3 (easy → medium → hard) |
| Max steps per task | 5 |
| Reward range | 0.0 – 1.0 |
| Termination | All tests pass OR max steps reached |
| API | OpenEnv-compliant REST (FastAPI) |
| Judge UI | `/ui` |
| Task metadata | Includes `grader` and non-boundary baseline `score` |
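
The termination rule in the table can be sketched as a small predicate. This is a minimal illustration, not the server's actual code: the function name `episode_done` is hypothetical, and it assumes a reward of 1.0 means every hidden test passed.

```python
MAX_STEPS = 5  # "Max steps per task" from the overview table


def episode_done(reward: float, step_count: int, max_steps: int = MAX_STEPS) -> bool:
    """Return True once all tests pass or the step budget is spent.

    Assumption: a reward of 1.0 corresponds to every hidden test passing.
    """
    all_tests_pass = reward >= 1.0
    out_of_steps = step_count >= max_steps
    return all_tests_pass or out_of_steps
```

Under these assumptions an agent that reaches reward 1.0 on its second step ends the episode early, while an agent stuck at partial credit is cut off after the fifth step.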

Tasks

Task 1 — Fix Factorial Function (Easy)

The agent receives a factorial function with syntax errors: a missing colon and an assignment operator used in place of a comparison in a condition.
Reward: fraction of 4 test cases passing.
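
To make the two bug classes concrete, here is a hypothetical example. The `BUGGY` string is invented for illustration and is not the task's real `buggy_code`; the fixed function shows what a passing submission might look like.

```python
# Hypothetical buggy input (NOT the real task source). It is kept in a string
# because it does not parse as Python.
BUGGY = """
def factorial(n)
    if n = 0:
        return 1
    return n * factorial(n - 1)
"""


def factorial(n: int) -> int:
    # Fix 1: the def line needs a trailing colon.
    # Fix 2: the condition must compare with ==, not assign with =.
    if n == 0:
        return 1
    return n * factorial(n - 1)
```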

Task 2 — Fix Second Largest Element (Medium)

The agent receives a function with a logic error that returns the largest element instead of the second largest.
Reward: fraction of 5 test cases passing.
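
A correct solution might look like the sketch below. The handling of duplicates and short inputs is an assumption (second largest distinct value, `None` when there is none); the hidden tests may expect different edge-case behavior.

```python
from typing import Optional, Sequence


def second_largest(nums: Sequence[int]) -> Optional[int]:
    """Return the second largest distinct value, or None if there is none.

    Assumption: duplicates of the maximum do not count as "second largest".
    """
    largest: Optional[int] = None
    second: Optional[int] = None
    for x in nums:
        if largest is None or x > largest:
            # New maximum: the old maximum becomes the runner-up.
            largest, second = x, largest
        elif x != largest and (second is None or x > second):
            second = x
    return second
```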

Task 3 — Fix Binary Search Algorithm (Hard)

The agent receives a binary search with four distinct algorithmic bugs: an off-by-one boundary, a wrong `while` condition, and two incorrect pointer updates.
Reward: fraction of 7 test cases passing.
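
For reference, a standard iterative binary search is sketched below, with comments marking the places where the four bug categories typically occur. Returning `-1` for a missing target is an assumption; the task's own contract is given in its `task_description`.

```python
from typing import Sequence


def binary_search(arr: Sequence[int], target: int) -> int:
    """Return the index of target in sorted arr, or -1 if absent (assumed contract)."""
    lo, hi = 0, len(arr) - 1   # boundary: hi is the last valid index
    while lo <= hi:            # loop condition: the range is inclusive
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1       # pointer update: discard mid and the left half
        else:
            hi = mid - 1       # pointer update: discard mid and the right half
    return -1
```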

Action & Observation Spaces

Observation

{
  "task_id": "task_hard_binary_search",
  "task_name": "Fix Binary Search Algorithm",
  "difficulty": "hard",
  "task_description": "...",
  "buggy_code": "def binary_search(arr, target):\n    ...",
  "error_hint": "Hint: There are 4 bugs — check ...",
  "step_count": 1
}

Action

{
  "fixed_code": "def binary_search(arr, target):\n    ..."
}
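
On the client side, the two payloads can be mirrored with plain dataclasses. This is a sketch built only from the JSON examples above; the class and helper names are hypothetical, and the project's own model definitions may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class Observation:
    # Field names copied from the Observation example above.
    task_id: str
    task_name: str
    difficulty: str
    task_description: str
    buggy_code: str
    error_hint: str
    step_count: int


@dataclass
class Action:
    fixed_code: str


def parse_observation(raw: str) -> Observation:
    """Decode an observation JSON string (assumes exactly the fields shown above)."""
    return Observation(**json.loads(raw))


def encode_action(action: Action) -> str:
    """Encode an action as the JSON body expected by a step request."""
    return json.dumps({"fixed_code": action.fixed_code})
```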

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/` | Health check |
| `GET` | `/ui` | Judge-facing landing page |
| `GET` | `/health` | Health check |
| `GET` | `/tasks` | List all tasks |
| `POST` | `/reset` | Start episode (optional: `{"task_id": "..."}`) |
| `POST` | `/step` | Submit fix `{"fixed_code": "..."}` |
| `GET` | `/state` | Current episode state |
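
A client can drive the `/reset` and `/step` endpoints with the standard library alone. The sketch below only builds and sends the requests; the helper names are hypothetical, the response schema is not specified here and is returned as a raw dict, and sending `{}` when no `task_id` is given is an assumption.

```python
import json
import urllib.request
from typing import Optional


def _post_json(url: str, body: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON body."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_reset_request(env_url: str, task_id: Optional[str] = None) -> urllib.request.Request:
    # Assumption: an empty JSON object is accepted when no task is selected.
    body = {"task_id": task_id} if task_id is not None else {}
    return _post_json(f"{env_url.rstrip('/')}/reset", body)


def build_step_request(env_url: str, fixed_code: str) -> urllib.request.Request:
    return _post_json(f"{env_url.rstrip('/')}/step", {"fixed_code": fixed_code})


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (schema not assumed)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `send(build_reset_request("http://localhost:7860"))` would start an episode, and `send(build_step_request("http://localhost:7860", code))` would submit a candidate fix.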

Local Setup

git clone https://github.com/Anurag-M1/PyPatch
cd PyPatch

python3 -m pip install -r requirements.txt

python3 -m uvicorn main:app --host 0.0.0.0 --port 7860

# Optional: in another terminal — run inference
export API_BASE_URL=https://your-litellm-proxy.example/v1
export API_KEY=your_proxy_key
export MODEL_NAME=gpt-4o-mini
export ENV_URL=http://localhost:7860
python3 inference.py

Open locally:

http://127.0.0.1:7860/ui
http://127.0.0.1:7860/docs

Docker

docker build -t pypatch .
docker run -p 7860:7860 pypatch

Reward Design

Partial credit is intentional. Task 3, for example, has four algorithmic bugs; an agent that fixes some of them but misses two might pass 4/7 test cases and receive a reward of 4/7 ≈ 0.57. This dense feedback signal makes PyPatch suitable for RL training, not just one-shot evaluation.
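
The fraction-of-tests reward can be sketched as follows. This is an illustration of the scoring idea rather than the project's grader: the function name, the `(args, expected)` case format, and treating an exception as a failed case are all assumptions.

```python
from typing import Any, Callable, Sequence, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected return value)


def fraction_passing(func: Callable[..., Any], cases: Sequence[TestCase]) -> float:
    """Return the fraction of cases for which func(*args) equals the expected value."""
    if not cases:
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # Assumption: a crashing case counts as a failure, not an abort.
            pass
    return passed / len(cases)
```

A submission that passes 4 of 7 cases scores 4/7 under this sketch, which rounds to the 0.57 quoted above.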

Baseline Behavior

inference.py is designed to satisfy evaluation constraints safely:

emits [START], [STEP], and [END] blocks to stdout
uses API_BASE_URL and API_KEY when provided by the evaluator
produces per-task scores that are strictly between 0 and 1
keeps behavior deterministic for reproducibility

Required Environment Variables (inference)

| Variable | Description |
| --- | --- |
| `API_BASE_URL` | Evaluator-provided LiteLLM proxy URL |
| `API_KEY` | Evaluator-provided proxy API key |
| `MODEL_NAME` | Model identifier |
| `ENV_URL` | PyPatch server URL |
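
One way to read these variables is to fail fast when any is missing. The sketch below is hypothetical and is not taken from `inference.py`, which may instead apply defaults; `load_settings` is an illustrative name.

```python
import os
from typing import Dict, Mapping

REQUIRED = ("API_BASE_URL", "API_KEY", "MODEL_NAME", "ENV_URL")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the four variables, raising if any is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.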

Built for OpenEnv Round 1 — April 2026
