An automated agent for debugging and fixing bugs in Python code using an LLM (Qwen) and LangGraph.
This project implements an automated code debugging system based on the "Qwen3-0.6B" LLM. The agent analyzes buggy code from the HumanEvalPack dataset and generates fixed code. After generation, the fixed code is tested in an isolated Docker environment.
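To illustrate what testing in an isolated Docker environment can look like, here is a minimal sketch. It is not the project's actual runner: the function names `build_docker_command` and `run_in_docker`, the `python:3.11-slim` image, the `candidate.py` file name, and the resource limits are all assumptions chosen for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_docker_command(workdir, image="python:3.11-slim", script="candidate.py"):
    """Build a `docker run` command that executes one script in a throwaway container.

    The image name and script name are illustrative assumptions.
    """
    return [
        "docker", "run", "--rm",       # remove the container once it exits
        "--network", "none",           # no network access for untrusted code
        "--memory", "256m",            # cap memory usage (assumed limit)
        "-v", f"{workdir}:/work:ro",   # mount the code read-only
        "-w", "/work",
        image, "python", script,
    ]


def run_in_docker(fixed_code, test_code, timeout=30):
    """Return True if the fixed code plus its unit tests exit with status 0."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the candidate fix followed by its tests into a single script.
        Path(tmp, "candidate.py").write_text(fixed_code + "\n\n" + test_code)
        cmd = build_docker_command(tmp)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Treat a hang (for example, an infinite loop) as a failed test run.
            return False
        return proc.returncode == 0
```

In this sketch a failing assertion makes the Python process exit with a non-zero status, which is reported as a failed fix. Note that the timeout only stops the `docker` client process; a stricter runner would also stop the container itself.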
buggy_agent/
├── main.py # Entry point
├── code_agent.py # LangGraph agent for code fixing
├── code_model.py # Qwen model wrapper
├── code_intepretor.py # Docker runner for code execution
├── evaluation.py # Evaluation system
├── prompts.py # LLM prompts
└── requirements.txt # Project dependencies
pip install -r requirements.txt
python main.py
- Load Data: Load HumanEvalPack dataset with buggy code
- Prepare Prompt: Create prompt with task description, error type, and examples
- Generate Fixed Code: The LLM analyzes the code and generates a fixed version
- Testing: The fixed code is run against unit tests in Docker
- Evaluation: Calculate Pass@1 metric and detailed statistics
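The evaluation step can be sketched as follows. With one generated fix per task, Pass@1 reduces to the fraction of tasks whose fix passes all unit tests. This is a minimal sketch, not the code in `evaluation.py`; the `TaskResult` class, its field names, and the per-error-type breakdown are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    task_id: str    # hypothetical task identifier
    bug_type: str   # hypothetical error-type label for the task
    passed: bool    # True if the fixed code passed all unit tests


def pass_at_1(results):
    """Fraction of tasks whose single generated fix passed its tests."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def pass_rate_by_bug_type(results):
    """Pass rate per error type, as one example of a detailed statistic."""
    totals = {}
    for r in results:
        passed, count = totals.get(r.bug_type, (0, 0))
        totals[r.bug_type] = (passed + int(r.passed), count + 1)
    return {bug_type: p / c for bug_type, (p, c) in totals.items()}
```

For example, if three out of four tasks pass, `pass_at_1` returns 0.75, and `pass_rate_by_bug_type` shows which error types account for the failures.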