This project contains a coding agent that automatically generates, tests, and self-fixes a Python parser for bank statement PDFs using the Google Gemini API.
The agent operates in a simple loop: it writes a parser, tests it, and fixes it when the tests fail.
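A minimal sketch of such a write, test, fix loop is shown here for orientation. It is not taken from agent.py: the function name `run_fix_loop`, the two callables, and the default of three attempts are all assumptions made for illustration, and the stubs stand in for the real Gemini call and test run.

```python
from typing import Callable, Optional, Tuple


def run_fix_loop(
    generate: Callable[[Optional[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,  # assumed limit, not a value from the project
) -> Optional[str]:
    """Return parser source that passes the tests, or None if attempts run out."""
    feedback: Optional[str] = None  # there is no error report on the first attempt
    for _ in range(max_attempts):
        code = generate(feedback)         # ask the model for new or revised code
        passed, report = run_tests(code)  # run the validation step on that code
        if passed:
            return code
        feedback = report                 # pass the failure into the next prompt
    return None


# Hypothetical stubs standing in for the model call and the test run.
def fake_generate(feedback: Optional[str]) -> str:
    return "good" if feedback else "bad"


def fake_run_tests(code: str) -> Tuple[bool, str]:
    ok = code == "good"
    return ok, "" if ok else "AssertionError: rows differ"


result = run_fix_loop(fake_generate, fake_run_tests)
```

Passing the two steps in as callables keeps the loop itself free of API and file-system details, so it can be exercised without a network connection.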
- Clone the Repository

    git clone https://github.com/itz-Mayank/AI_Agent_Data_Parser.git
    cd AI_Agent_Data_Parser

- Install Dependencies
  Make sure you have Java installed. Then install the required Python packages.

    pip install -r requirements.txt

- Set Your API Key
  Create a .env file in the root directory and add your Google Gemini API key:

    GOOGLE_API_KEY="YOUR_API_KEY_HERE"

- Add Sample Data
  Place your bank statement PDF in data/icici/icici_sample.pdf.

- Run the Agent
  Execute the agent from your terminal, specifying the target bank.

    python agent.py --target icici

  The agent will then write, test, and fix the parser, which is saved in the custom_parsers/ directory.
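Before running the agent, it can help to confirm that the API key and sample PDF are where the steps above put them. The following is only a sketch of such a check, not part of the project: `read_env_file`, `sample_pdf_path`, and `preflight` are hypothetical helpers, and the data/<target>/<target>_sample.pdf pattern is an assumption generalized from the icici example.

```python
import argparse
from pathlib import Path
from typing import Dict, List


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def sample_pdf_path(root: Path, target: str) -> Path:
    """Assumed layout: data/<target>/<target>_sample.pdf, as in the icici example."""
    return root / "data" / target / f"{target}_sample.pdf"


def preflight(root: Path, argv: List[str]) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", required=True)
    args = parser.parse_args(argv)

    problems: List[str] = []
    env_path = root / ".env"
    if not env_path.exists() or not read_env_file(env_path).get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY missing from .env")
    if not sample_pdf_path(root, args.target).exists():
        problems.append(f"sample PDF not found for {args.target}")
    return problems
```

Calling `preflight(Path("."), ["--target", "icici"])` from the repository root would return an empty list once both the key and the sample PDF are in place.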
/
├── agent.py                  # The main AI agent script
├── tests/
│   └── test_parser.py        # Validation script to execute the generated parser
├── data/
│   └── icici/
│       └── icici_sample.pdf  # Input PDF for a target bank
├── custom_parsers/
│   └── (Generated by the agent)
├── Output/
│   └── (Generated by the parser)
├── .env                      # For storing your API key
└── README.md
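Because the parser in custom_parsers/ is generated at run time, a validation script has to import it from a file path rather than by a fixed module name. One way this could be done with the standard library is sketched below. The helper name `load_generated_parser` is hypothetical, and how tests/test_parser.py actually loads the parser is not documented here.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_generated_parser(parser_file: Path) -> ModuleType:
    """Import a generated parser module directly from its file path."""
    spec = importlib.util.spec_from_file_location(parser_file.stem, parser_file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {parser_file}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the generated file's top-level code
    return module
```

Note that `exec_module` executes whatever the model wrote, so a generated parser should be treated like any other untrusted code.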
This agent uses Google's Gemini family of models. The primary model used during development was gemini-2.5-flash (latest).