A high-performance Python module for extracting structured data from Indonesian bank statement PDFs, designed to support multiple bank formats.
- Multi-bank support - Extensible architecture for different Indonesian banks
- Native PDF parsing - No OCR required, works with text-based PDFs
- Fast processing - Under 1 second for multi-page statements
- Structured output - Exports to CSV files and pandas DataFrames
- Clean Python API - Both function-based and class-based interfaces
- Type hints - Full type annotation support
- CLI tool - Command-line interface for batch processing

Supported banks:

- BRI (Bank Rakyat Indonesia) - Full support
- Extensible - Easy to add support for other banks
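
Because parsing relies on native text extraction rather than OCR, a scanned (image-only) statement yields no text to parse. The sketch below shows one way to check for a text layer before parsing. It is not part of this package: `has_text_layer` and `pdf_has_text_layer` are hypothetical helper names, the 20-character threshold is an arbitrary assumption, and the second function assumes pdfplumber (listed in the requirements) is installed.

```python
from typing import Iterable, Optional


def has_text_layer(page_texts: Iterable[Optional[str]], min_chars: int = 20) -> bool:
    """Return True once the page texts hold at least min_chars non-whitespace characters."""
    total = 0
    for text in page_texts:
        if text:
            # Count only non-whitespace characters
            total += len("".join(text.split()))
            if total >= min_chars:
                return True
    return False


def pdf_has_text_layer(path: str) -> bool:
    """Hypothetical helper: open a PDF with pdfplumber and test for extractable text."""
    import pdfplumber  # imported lazily so has_text_layer works without it

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None or an empty string for image-only pages
        return has_text_layer(page.extract_text() for page in pdf.pages)
```

If `pdf_has_text_layer("statement.pdf")` returns `False`, the file is probably a scan and would need OCR before this kind of parser could read it.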

Install from PyPI:

```bash
pip install indonesian-bank-statement-parser
```

Install from source:

```bash
# Clone the repository
git clone <repository-url>
cd pdf_parser

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
# or .\venv\Scripts\activate on Windows

# Install in development mode
pip install -e ".[dev]"
```

Auto-detection (recommended):

```python
from pdfparser import parse_pdf

# Parse any supported bank statement
result = parse_pdf("statement.pdf")
result.export_to_csv("metadata.csv", "transactions.csv")
```

Using the factory for auto-detection:

```python
from pdfparser import ParserFactory

factory = ParserFactory()
result = factory.parse("statement.pdf")
print(f"Detected bank: {result.metadata.bank_name}")
```

Specific bank parser:

```python
from pdfparser import BRIParser

parser = BRIParser()
result = parser.parse("bri_statement.pdf")

# Access parsed data
print(f"Bank: {result.metadata.bank_name}")
print(f"Statement Date: {result.metadata.statement_date}")
print(f"Found {len(result.transactions)} transactions")

# Export to CSV
result.export_to_csv("metadata.csv", "transactions.csv")
```

Access data as pandas DataFrames:

```python
from pdfparser import parse_pdf

result = parse_pdf("statement.pdf")
metadata_df = result.get_metadata_df()
transactions_df = result.get_transactions_df()
print(transactions_df.head())
```

Check supported banks:

```python
from pdfparser import get_supported_banks

print("Supported banks:", get_supported_banks())
```

Using `ParserFactory` directly:

```python
from pdfparser import ParserFactory

# Create factory instance
factory = ParserFactory()

# Auto-detect and parse
result = factory.parse("unknown_bank_statement.pdf")

# Get specific parser
parser = factory.get_parser("bri_statement.pdf")
result = parser.parse("bri_statement.pdf")

# List all supported banks
banks = factory.list_supported_banks()
print(f"Supported banks: {banks}")
```

Command-line usage:

```bash
# Basic usage
bank-statement-parser statement.pdf

# Custom output paths
bank-statement-parser statement.pdf -o my_metadata.csv my_transactions.csv

# Verbose mode (shows parsed details)
bank-statement-parser statement.pdf --verbose

# Show help
bank-statement-parser --help
```

Metadata fields:

| Field | Description |
|---|---|
| Statement Date | Date when the statement was generated |
| Transaction Period Start | Start date of the transaction period |
| Transaction Period End | End date of the transaction period |
| Account Number | Bank account number |
| Product Name | Type of account (e.g., Britama-IDR) |
| Currency | Transaction currency (e.g., IDR) |
| Business Unit | Branch/unit name |
| Bank Name | Name of the bank |

Transaction fields:

| Field | Description |
|---|---|
| Transaction Date | Date of the transaction (DD/MM/YY) |
| Transaction Time | Time of the transaction (HH:MM:SS) |
| Description | Full transaction description |
| Teller/User ID | Teller or system ID |
| Debit | Amount debited (0.00 if credit) |
| Credit | Amount credited (0.00 if debit) |
| Balance | Account balance after transaction |
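
As a sketch of how the exported transactions might be consumed, the example below loads CSV text with the standard library, combines the date and time columns, and checks that each balance follows from the previous one. It assumes the CSV header names match the field names in the table above and that amounts are plain decimals; the sample rows are made up, and `load_transactions` and `balances_are_consistent` are hypothetical helper names, not part of this package.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from typing import Any, Dict, List

# Made-up sample rows; header names are assumed to match the field table.
SAMPLE_CSV = """Transaction Date,Transaction Time,Description,Teller/User ID,Debit,Credit,Balance
01/02/24,08:15:30,TRANSFER IN,8888001,0.00,1500000.00,2500000.00
03/02/24,14:02:11,ATM WITHDRAWAL,0001234,500000.00,0.00,2000000.00
"""


def load_transactions(csv_text: str) -> List[Dict[str, Any]]:
    """Parse CSV text into dicts with a typed timestamp and Decimal amounts."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            # Combine the DD/MM/YY date and HH:MM:SS time into one datetime
            "timestamp": datetime.strptime(
                f"{raw['Transaction Date']} {raw['Transaction Time']}",
                "%d/%m/%y %H:%M:%S",
            ),
            "description": raw["Description"],
            "teller_id": raw["Teller/User ID"],
            # Decimal avoids floating-point rounding on currency amounts
            "debit": Decimal(raw["Debit"]),
            "credit": Decimal(raw["Credit"]),
            "balance": Decimal(raw["Balance"]),
        })
    return rows


def balances_are_consistent(rows: List[Dict[str, Any]]) -> bool:
    """Check that each balance equals the previous balance plus credit minus debit."""
    for prev, cur in zip(rows, rows[1:]):
        if prev["balance"] + cur["credit"] - cur["debit"] != cur["balance"]:
            return False
    return True


transactions = load_transactions(SAMPLE_CSV)
print(transactions[0]["timestamp"].isoformat())
print(balances_are_consistent(transactions))
```

A running-balance check like this is a cheap way to spot rows that were dropped or misread during extraction. For a real export, replace `SAMPLE_CSV` with the contents of the transactions CSV file.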
The parser is designed to be easily extensible. See `examples/mandiri_parser_example.py` for a template showing how to add support for additional banks.

- Create a new parser class inheriting from `BaseBankParser`
- Implement the required methods (`parse`, `can_parse`, `bank_name`)
- Add the parser to the factory in `factory.py`
- Write tests for the new parser
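
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual code: `BaseBankParser` is redefined here as a local stand-in because its real signatures are not shown in this README, the methods take already-extracted text rather than a PDF path so the example runs on its own, and `MandiriParser`, its detection rule, and `pick_parser` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class BaseBankParser(ABC):
    """Local stand-in for the package's base class; the real signatures may differ."""

    @property
    @abstractmethod
    def bank_name(self) -> str:
        ...

    @abstractmethod
    def can_parse(self, text: str) -> bool:
        ...

    @abstractmethod
    def parse(self, text: str) -> Dict[str, object]:
        ...


class MandiriParser(BaseBankParser):
    """Hypothetical parser, for illustration only."""

    @property
    def bank_name(self) -> str:
        return "Mandiri"

    def can_parse(self, text: str) -> bool:
        # Assumed detection rule: the bank's name appears in the statement text
        return "bank mandiri" in text.lower()

    def parse(self, text: str) -> Dict[str, object]:
        # A real parser would extract metadata and transactions here
        return {"bank_name": self.bank_name, "transactions": []}


def pick_parser(text: str, parsers: List[BaseBankParser]) -> Optional[BaseBankParser]:
    """Mimic factory lookup: return the first registered parser that accepts the text."""
    for parser in parsers:
        if parser.can_parse(text):
            return parser
    return None


registered = [MandiriParser()]
chosen = pick_parser("PT Bank Mandiri (Persero) Tbk - e-Statement", registered)
print(chosen.bank_name if chosen else "no parser found")
```

Keeping detection (`can_parse`) separate from extraction (`parse`) lets a factory try each registered parser in turn without parsing the whole statement first.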

Development setup:

```bash
# Clone and install in development mode
git clone <repository-url>
cd pdf_parser
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```

Running tests:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=pdfparser

# Run specific test file
pytest tests/test_parser.py
```

Code quality:

```bash
# Format code
black pdfparser/

# Check linting
flake8 pdfparser/

# Type checking
mypy pdfparser/

# Run all pre-commit hooks
pre-commit run --all-files
```

Requirements:

- Python 3.8+
- pdfplumber >= 0.7.0
- pandas >= 1.3.0

Documentation:

- Developer Guide - For contributors and developers
- API Reference - Detailed API documentation
Contributions are welcome! Please read the contributing guidelines and submit pull requests for any improvements.
MIT License - see LICENSE file for details.