Indonesian Bank Statement PDF Parser

A Python module for extracting structured data from Indonesian bank statement PDFs. BRI statements are supported today, and the architecture is designed so that other bank formats can be added.

Features

Multi-bank support - Extensible architecture for different Indonesian banks
Native PDF parsing - No OCR required, works with text-based PDFs
Fast processing - Under 1 second for multi-page statements
Structured output - Exports to CSV files and pandas DataFrames
Clean Python API - Both function-based and class-based interfaces
Type hints - Full type annotation support
CLI tool - Command-line interface for batch processing

Supported Banks

BRI (Bank Rakyat Indonesia) - Full support
Extensible - Easy to add support for other banks

Installation

From PyPI (recommended)

pip install indonesian-bank-statement-parser

From Source

# Clone the repository
git clone <repository-url>
cd pdf_parser

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On macOS/Linux
# or .\venv\Scripts\activate on Windows

# Install in development mode
pip install -e ".[dev]"

Quick Start

Python API

Auto-detection (recommended):

from pdfparser import parse_pdf

# Parse any supported bank statement
result = parse_pdf("statement.pdf")
result.export_to_csv("metadata.csv", "transactions.csv")

Using factory for auto-detection:

from pdfparser import ParserFactory

factory = ParserFactory()
result = factory.parse("statement.pdf")
print(f"Detected bank: {result.metadata.bank_name}")

Specific bank parser:

from pdfparser import BRIParser

parser = BRIParser()
result = parser.parse("bri_statement.pdf")

# Access parsed data
print(f"Bank: {result.metadata.bank_name}")
print(f"Statement Date: {result.metadata.statement_date}")
print(f"Found {len(result.transactions)} transactions")

# Export to CSV
result.export_to_csv("metadata.csv", "transactions.csv")

Access data as pandas DataFrame:

from pdfparser import parse_pdf

result = parse_pdf("statement.pdf")

metadata_df = result.get_metadata_df()
transactions_df = result.get_transactions_df()

print(transactions_df.head())

Check supported banks:

from pdfparser import get_supported_banks

print("Supported banks:", get_supported_banks())

Using the Factory Pattern

from pdfparser import ParserFactory

# Create factory instance
factory = ParserFactory()

# Auto-detect and parse
result = factory.parse("unknown_bank_statement.pdf")

# Get specific parser
parser = factory.get_parser("bri_statement.pdf")
result = parser.parse("bri_statement.pdf")

# List all supported banks
banks = factory.list_supported_banks()
print(f"Supported banks: {banks}")

Command Line

# Basic usage
bank-statement-parser statement.pdf

# Custom output paths
bank-statement-parser statement.pdf -o my_metadata.csv my_transactions.csv

# Verbose mode (shows parsed details)
bank-statement-parser statement.pdf --verbose

# Show help
bank-statement-parser --help
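
For batch processing, one option is to drive the CLI from a small Python script. The following is a minimal sketch, not part of the package: it only uses the `-o` option shown above, the `build_commands` and `run_batch` helpers and the `<stem>_metadata.csv` / `<stem>_transactions.csv` naming are hypothetical, and it assumes the CLI signals failure with a non-zero exit code, which this README does not state.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_commands(pdf_paths: Iterable[Path], out_dir: Path) -> List[List[str]]:
    """Build one CLI invocation per PDF, using the -o option shown above."""
    commands = []
    for pdf in sorted(pdf_paths):
        commands.append([
            "bank-statement-parser",
            str(pdf),
            "-o",
            # Hypothetical naming scheme: one pair of CSVs per statement.
            str(out_dir / f"{pdf.stem}_metadata.csv"),
            str(out_dir / f"{pdf.stem}_transactions.csv"),
        ])
    return commands


def run_batch(input_dir: Path, out_dir: Path) -> None:
    """Run the CLI once for every *.pdf file found in input_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(input_dir.glob("*.pdf"), out_dir):
        # check=False so one failing statement does not abort the batch.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"Failed: {cmd[1]}")


# Example call (directory names are placeholders):
# run_batch(Path("statements"), Path("output"))
```

Keeping command construction separate from execution makes it possible to inspect or log the commands before anything is run.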

Output Structure

Metadata CSV

Field	Description
Statement Date	Date when the statement was generated
Transaction Period Start	Start date of the transaction period
Transaction Period End	End date of the transaction period
Account Number	Bank account number
Product Name	Type of account (e.g., Britama-IDR)
Currency	Transaction currency (e.g., IDR)
Business Unit	Branch/unit name
Bank Name	Name of the bank

Transactions CSV

Field	Description
Transaction Date	Date of the transaction (DD/MM/YY)
Transaction Time	Time of the transaction (HH:MM:SS)
Description	Full transaction description
Teller/User ID	Teller or system ID
Debit	Amount debited (0.00 if credit)
Credit	Amount credited (0.00 if debit)
Balance	Account balance after transaction
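
As an illustration of consuming the transactions CSV without pandas, the sketch below totals debits and credits and combines the date and time columns into timestamps. It assumes the CSV header row uses the field names listed above verbatim and that amounts are written as plain decimals such as `0.00`; the actual export may differ. The sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Made-up sample rows for illustration; not real statement data.
SAMPLE_CSV = """Transaction Date,Transaction Time,Description,Teller/User ID,Debit,Credit,Balance
01/03/24,09:15:00,Example transfer in,0001,0.00,500000.00,1500000.00
02/03/24,14:30:10,Example purchase,0002,125000.00,0.00,1375000.00
"""


def summarize(csv_file):
    """Return (total_debit, total_credit, first_timestamp, last_timestamp)."""
    total_debit = Decimal("0")
    total_credit = Decimal("0")
    timestamps = []
    for row in csv.DictReader(csv_file):
        # Decimal avoids float rounding errors on currency amounts.
        total_debit += Decimal(row["Debit"])
        total_credit += Decimal(row["Credit"])
        # Dates are DD/MM/YY and times HH:MM:SS, per the table above.
        timestamps.append(datetime.strptime(
            f'{row["Transaction Date"]} {row["Transaction Time"]}',
            "%d/%m/%y %H:%M:%S",
        ))
    # default=None keeps an empty file from raising ValueError.
    return (total_debit, total_credit,
            min(timestamps, default=None), max(timestamps, default=None))


debit, credit, first, last = summarize(io.StringIO(SAMPLE_CSV))
print(f"Debit: {debit}, Credit: {credit}")
print(f"Period covered: {first} to {last}")
```

To read a real export, pass an open file instead of the `StringIO` object, for example `open("transactions.csv", newline="")`.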

Adding Support for New Banks

The parser is designed to be extensible. See examples/mandiri_parser_example.py for a template that shows how to add support for another bank.

Steps to add a new bank:

Create a new parser class inheriting from BaseBankParser
Implement the required methods (parse, can_parse, bank_name)
Add the parser to the factory in factory.py
Write tests for the new parser
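
The steps above can be sketched roughly as follows. This is a self-contained illustration, not the package's code: the `BaseBankParser` defined here is a stand-in whose real interface may differ, `ExampleBankParser`, `PARSERS`, and `find_parser` are hypothetical names, and the methods take already-extracted text rather than a PDF path so the sketch runs without a statement file.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class BaseBankParser(ABC):
    """Stand-in for the package's base class; the real interface may differ."""

    @property
    @abstractmethod
    def bank_name(self) -> str: ...

    @abstractmethod
    def can_parse(self, text: str) -> bool: ...

    @abstractmethod
    def parse(self, text: str) -> Dict[str, object]: ...


class ExampleBankParser(BaseBankParser):
    """Hypothetical parser for a made-up bank, for illustration only."""

    @property
    def bank_name(self) -> str:
        return "Example Bank"

    def can_parse(self, text: str) -> bool:
        # Detect the bank by a marker string expected in its statements.
        return "EXAMPLE BANK" in text.upper()

    def parse(self, text: str) -> Dict[str, object]:
        # Placeholder logic: count non-empty lines instead of real extraction.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        return {"bank_name": self.bank_name, "line_count": len(lines)}


# A minimal registry playing the role of the factory in factory.py.
PARSERS: List[BaseBankParser] = [ExampleBankParser()]


def find_parser(text: str) -> Optional[BaseBankParser]:
    """Return the first registered parser that recognises the text."""
    for parser in PARSERS:
        if parser.can_parse(text):
            return parser
    return None
```

A test for the new parser would then check both directions: that `can_parse` accepts a statement from the new bank and rejects text from any other bank.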

Development

Setup Development Environment

# Clone and install in development mode
git clone <repository-url>
cd pdf_parser
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=pdfparser

# Run specific test file
pytest tests/test_parser.py

Code Quality

# Format code
black pdfparser/

# Check linting
flake8 pdfparser/

# Type checking
mypy pdfparser/

# Run all pre-commit hooks
pre-commit run --all-files

Requirements

Python 3.8+
pdfplumber >= 0.7.0
pandas >= 1.3.0

Documentation

Developer Guide - For contributors and developers
API Reference - Detailed API documentation

Contributing

Contributions are welcome. Please read the contributing guidelines in CONTRIBUTING.md before submitting a pull request.

License

MIT License - see LICENSE file for details.
