This project is an independent, community-driven tool and is NOT affiliated with, endorsed by, or in any way officially connected to:
- Parqet (parqet.com)
- Kasparund AG
- Liberty Vorsorge AG
- Saxo Bank CH
- Terzo Vorsorgestiftung (VIAC)
- Selma Finance AG
- N26 Bank
- Relai AG
- Any other financial institutions mentioned in this documentation
This is a personal project created to assist with importing financial data into portfolio tracking tools. Use at your own risk. The author(s) are not responsible for any errors, data loss, or financial decisions made based on the output of this tool.
Always verify the processed data against your official financial statements before making any investment decisions.
Parqet Parser is a comprehensive Python framework for processing financial documents from various Swiss banking institutions and converting them into a standardized CSV format compatible with Parqet, a portfolio tracking platform.
📄 For detailed information about the CSV format specification, please refer to the official Parqet CSV documentation.
- PDF-based: Kasparund AG, Liberty Vorsorge AG, Saxo Bank CH, Terzo Vorsorgestiftung (VIAC)
- CSV-based: Selma, N26 Bank, Relai (Bitcoin)
- Trades: Buy and sell transactions with full FX support
- Deposits/Withdrawals: Cash transfers in and out
- Dividends: Dividend payments with tax withholding
- Interest: Interest payments on cash positions
- Fees: Account maintenance fees and taxes
✅ Automated broker detection - Automatically identifies which broker a document belongs to ✅ Multi-format support - Processes both PDF and CSV files ✅ Smart deduplication - Uses unique transaction IDs to prevent duplicates ✅ Timezone normalization - Converts all timestamps to UTC with localized display ✅ Configuration-driven - IBAN-to-holding mappings via JSON config ✅ Type-safe - Full type hints and Pydantic validation ✅ Extensible - Plugin architecture for easy addition of new brokers ✅ Robust parsing - Advanced regex patterns with fallback strategies ✅ Comprehensive testing - Unit tests with pytest
- Python 3.8 or higher
- pip package manager
- Clone the repository:
git clone <repository-url>
cd parqet.parser- Install dependencies:
pip install -r requirements.txt- Create configuration file:
cp .env.example .env
cp config.json.example config.json # If example exists- Edit
config.jsonwith your IBAN-to-holding mappings:
{
"CH9876543210123456789": "hld_abc123...",
"DE1234567890123456789": "hld_xyz789..."
}Prerequisites:
- Docker and Docker Compose installed
- Financial documents (PDFs or CSVs) from supported brokers
Steps:
- Initial setup (first time only):
make install- Edit config.json with your IBAN-to-holding mappings:
{
"CH1234567890": "hld_your_holding_id_from_parqet"
}- Add your documents to
data/directory:
cp ~/Downloads/*.pdf data/- Process files:
make process- Check results:
ls -l output/The container runs once and exits automatically - no need to stop it manually!
If you prefer to run without Docker:
# Install dependencies
pip install -r requirements.txt
# Create config
echo '{}' > config.json
# Run
python -m app.mainCreate a .env file to customize paths and settings:
# Directory paths
PARQET_DATA_DIR=data
PARQET_OUTPUT_DIR=output
PARQET_LOG_DIR=logs
# Config file
PARQET_CONFIG_FILE=config.json
# Timezone
PARQET_TIMEZONE=Europe/Zurich
# Logging
PARQET_LOG_LEVEL=INFO
PARQET_LOG_TO_CONSOLE=truefrom app.main import main
from app.lib.brokers import KasparundBroker, SelmaBroker
# Process with specific brokers
brokers = [KasparundBroker(), SelmaBroker()]
main(brokers=brokers, data_dir="custom_data_dir")parqet.parser/
├── app/
│ ├── main.py # Entry point
│ ├── config.json # IBAN-to-holding mappings
│ ├── lib/
│ │ ├── brokers/ # Broker implementations
│ │ │ ├── base_broker.py # Abstract base class
│ │ │ ├── kasparund.py # Kasparund AG parser
│ │ │ ├── liberty.py # Liberty Vorsorge AG parser
│ │ │ ├── saxo.py # Saxo Bank CH parser
│ │ │ ├── terzo.py # Terzo Vorsorgestiftung parser
│ │ │ ├── selma.py # Selma CSV parser
│ │ │ ├── n26.py # N26 Bank CSV parser
│ │ │ └── relai.py # Relai Bitcoin parser
│ │ ├── data_types/ # Transaction processors
│ │ │ ├── trades.py
│ │ │ ├── deposits_withdrawals.py
│ │ │ ├── dividends.py
│ │ │ ├── interest.py
│ │ │ └── fees.py
│ │ └── common/ # Shared utilities
│ │ ├── env_config.py # Environment configuration
│ │ ├── config_utilities.py # Config loading
│ │ ├── datetime_utilities.py
│ │ ├── pdf_utilities.py
│ │ ├── csv_utilities.py
│ │ ├── pdf_parser.py # Advanced PDF parsing
│ │ ├── validation.py # Pydantic schemas
│ │ ├── exceptions.py # Custom exceptions
│ │ └── logger_utilities.py
├── tests/ # Test suite
│ ├── conftest.py
│ ├── test_config_utilities.py
│ ├── test_validation.py
│ └── test_pdf_parser.py
├── data/ # Input files (gitignored)
├── output/ # Generated CSVs (gitignored)
├── logs/ # Log files (gitignored)
├── .env # Environment config (gitignored)
├── .env.example # Environment template
├── requirements.txt
├── pytest.ini
├── mypy.ini
└── README.md
- File Detection: Main scans
data/directory for PDF and CSV files - Broker Identification: Each file is tested against broker detection logic
- Transaction Extraction: Broker-specific parsers extract raw transaction data
- Data Processing: Transactions are categorized and formatted
- Validation: Pydantic schemas validate transaction integrity
- CSV Output: Deduplicated and sorted data written to
output/directory - File Management: Processed PDFs are moved and renamed with metadata
Create a new file app/lib/brokers/mybroker.py:
from typing import Dict, List, Any, Optional
from app.lib.brokers.base_broker import BaseBroker
from app.lib.common.pdf_utilities import extract_text_from_pdf, validate_pdf
class MyBrokerConfig:
BROKER_NAME = "My Broker Name"
TARGET_DIRECTORY = "data/mybroker"
DATA_DEFINITIONS = {
"trade": {
"regex_patterns": {
"match": r"Buy|Sell",
"amount": r"Amount:\s*([\d.,]+)",
# ... more patterns
},
"fields": lambda tx, portfolio, holding_map: {
# ... field mapping
},
"process_function": process_trades,
}
}
class MyBroker(BaseBroker):
def detect(self, file_path: str, file_content: Optional[str] = None) -> bool:
"""Detect if file belongs to this broker."""
# Implement detection logic
return "MY BROKER" in file_content
def extract_transactions(self, file_path: str) -> Dict[str, List[Dict[str, Any]]]:
"""Extract transactions from file."""
# Implement extraction logic
pass
def process_transactions(
self, transactions: Dict[str, List[Dict[str, Any]]], file_path: Optional[str] = None
) -> Dict[str, List[Dict[str, Any]]]:
"""Process extracted transactions."""
# Implement processing logic
pass
def generate_output_file(self, category: str, file_path: str) -> str:
"""Generate output filename."""
return f"mybroker_{category}"Add to app/lib/brokers/__init__.py:
from app.lib.brokers.mybroker import MyBroker
__all__ = [..., "MyBroker"]Add to app/main.py:
from app.lib.brokers.mybroker import MyBroker
# In main() function
brokers = [
# ... existing brokers
MyBroker(),
]pytestpytest --cov=app --cov-report=htmlpytest tests/test_config_utilities.py -vmypy app/Each transaction gets a unique ID generated from:
datetimeidentifier(ISIN code)amounttypebrokerholding
This ensures:
- ✅ Duplicate detection across runs
- ✅ Idempotent processing (re-running doesn't create duplicates)
- ✅ Consistent IDs for the same transaction
Transaction IDs follow the format: txn_<16-char-hash>
Example: txn_a1b2c3d4e5f6g7h8
All output CSVs follow the Parqet CSV format specification:
transaction_id;datetime;date;time;type;broker;holding;currency;amount;shares;price;fee;tax;realizedgains;identifier;assettype;originalcurrency;fxrate;holdingname;holdingnickname;exchange;wkn;avgholdingperiod
txn_abc123;2024-03-15T06:30:00.000Z;2024-03-15;06:30:00;buy;Kasparund AG;hld_xyz789;CHF;1005,00;10;100,50;;;CH0012345678;;;;;;
Delimiter: Semicolon (;)
Decimal Separator: Comma (,)
Date Format: ISO 8601 (YYYY-MM-DDTHH:MM:SS.000Z)
Sorting: Descending by datetime (newest first)
For complete field descriptions and requirements, see the official Parqet CSV documentation.
Logs are written to three files in logs/ directory:
- general.log: INFO level and above
- error.log: ERROR level only
- debug.log: All DEBUG messages
Log format:
2024-03-15 10:30:45 - main - INFO - Processing file: data/kasparund_statement.pdf
Configure log level via environment variable:
PARQET_LOG_LEVEL=DEBUG # DEBUG, INFO, WARNING, ERRORSolution: Check if your PDF/CSV matches the broker's detection pattern. Enable DEBUG logging:
PARQET_LOG_LEVEL=DEBUG python -m app.mainSolution: Create config.json in the root directory with your IBAN mappings:
{
"CH1234567890": "hld_your_holding_id"
}Solution: Ensure ISIN codes in PDFs follow format: 2 letters + 9 alphanumeric + 1 digit (e.g., CH0012345678)
Solution: Ensure you run from the project root:
python -m app.main # ✅ Correct
python app/main.py # ❌ May cause import issues- Follow PEP 8
- Use type hints for all functions
- Write docstrings for public APIs
- Keep functions focused (max ~50 lines)
- Create test file:
tests/test_<module>.py - Use pytest fixtures from
conftest.py - Mark tests:
@pytest.mark.unit,@pytest.mark.integration - Run tests before committing
# Run tests
pytest
# Type check
mypy app/
# Format check (if using black)
black --check app/
# Lint (if using ruff)
ruff check app/- Sensitive Data: Financial documents contain sensitive personal information
- Local Processing: All processing happens locally on your machine
- No Data Transmission: This tool does NOT send your data to any external services
- Config File: Never commit your
config.jsonto version control (it's in.gitignore) - PDFs: Never commit your financial PDFs to version control
Best Practices:
- Use this tool on a secure, private computer
- Keep your
config.jsonand PDFs secure - Verify output CSVs before importing to Parqet
- Regularly clean up old documents from
data/directory
MIT License - See LICENSE file for details
Important: While this software is open source, you are solely responsible for:
- Verifying the accuracy of processed data
- Complying with your financial institutions' terms of service
- Any financial decisions made based on the output
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Getting Help:
- 📖 Read the full documentation: README.md, DOCKER.md
- 🐛 Found a bug? Open an issue on GitHub
- 💡 Have a feature request? Open an issue with the "enhancement" label
- 📚 Check existing issues before creating a new one
Contributing:
- See CONTRIBUTING.md for guidelines
- Pull requests are welcome!
- Please add tests for new features
Disclaimer: This is a community project. Support is provided on a best-effort basis by volunteers.
- ✅ Environment-based configuration
- ✅ Type hints and Pydantic validation
- ✅ Unique transaction IDs
- ✅ Advanced PDF parsing with fallback patterns
- ✅ Comprehensive test suite
- ✅ Custom exception hierarchy
- ✅ mypy type checking
- Initial implementation with basic broker support
- Regex-based PDF parsing
- CSV deduplication