Start the PostgreSQL database:

```bash
make setup
```

Or manually:

```bash
docker-compose up -d postgres-db
```

Run the CSV pipeline:

```bash
make run-csv
```

With a custom file:

```bash
make run-csv FILE=myfile.csv STORE_KEY=my_data
```

Run the JSON pipeline:

```bash
make run-json
```

With a custom file:

```bash
make run-json FILE=myfile.json STORE_KEY=my_data
```

The ETL pipeline container automatically waits for PostgreSQL to be ready before starting.
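The wait-for-PostgreSQL step is commonly implemented as a small retry loop (or as a docker-compose healthcheck). The `wait_for_postgres` helper below is a hypothetical illustration of the retry-loop approach, not the repository's actual mechanism; `connect` stands in for something like `lambda: psycopg2.connect(...)`:

```python
import time


def wait_for_postgres(connect, retries=10, delay=2.0):
    """Retry `connect` until it succeeds or retries run out.

    `connect` is any zero-argument callable that raises while the
    database is not ready and returns a connection once it is.
    """
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except Exception:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay)
```

With this in place, the pipeline entry point can call `wait_for_postgres` before opening its first real connection.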
Run the pipelines inside Docker:

```bash
make run-csv-docker
make run-json-docker
```

Build the container:

```bash
make build
```

Start the database:

```bash
make db-up
```

Stop the database:

```bash
make db-down
```

View logs:

```bash
make logs
```

Run tests:
```bash
make test
```

Project structure:

```
BoostCredit/
├── data/                 # Input data files (CSV/JSON)
├── output/               # Object store (Parquet files)
├── logs/                 # Pipeline logs
├── src/
│   ├── extractors.py     # CSV/JSON extractors
│   ├── transformers.py   # Data transformers
│   ├── loaders.py        # SQL loader
│   ├── storage.py        # Object store
│   ├── pipeline.py       # ETL pipeline
│   └── utils/
│       ├── logger.py
│       ├── pii_masking.py
│       └── transform_helpers.py
├── tests/                # Unit tests
├── main.py               # Entry point
├── docker-compose.yml    # PostgreSQL and ETL pipeline services
├── Dockerfile            # ETL pipeline container
├── setup.sh              # Environment variables
└── Makefile              # Commands
```
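The layout above suggests a classic extract → transform → load flow wired together by `pipeline.py` and `main.py`. The sketch below is a self-contained, hypothetical illustration of that flow; the function names (`extract_csv`, `transform`, `load`, `run_pipeline`) and their behavior are assumptions, not the project's actual API:

```python
def extract_csv(path):
    """Parse a tiny CSV (header line + data rows) into a list of dicts."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    header = lines[0].split(",")
    return [dict(zip(header, row.split(","))) for row in lines[1:]]


def transform(records):
    """Example transform: normalise column names to lowercase."""
    return [{k.lower(): v for k, v in r.items()} for r in records]


def load(records, sink):
    """Append transformed records to a sink (stand-in for the SQL
    loader / Parquet object store)."""
    sink.extend(records)


def run_pipeline(path, sink):
    """Chain the three stages, as pipeline.py presumably does."""
    load(transform(extract_csv(path)), sink)
```

In the real project the sink would be PostgreSQL (via `loaders.py`) and the Parquet object store (via `storage.py`), with PII masking applied during the transform stage.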
Set via `setup.sh`:

- `DB_HOST` - Database host (default: `localhost`)
- `DB_PORT` - Database port (default: `5432`)
- `DB_USER` - Database user (default: `etl_user`)
- `DB_PASSWORD` - Database password (default: `etl_password`)
- `DB_NAME` - Database name (default: `etl_database`)
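In Python, these variables would typically be read with `os.getenv`, falling back to the defaults listed above. A minimal sketch (the helper names `db_settings` and `db_url` are illustrative; the project's actual config loading may differ):

```python
import os


def db_settings():
    """Read connection settings from the environment, using the
    documented defaults when a variable is unset."""
    return {
        "host": os.getenv("DB_HOST", "localhost"),
        "port": int(os.getenv("DB_PORT", "5432")),
        "user": os.getenv("DB_USER", "etl_user"),
        "password": os.getenv("DB_PASSWORD", "etl_password"),
        "dbname": os.getenv("DB_NAME", "etl_database"),
    }


def db_url(cfg):
    """Build a PostgreSQL connection URL from the settings dict."""
    return "postgresql://{user}:{password}@{host}:{port}/{dbname}".format(**cfg)
```

Sourcing `setup.sh` before `make run-csv` or `make run-json` would make any overridden values visible to code like this.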