
BoostCredit ETL Pipeline

Setup

  1. Start PostgreSQL database:
make setup

Or manually:

docker-compose up -d postgres-db

Running the Pipeline

Local Execution (Python directly)

CSV Mode

make run-csv

With a custom file:

make run-csv FILE=myfile.csv STORE_KEY=my_data

JSON Mode

make run-json

With a custom file:

make run-json FILE=myfile.json STORE_KEY=my_data
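Both modes presumably funnel through a single extraction step that turns the input file into a list of record dicts before transformation. A minimal sketch of such a dispatcher — the real `src/extractors.py` is not shown here, and `extract` is an illustrative name:

```python
import csv
import json
from pathlib import Path


def extract(path: str) -> list[dict]:
    """Load a CSV or JSON input into a list of record dicts, dispatching on extension."""
    p = Path(path)
    if p.suffix == ".csv":
        with p.open(newline="") as f:
            # DictReader uses the header row as keys; all values arrive as strings.
            return list(csv.DictReader(f))
    if p.suffix == ".json":
        # Assumes the JSON file is a top-level array of objects.
        return json.loads(p.read_text())
    raise ValueError(f"unsupported input format: {p.suffix}")
```

With this shape, `FILE` selects the input path and `STORE_KEY` would only matter downstream, when the records are written to the object store.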

Docker Execution (Container)

The ETL pipeline container automatically waits for PostgreSQL to be ready before starting.
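One portable way to implement such a wait is to poll the database port until a TCP connection succeeds. The sketch below is only an assumption about how the container might do this (it could equally use a shell script or a Compose healthcheck); `wait_for_port` is an illustrative name, not the repo's API:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll (host, port) until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while PostgreSQL is still starting up.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```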

CSV Mode

make run-csv-docker

JSON Mode

make run-json-docker

Build the container:

make build

Database Management

Start database:

make db-up

Stop database:

make db-down

View logs:

make logs

Testing

Run tests:

make test
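The unit tests live under `tests/`. As an illustration of the style such tests might take, here is a hypothetical transform helper with a `unittest` case — neither `normalize_date` nor `TestNormalizeDate` is taken from the repo:

```python
import unittest
from datetime import datetime


def normalize_date(value: str) -> str:
    """Hypothetical transform helper: parse common date formats into ISO-8601."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


class TestNormalizeDate(unittest.TestCase):
    def test_iso_passthrough(self):
        self.assertEqual(normalize_date("2024-01-31"), "2024-01-31")

    def test_day_first(self):
        self.assertEqual(normalize_date("31/01/2024"), "2024-01-31")
```

`make test` would then run these via the project's test runner (e.g. `python -m unittest` or `pytest`).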

Project Structure

BoostCredit/
├── data/               # Input data files (CSV/JSON)
├── output/             # Object store (Parquet files)
├── logs/               # Pipeline logs
├── src/
│   ├── extractors.py   # CSV/JSON extractors
│   ├── transformers.py # Data transformers
│   ├── loaders.py      # SQL loader
│   ├── storage.py      # Object store
│   ├── pipeline.py     # ETL pipeline
│   └── utils/
│       ├── logger.py
│       ├── pii_masking.py
│       └── transform_helpers.py
├── tests/              # Unit tests
├── main.py             # Entry point
├── docker-compose.yml  # PostgreSQL and ETL pipeline services
├── Dockerfile          # ETL pipeline container
├── setup.sh            # Environment variables
└── Makefile            # Commands
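`src/utils/pii_masking.py` suggests that sensitive fields are masked during transformation. One plausible approach — shown only as a sketch, since the module's actual logic is not part of this README — keeps the first character of an email's local part and masks the rest:

```python
import re

# Keep the first character of the local part, replace the rest with "*".
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-])([A-Za-z0-9._%+-]*)(@\S+)")


def mask_email(text: str) -> str:
    """Mask any email addresses found in a string, preserving the domain."""
    return EMAIL_RE.sub(
        lambda m: m.group(1) + "*" * len(m.group(2)) + m.group(3), text
    )
```

For example, `mask_email("john.doe@example.com")` yields `j*******@example.com`.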

Environment Variables

Set via setup.sh:

  • DB_HOST - Database host (default: localhost)
  • DB_PORT - Database port (default: 5432)
  • DB_USER - Database user (default: etl_user)
  • DB_PASSWORD - Database password (default: etl_password)
  • DB_NAME - Database name (default: etl_database)
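Code under `src/` would typically read these variables with the documented defaults, for example (the `db_config` helper is illustrative, not the repo's actual API):

```python
import os


def db_config() -> dict:
    """Read the connection settings exported by setup.sh, falling back to the documented defaults."""
    return {
        "host": os.getenv("DB_HOST", "localhost"),
        "port": int(os.getenv("DB_PORT", "5432")),
        "user": os.getenv("DB_USER", "etl_user"),
        "password": os.getenv("DB_PASSWORD", "etl_password"),
        "dbname": os.getenv("DB_NAME", "etl_database"),
    }
```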
