Portfolio-grade operational analytics product built from:
- Olist e-commerce delivery intelligence: predict late deliveries and identify actionable drivers.
- PaySim finance add-on (optional): fraud classification + “risk” marts (Week 3+).
- Business problem: late deliveries drive dissatisfaction and support cost.
- Analytics output: operational slices + executive-ready visuals + a simple model for triage.
- Engineering output: reproducible pipeline that exports dashboard-ready datasets and reports.
- notebooks/01_eda_olist.ipynb: Week 1 analysis notebook (learning + interpretation)
- notebooks/02_portfolio_story.ipynb: Week 2 presentation notebook (executive flow)
- src/pipelines/week2.py: Week 2 reproducible run (exports)
- reports/executive_summary.md: 1-page executive output (generated)
- docs/dashboard_recommendations.md: Tableau Public dashboard spec
- dashboards/tableau/screenshots/: generated dashboard screenshots for GitHub and LinkedIn
raw CSV → processed Parquet → modeling dataset → models + slices → exports + executive summary + dashboard extract
- Overall late-delivery rate is 7.9%, with an on-time rate of 92.1%.
- Late orders are delayed by 9.6 days on average, making late delivery a customer-experience and support-risk problem.
- Higher-value orders and high freight-ratio orders show elevated late-delivery risk.
- Highest customer-state late-delivery rates include AL, MA, PI, CE, and SE.
- The model is best positioned as an operational triage signal, not an automated decision system.
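The late-delivery rate, average delay, and per-state rates above can be illustrated with a few lines of Python. This is a minimal sketch, not the pipeline's code. It assumes an order counts as late when its delivered date falls after its estimated date, it uses the public Olist column names `order_delivered_customer_date` and `order_estimated_delivery_date`, and it runs on made-up rows rather than real data. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows for illustration only; not real Olist data.
ORDERS = [
    {"customer_state": "SP", "order_estimated_delivery_date": "2018-01-10",
     "order_delivered_customer_date": "2018-01-08"},
    {"customer_state": "AL", "order_estimated_delivery_date": "2018-01-10",
     "order_delivered_customer_date": "2018-01-14"},
    {"customer_state": "SP", "order_estimated_delivery_date": "2018-02-01",
     "order_delivered_customer_date": "2018-02-01"},
    {"customer_state": "MA", "order_estimated_delivery_date": "2018-02-05",
     "order_delivered_customer_date": "2018-02-11"},
]

DATE_FORMAT = "%Y-%m-%d"


def days_late(order):
    """Delivered minus estimated, in whole days (positive means late)."""
    estimated = datetime.strptime(order["order_estimated_delivery_date"], DATE_FORMAT)
    delivered = datetime.strptime(order["order_delivered_customer_date"], DATE_FORMAT)
    return (delivered - estimated).days


def late_summary(orders):
    """Overall late rate and mean delay among late orders only."""
    delays = [days_late(o) for o in orders]
    late = [d for d in delays if d > 0]
    return {
        "late_rate": len(late) / len(delays) if delays else 0.0,
        "avg_days_late": sum(late) / len(late) if late else 0.0,
    }


def late_rate_by_state(orders):
    """Late-delivery rate per customer state."""
    totals = defaultdict(int)
    lates = defaultdict(int)
    for order in orders:
        state = order["customer_state"]
        totals[state] += 1
        if days_late(order) > 0:
            lates[state] += 1
    return {state: lates[state] / totals[state] for state in totals}


summary = late_summary(ORDERS)
by_state = late_rate_by_state(ORDERS)
print(summary)
print(by_state)
```

Averaging the delay over late orders only, rather than over all orders, matches the way the 9.6-day figure is phrased above. An order delivered exactly on its estimated date is treated as on time in this sketch.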
python -m venv .venv
source .venv/Scripts/activate # Windows (Git Bash)
# source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt
Place raw files here (not committed):
- data/raw/olist/ (all Olist CSVs)
- data/raw/paysim/ (PaySim CSV; optional for Week 2)
Ingest:
python -m src.ingest.olist
python -m src.ingest.paysim
Run the Week 2 pipeline. This generates metrics, tables, figures, dashboard extracts, and an executive summary:
./scripts/run_week2.sh
Outputs:
- reports/metrics/late_delivery_model_metrics.csv
- reports/tables/late_delivery_by_customer_state.csv
- reports/figures/late_delivery_by_price_band.png
- reports/exports/olist_dashboard_extract.parquet (dashboard contract)
- reports/exports/model_metrics.csv (Tableau Model Insights helper)
- reports/exports/feature_importance_top15.csv (Tableau tooltip/helper)
- reports/tableau/*.csv (chart-ready Tableau sheets)
- reports/executive_summary.md
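After a run, a quick check that the listed outputs exist can catch a partial or failed export before the dashboard is refreshed. The sketch below is a hypothetical helper, not part of the repository: the function name `missing_outputs` is an assumption, the paths are copied from the outputs list above, and the `reports/tableau/*.csv` glob is left out because the exact sheet names are not listed.

```python
from pathlib import Path

# Paths copied from the outputs list; the reports/tableau/*.csv glob is
# omitted because the individual sheet names are not documented here.
EXPECTED_OUTPUTS = [
    "reports/metrics/late_delivery_model_metrics.csv",
    "reports/tables/late_delivery_by_customer_state.csv",
    "reports/figures/late_delivery_by_price_band.png",
    "reports/exports/olist_dashboard_extract.parquet",
    "reports/exports/model_metrics.csv",
    "reports/exports/feature_importance_top15.csv",
    "reports/executive_summary.md",
]


def missing_outputs(root="."):
    """Return the expected output paths that are absent or empty under root."""
    base = Path(root)
    missing = []
    for rel in EXPECTED_OUTPUTS:
        path = base / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_outputs()
    for rel in missing:
        print(f"missing or empty: {rel}")
    if not missing:
        print("all expected outputs present")
```

Running it from the repository root after `./scripts/run_week2.sh` prints any file that was not written. Treating zero-byte files as missing is a design choice of this sketch, on the assumption that an empty export is as unusable as an absent one.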
- Published dashboard: E-Commerce Delivery Risk Intelligence
- Spec: docs/dashboard_recommendations.md
- Automation strategy: docs/tableau_automation_strategy.md
- Build guide: dashboards/tableau/BUILD_GUIDE.md
- Data: reports/exports/olist_dashboard_extract.parquet
- Faster Desktop start: ./scripts/open_tableau_package.sh
- Portfolio assets: docs/portfolio_launch_assets.md
- Built a reproducible Python analytics pipeline for 99K+ e-commerce orders, generating executive KPIs, operational risk slices, model metrics, Tableau-ready extracts, and portfolio screenshots.
- Published a Tableau Public dashboard showing late-delivery risk by price band, freight ratio, weekday, customer state, and customer-seller lanes.
- Framed ML output as decision support by comparing baseline and random forest models, documenting leakage guardrails, operational limitations, and business recommendations.


