SQL project analyzing e-commerce customer behavior using MySQL
Analyze fictional e‑commerce data in MySQL to showcase advanced SQL skills for data‑analyst roles.
| # | Business Question | SQL Techniques Used |
|---|---|---|
| 1 | Who are our most valuable customers? (CLV) | JOIN, SUM, GROUP BY |
| 2 | Which customers may churn (inactive 90+ days)? | MAX, DATEDIFF, HAVING |
| 3 | What are the top‑selling products each month? | Window functions RANK() |
| 4 | How well do signup cohorts retain over time? | CTEs, date functions |
ecommerce-sql-case-study/ ├── data/ ← 4 CSV files (generated with Python + Faker) ├── sql/ │ ─ schema.sql ← CREATE TABLE script │ ─ analysis_queries.sql ├ ─ screenshots/ Results
- MySQL 8.0 (queries run in MySQL Workbench or CLI)
- Git & GitHub for version control
- Python (only for dataset generation)
- Optional: Excel / Google Sheets / Tableau for quick charts
# 1 Clone the repo
git clone https://github.com/<your-username>/ecommerce-sql-case-study.git
cd ecommerce-sql-case-study
# 2 In MySQL Workbench:
# – Run sql/schema.sql (creates tables)
# – Import each CSV from data/ into its matching table
# 3 Execute queries from sql/analysis_queries.sql
-- Top 10 customers by lifetime value
SELECT c.customer_id,
ROUND(SUM(o.order_amount),2) AS lifetime_value
FROM customers c
JOIN orders o USING (customer_id)
WHERE o.status = 'Completed'
GROUP BY c.customer_id
ORDER BY lifetime_value DESC
LIMIT 10;-- Customers inactive > 90 days (churn risk)
SELECT c.customer_id,
MAX(o.order_date) AS last_order_date,
DATEDIFF(CURDATE(), MAX(o.order_date)) AS days_since_last_order
FROM customers c
JOIN orders o USING (customer_id)
GROUP BY c.customer_id
HAVING MAX(o.order_date) < CURDATE() - INTERVAL 90 DAY;
