oliveira-mtcode/e-commerce-purchase-intent

E‑commerce Purchase Intent Prediction & Product Recommendations

A complete, from‑scratch demo system that simulates e‑commerce activity, predicts purchase intent at the session level, and generates product recommendations. It includes an end‑to‑end Python pipeline and a Streamlit app to explore live results.

What this project does

  • Simulates realistic browsing/transactions with session‑level events and a product catalog
  • Cleans data, runs a concise EDA, and engineers user/product/time features
  • Trains a purchase‑intent model (XGBoost or LightGBM) with cross‑validation
  • Builds recommenders: collaborative filtering (Surprise/SVD) and content‑based (SentenceTransformers)
  • Orchestrates everything with a simple pipeline and serves a Streamlit demo
  • Reports metrics: out‑of‑fold (OOF) AUC for the classifier. Recommendation metrics are not computed yet, but hooks for them are easy to add.

Project structure

.
├─ app/
│  └─ streamlit_app.py           # Frontend demo
├─ artifacts/
│  ├─ figures/                   # Plots (created at run)
│  ├─ metrics/                   # Metrics JSON/CSVs
│  └─ models/                    # Trained model(s)
├─ data/
│  ├─ raw/                       # Simulated catalog/sessions
│  └─ processed/                 # Post‑clean/feature data
├─ scripts/
│  └─ run_pipeline.py            # Run end‑to‑end pipeline
├─ src/
│  ├─ data/
│  │  ├─ processing.py           # cleaning, EDA, features
│  │  └─ __init__.py
│  ├─ models/
│  │  ├─ intent.py               # intent training + CV
│  │  └─ __init__.py
│  ├─ pipeline/
│  │  ├─ orchestrator.py         # simulate→process→train→reco
│  │  └─ __init__.py
│  ├─ reco/
│  │  ├─ collab.py               # Surprise SVD CF
│  │  ├─ content.py              # SentenceTransformers content‑based
│  │  └─ __init__.py
│  ├─ simulation/
│  │  ├─ data_generator.py       # catalog + session simulator
│  │  └─ __init__.py
│  └─ utils/
│     ├─ config.py               # Paths manager
│     ├─ logging_utils.py        # Logger helper
│     └─ __init__.py
└─ requirements.txt

How to run

  • End‑to‑end pipeline (simulate, process, train, recommend):
      python scripts/run_pipeline.py
  • Streamlit app (first run will auto‑train if needed):
      streamlit run app/streamlit_app.py

Workflow (high level)

  1. Data Simulation
    • Build a catalog with brand/category/price/rating and simple text.
    • Generate sessions: events, dwell times, segment/geo, and purchase outcomes with controllable conversion rate.
  2. Cleaning & EDA
    • Basic sanitization, timestamp parsing, and summary stats by segment/geo.
  3. Feature Engineering
    • Numerical features (events, dwell, price stats, tenure), time features (hour, day‑of‑week), and one‑hot encodings of the categorical fields.
  4. Model Training
    • XGBoost or LightGBM with Stratified K‑Fold; reports Out‑of‑Fold (OOF) AUC; saves final model.
  5. Recommendations
    • Collaborative filtering (Surprise SVD) on implicit feedback derived from sessions.
    • Content‑based using SentenceTransformers embeddings of product text for similar‑item suggestions.
  6. Serving (Demo)
    • Streamlit app shows recent sessions for a user, intent scores, and a simple popularity‑weighted top‑K list.

Metrics & Visualizations

  • Classification: OOF AUC is reported in the console and logs. ROC curves, calibration plots, and feature importance are natural extensions.
  • Recommendation: Precision@K/Recall@K scaffolding can be added by holding out some interactions per user and scoring the recommenders against them.
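
One possible shape for the Precision@K/Recall@K scaffolding is sketched below. The repository does not ship this code; the function names and the dictionary layout (user ID mapped to a ranked list of recommendations, and to a set of held‑out items) are assumptions for illustration. It also assumes each ranked list has no duplicate items.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for one user.

    recommended: ranked list of item IDs, best first.
    relevant: set of held-out item IDs the user actually interacted with.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall_at_k(recs_by_user, heldout_by_user, k):
    """Average both metrics over users that have held-out interactions."""
    scores = [
        precision_recall_at_k(recs_by_user.get(user, []), relevant, k)
        for user, relevant in heldout_by_user.items()
        if relevant
    ]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Toy data: two users, three recommendations each.
recs = {"u1": ["p3", "p1", "p7"], "u2": ["p2", "p9", "p4"]}
heldout = {"u1": {"p1", "p8"}, "u2": {"p4"}}
mean_p, mean_r = mean_precision_recall_at_k(recs, heldout, k=3)
print(mean_p, mean_r)
```

In the toy data each user gets one hit in the top 3, so mean precision is 1/3; recall is 1/2 for u1 and 1 for u2, giving a mean of 0.75.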

Notes & Extensibility

  • The simulator is parameterized; tweak catalog size, number of sessions, and conversion rate in the ECommerceSimulator configs.
  • Swap in different intent models or add hyperparameter search (Optuna or scikit‑learn's GridSearchCV).
  • Add image/text enrichment to catalog, or integrate LightFM for hybrid recommenders.
  • For real‑time serving, wrap the trained model in a small API (Flask/FastAPI) and cache recent embeddings for speed.
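
To make the first note concrete, here is a toy sketch of what a parameterized simulator with a controllable conversion rate could look like. It is not the ECommerceSimulator from src/simulation/data_generator.py: SimConfig, simulate_sessions, and every field name are hypothetical, and the real simulator models segments, geography, and catalog attributes that are omitted here.

```python
import random
from dataclasses import dataclass


@dataclass
class SimConfig:
    """Hypothetical knobs; the real ECommerceSimulator config may differ."""
    n_products: int = 50
    n_sessions: int = 1000
    conversion_rate: float = 0.1
    seed: int = 7


def simulate_sessions(cfg):
    """Generate toy sessions whose purchase label is a Bernoulli draw."""
    rng = random.Random(cfg.seed)  # local RNG keeps runs reproducible
    sessions = []
    for session_id in range(cfg.n_sessions):
        n_events = rng.randint(1, 20)
        # Dwell time: sum of exponential gaps with a 30-second mean.
        dwell = sum(rng.expovariate(1 / 30) for _ in range(n_events))
        sessions.append({
            "session_id": session_id,
            "product_id": rng.randrange(cfg.n_products),
            "n_events": n_events,
            "dwell_seconds": round(dwell, 1),
            "purchased": int(rng.random() < cfg.conversion_rate),
        })
    return sessions


sessions = simulate_sessions(SimConfig(n_sessions=5000, conversion_rate=0.08))
observed = sum(s["purchased"] for s in sessions) / len(sessions)
print(f"observed conversion rate: {observed:.3f}")
```

Because purchases here are independent of the other fields, a model trained on this toy output would learn nothing useful; a realistic simulator ties the purchase probability to session behavior while keeping the overall rate near the configured value.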

Disclaimer

This is a compact educational demo. The data is simulated and not representative of a real store’s nuances.
