A complete, from‑scratch demo system that simulates e‑commerce activity, predicts purchase intent at the session level, and generates product recommendations. It includes an end‑to‑end Python pipeline and a Streamlit app to explore live results.
- Simulates realistic browsing/transactions with session‑level events and a product catalog
- Cleans data, runs a concise EDA, and engineers user/product/time features
- Trains a purchase‑intent model (XGBoost or LightGBM) with cross‑validation
- Builds recommenders: collaborative filtering (Surprise/SVD) and content‑based (SentenceTransformers)
- Orchestrates everything with a simple pipeline and serves a Streamlit demo
- Reports out‑of‑fold (OOF) AUC for the classifier; recommendation‑metric hooks are easy to extend.
.
├─ app/
│ └─ streamlit_app.py # Frontend demo
├─ artifacts/
│ ├─ figures/ # Plots (created at run)
│ ├─ metrics/ # Metrics JSON/CSVs
│ └─ models/ # Trained model(s)
├─ data/
│ ├─ raw/ # Simulated catalog/sessions
│ └─ processed/ # Post‑clean/feature data
├─ scripts/
│ └─ run_pipeline.py # Run end‑to‑end pipeline
├─ src/
│ ├─ data/
│ │ ├─ processing.py # cleaning, EDA, features
│ │ └─ __init__.py
│ ├─ models/
│ │ ├─ intent.py # intent training + CV
│ │ └─ __init__.py
│ ├─ pipeline/
│ │ ├─ orchestrator.py # simulate→process→train→reco
│ │ └─ __init__.py
│ ├─ reco/
│ │ ├─ collab.py # Surprise SVD CF
│ │ ├─ content.py # SentenceTransformers content‑based
│ │ └─ __init__.py
│ ├─ simulation/
│ │ ├─ data_generator.py # catalog + session simulator
│ │ └─ __init__.py
│ └─ utils/
│   ├─ config.py # Paths manager
│   ├─ logging_utils.py # Logger helper
│   └─ __init__.py
└─ requirements.txt
- End‑to‑end pipeline (simulate, process, train, recommend):
    python scripts/run_pipeline.py
- Streamlit app (first run will auto‑train if needed):
    streamlit run app/streamlit_app.py

- Data Simulation
  - Build a catalog with brand/category/price/rating and simple text.
  - Generate sessions: events, dwell times, segment/geo, and purchase outcomes with controllable conversion rate.
- Cleaning & EDA
  - Basic sanitization, timestamp parsing, and summary stats by segment/geo.
- Feature Engineering
  - Numerical features (events, dwell, price stats, tenure) and time features (hour, day‑of‑week) + one‑hot categorical encodings.
- Model Training
  - XGBoost or LightGBM with Stratified K‑Fold; reports Out‑of‑Fold (OOF) AUC; saves final model.
- Recommendations
  - Collaborative filtering (Surprise SVD) on implicit feedback derived from sessions.
  - Content‑based using SentenceTransformers embeddings of product text for similar‑item suggestions.
- Serving (Demo)
  - Streamlit app shows recent sessions for a user, intent scores, and a simple popularity‑weighted top‑K list.
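
The Model Training step above (Stratified K-Fold with an out-of-fold AUC) can be sketched in plain Python. This is only an illustration of the mechanics, not the code in `src/models/intent.py`: the toy data, the `RateByBucketModel` stand-in (used here in place of XGBoost/LightGBM so the example has no dependencies), and every function name are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits, seed=0):
    """Assign each row a fold id so every fold keeps roughly the same class mix."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % n_splits  # deal rows of one class round-robin
    return fold_of


def auc(labels, scores):
    """ROC AUC: chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class RateByBucketModel:
    """Hypothetical stand-in for XGBoost/LightGBM: purchase rate per event count."""

    def fit(self, X, y):
        totals, buys = defaultdict(int), defaultdict(int)
        for n_events, label in zip(X, y):
            totals[n_events] += 1
            buys[n_events] += label
        self.prior = sum(y) / len(y)  # fallback for unseen event counts
        self.rate = {k: buys[k] / totals[k] for k in totals}
        return self

    def predict_proba(self, X):
        return [self.rate.get(n_events, self.prior) for n_events in X]


def oof_auc(X, y, n_splits=5, seed=0):
    """Score every row with a model that never saw it, then compute one AUC."""
    fold_of = stratified_folds(y, n_splits, seed)
    oof = [0.0] * len(y)
    for fold in range(n_splits):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        valid = [i for i in range(len(y)) if fold_of[i] == fold]
        model = RateByBucketModel().fit([X[i] for i in train], [y[i] for i in train])
        for i, p in zip(valid, model.predict_proba([X[i] for i in valid])):
            oof[i] = p
    return auc(y, oof), oof


# Toy sessions (made up): more events means a higher purchase probability.
rng = random.Random(42)
X = [rng.randint(1, 10) for _ in range(500)]
y = [1 if rng.random() < 0.05 * n else 0 for n in X]
score, oof_preds = oof_auc(X, y)
print(f"OOF AUC: {score:.3f}")
```

Because each prediction comes from a fold model that was not trained on that row, the single AUC over all out-of-fold predictions gives a less optimistic estimate than scoring on the training data.
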
- Classification: OOF AUC reported in console/logs (extend with ROC curves, calibration plots, feature importance, etc.).
- Recommendation: Precision@K/Recall@K scaffolding can be added by holding out some interactions and scoring the recommenders.
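
The Precision@K/Recall@K idea in the recommendation bullet above could look roughly like the following. It is a minimal sketch: the function names, the user and product ids, and the held-out sets are all invented for illustration, and the repository does not necessarily structure its metrics this way.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one user.

    recommended: ranked list of item ids, best first
    relevant: set of held-out item ids the user actually interacted with
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Convention used here: divide by k even if fewer than k items were returned.
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_metrics_at_k(recs_by_user, heldout_by_user, k):
    """Average both metrics over users that have at least one held-out item."""
    users = [u for u, items in heldout_by_user.items() if items]
    if not users:
        return 0.0, 0.0
    pairs = [
        precision_recall_at_k(recs_by_user.get(u, []), heldout_by_user[u], k)
        for u in users
    ]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r


# Made-up example: interactions held out per user vs. ranked recommendations.
recs = {"u1": ["p3", "p7", "p1", "p9"], "u2": ["p2", "p4", "p8", "p5"]}
heldout = {"u1": {"p7", "p9"}, "u2": {"p5", "p6", "p2"}}
mean_p, mean_r = mean_metrics_at_k(recs, heldout, k=3)
print(f"precision@3={mean_p:.3f} recall@3={mean_r:.3f}")
```

The same two functions can score either recommender, as long as the held-out interactions are removed from the data before the recommender is fit.
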
- The simulator is parameterized; tweak catalog size, number of sessions, and conversion rate in the `ECommerceSimulator` configs.
- Swap in different intent models or add hyperparameter search (Optuna or scikit‑learn `GridSearchCV`).
- Add image/text enrichment to catalog, or integrate LightFM for hybrid recommenders.
- For real‑time serving, wrap the trained model in a small API (Flask/FastAPI) and cache recent embeddings for speed.
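
To make the first bullet above concrete, here is a rough sketch of what simulator knobs for catalog size, session count, and conversion rate might look like. `SimConfig` and `simulate_sessions` are hypothetical names invented for this example; they are not the actual `ECommerceSimulator` API, whose parameter names are not documented here.

```python
import random
from dataclasses import dataclass


@dataclass
class SimConfig:
    # All field names are hypothetical, chosen for this example only.
    n_products: int = 50
    n_sessions: int = 1000
    conversion_rate: float = 0.1
    seed: int = 7


def simulate_sessions(cfg):
    """Generate toy sessions; each one purchases with probability cfg.conversion_rate."""
    rng = random.Random(cfg.seed)
    sessions = []
    for session_id in range(cfg.n_sessions):
        n_events = rng.randint(1, 12)
        viewed = [rng.randrange(cfg.n_products) for _ in range(n_events)]
        sessions.append({
            "session_id": session_id,
            "viewed_products": viewed,
            "dwell_seconds": round(sum(rng.uniform(2, 90) for _ in viewed), 1),
            "purchased": rng.random() < cfg.conversion_rate,
        })
    return sessions


def purchase_rate(sessions):
    """Share of sessions that ended in a purchase."""
    return sum(s["purchased"] for s in sessions) / len(sessions)


low = simulate_sessions(SimConfig(conversion_rate=0.02))
high = simulate_sessions(SimConfig(conversion_rate=0.30))
print(f"low={purchase_rate(low):.3f} high={purchase_rate(high):.3f}")
```

Keeping the knobs in one config object and seeding the generator makes runs reproducible, which helps when comparing intent models trained on different simulated conversion rates.
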
This is a compact educational demo. The data is simulated and not representative of a real store’s nuances.