Owner: Ananya Shukla
Status: Active development (production-aware prototype)
Focus: Fraud detection as a decision system, not a standalone classifier
Real-world fraud detection violates many assumptions of standard machine learning pipelines:
- Extreme class imbalance (fraud prevalence ≪ 1%)
- Delayed, noisy, and asymmetric labels (e.g., chargebacks)
- Regulatory and audit requirements for explainability
- Strict latency constraints for real-time decisions
- Unequal and business-critical error costs (false positives vs. false negatives)
Most portfolio projects ignore these constraints and frame fraud as a static classification task.
This project intentionally does not.
The goal is to design and implement a production-aware fraud detection system that mirrors how real organizations deploy, monitor, govern, and retrain ML-driven risk decisions—while explicitly documenting what cannot be replicated without institutional access.
What this project is:
- An end-to-end fraud decision engine
- A cost-sensitive, explainable ML system
- A simulation of delayed-label reality and temporal leakage constraints
- An MLOps-oriented system emphasizing lifecycle ownership
- A system designed to be auditable, monitorable, and retrainable
What this project is not:
- A Kaggle-style notebook
- A single “best model” benchmark
- A purely real-time demo without decision logic
- A claim of real production deployment or regulatory approval
The emphasis is on correct system design and trade-offs, not inflated performance metrics.
```mermaid
flowchart TB
    A[Incoming Transaction Stream]
    A --> B[Feature Pipeline<br/>Stateless & Versioned]
    B --> C[Fraud Model<br/>Cost-Sensitive]
    C --> D[Explainability Engine<br/>SHAP]
    C --> E[Risk Score]
    E --> F[Decision Policy Layer]
    F -->|Approve| G[Auto Approve]
    F -->|Step-Up| H[Additional Authentication]
    F -->|Block| I[Manual Review Queue]
    I --> J[Analyst Feedback]
    J --> K[Label Store<br/>Delayed Ground Truth]
    K --> L[Retraining Pipeline]
    L --> C
```
Design principles:
- Accuracy is not a primary metric under extreme imbalance
- Decisions are optimized for expected financial loss
- Explainability is a first-class artifact
- Time and label availability are modeled explicitly
- Predictions must be auditable and reproducible
- System realism is prioritized over algorithm novelty
Each component is built incrementally and versioned independently.
Data foundation:
- IEEE-CIS dataset ingestion
- Unified transaction event table
- Simulated label-delay distribution
- Time-aware train / validation / stream splits
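The label-delay simulation and time-aware splitting can be sketched as below. This is an illustrative sketch, not the project's actual implementation: the exponential delay distribution, the mean of 30 days, and the 60/20/20 split ratios are all assumptions, and the synthetic hourly events stand in for the IEEE-CIS transaction table.

```python
import random
from datetime import datetime, timedelta

random.seed(0)

def simulate_label_delay(event_time, mean_delay_days=30):
    """A label (e.g. a chargeback) becomes observable only after a random delay."""
    delay = random.expovariate(1.0 / mean_delay_days)
    return event_time + timedelta(days=delay)

start = datetime(2024, 1, 1)
events = [start + timedelta(hours=i) for i in range(1000)]  # synthetic event times
label_times = [simulate_label_delay(t) for t in events]

# Time-aware split: train on the oldest events, validate on the middle,
# replay the newest as a pseudo-live stream. Never shuffle across time.
n = len(events)
train = events[: int(n * 0.6)]
valid = events[int(n * 0.6): int(n * 0.8)]
stream = events[int(n * 0.8):]

# At the training cutoff, only labels that have already matured are usable.
cutoff = train[-1]
mature = [t for t, lt in zip(events, label_times) if lt <= cutoff]
print(f"{len(mature)} of {n} labels mature at cutoff {cutoff:%Y-%m-%d}")
```

The key property is that the training set sees only labels that would actually have arrived by the cutoff, which is what makes the evaluation honest about delayed ground truth.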
Feature engineering:
- Rolling-window aggregates
- Velocity and frequency features
- Strict leakage prevention
- Feature computation aligned with event time
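A minimal sketch of an event-time-aligned velocity feature, "transactions by the same card in the previous hour". The field names and one-hour window are illustrative; the leakage rule it demonstrates is the real one: the feature for an event may only use events that happened strictly before it.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def card_velocity_1h(events):
    """events: list of (timestamp, card_id), sorted by timestamp."""
    window = defaultdict(deque)  # card_id -> timestamps inside the 1h window
    feats = []
    for ts, card in events:
        q = window[card]
        while q and q[0] <= ts - timedelta(hours=1):
            q.popleft()          # expire events older than one hour
        feats.append(len(q))     # count uses only strictly earlier events
        q.append(ts)             # current event enters the window afterwards
    return feats

t0 = datetime(2024, 1, 1)
events = [(t0, "c1"), (t0 + timedelta(minutes=10), "c1"),
          (t0 + timedelta(minutes=20), "c2"), (t0 + timedelta(minutes=65), "c1")]
print(card_velocity_1h(events))  # -> [0, 1, 0, 1]
```

Appending the current event only after its feature is computed is what prevents the classic leak of a transaction "seeing" itself.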
Modeling:
- Interpretable baseline (logistic regression)
- Imputation-aware pipelines
- Precision–Recall–centric evaluation
- Explicit handling of delayed labels
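The baseline can be sketched with scikit-learn as below. The data is synthetic and the feature count, missingness rate, and class balance are placeholders; what the sketch shows is the shape of the pipeline: imputation inside the estimator (so it is fit only on training folds), class weighting for imbalance, and PR-AUC rather than accuracy as the headline metric.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic imbalanced data with missing values (stands in for real features).
n = 5000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n) > 4.5).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan  # ~5% missing entries

clf = make_pipeline(
    SimpleImputer(strategy="median"),   # imputation lives inside the pipeline
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
clf.fit(X[:4000], y[:4000])             # time-ordered split: oldest 80% trains
scores = clf.predict_proba(X[4000:])[:, 1]
ap = average_precision_score(y[4000:], scores)
print(f"positive rate: {y.mean():.3%}, PR-AUC: {ap:.3f}")
```

PR-AUC is reported because under extreme imbalance a model can reach very high accuracy while catching no fraud at all.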
Decision layer:
- Cost-based risk-to-action mapping
- Threshold tuning under asymmetric costs
- Separation of prediction and business decision logic
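A sketch of the separation between prediction and business decision: the model emits a risk score, and a policy maps it to approve / step-up / block by minimizing expected cost per transaction. Every number below (the dollar costs and the 90% step-up effectiveness) is an illustrative placeholder, not a project constant.

```python
# Illustrative per-outcome costs; real values come from the business.
COSTS = {
    "approve_fraud": 200.0,   # missed fraud: chargeback loss
    "stepup_friction": 2.0,   # extra authentication annoys the customer
    "block_legit": 15.0,      # false positive: lost sale plus review cost
}

def expected_cost(action, p_fraud):
    if action == "approve":
        return p_fraud * COSTS["approve_fraud"]
    if action == "step_up":
        # Assume step-up stops 90% of fraud but always adds friction.
        return COSTS["stepup_friction"] + 0.1 * p_fraud * COSTS["approve_fraud"]
    if action == "block":
        return (1 - p_fraud) * COSTS["block_legit"]
    raise ValueError(action)

def decide(p_fraud):
    """Pick the action with the lowest expected cost for this transaction."""
    return min(("approve", "step_up", "block"),
               key=lambda a: expected_cost(a, p_fraud))

for p in (0.001, 0.05, 0.6):
    print(p, decide(p))  # low risk approves, mid risk steps up, high risk blocks
```

Because the thresholds fall out of the cost structure rather than being hand-tuned, changing a cost assumption changes the policy without retraining the model.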
Inference service:
- FastAPI inference service
- Production model loading via registry
- Health and scoring endpoints
- Request-level latency-safe inference
Monitoring:
- Feature drift via Population Stability Index (PSI)
- Prediction drift monitoring
- Stream vs. train distribution comparisons
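The PSI check can be sketched as below: bin the training (expected) distribution, then compare live (actual) bin proportions against it. The bin edges and the common "alert above 0.2" convention are generic choices, not project-specific values.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bin edges."""
    def proportions(xs):
        counts = [0] * (len(edges) + 1)
        for x in xs:
            i = sum(x > e for e in edges)  # index of the bin containing x
            counts[i] += 1
        return [c / len(xs) for c in counts]
    p, q = proportions(expected), proportions(actual)
    # eps guards against log(0) when a bin is empty in one sample.
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

train_scores = [i / 100 for i in range(100)]          # uniform reference sample
drifted = [min(1.0, x + 0.3) for x in train_scores]   # shifted live sample
edges = [0.2, 0.4, 0.6, 0.8]
print(f"self PSI:  {psi(train_scores, train_scores, edges):.4f}")
print(f"drift PSI: {psi(train_scores, drifted, edges):.4f}")
```

The same function applies to both feature drift (inputs) and prediction drift (risk scores), which is why the monitoring layer can share one utility.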
Retraining:
- Mature-label data selection
- Candidate vs. production model evaluation
- Automated promotion via PR-AUC improvement
- Model registry updates
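The promotion gate can be sketched as below. The in-memory registry dict and the 0.02 minimum PR-AUC margin are illustrative stand-ins for the real registry and whatever improvement threshold the project settles on.

```python
MIN_IMPROVEMENT = 0.02  # candidate must beat production by this PR-AUC margin

# Stand-in for the model registry; the real one persists artifacts on disk.
registry = {"production": {"version": "v3", "pr_auc": 0.41}}

def promote_if_better(candidate_version, candidate_pr_auc):
    """Promote the candidate only on a clear PR-AUC win over production."""
    prod = registry["production"]
    if candidate_pr_auc >= prod["pr_auc"] + MIN_IMPROVEMENT:
        registry["production"] = {"version": candidate_version,
                                  "pr_auc": candidate_pr_auc}
        return True
    return False

promote_if_better("v4", 0.415)  # within noise of production: rejected
promote_if_better("v5", 0.47)   # clear improvement: promoted
print(registry["production"])
```

Requiring a margin rather than any improvement keeps the registry from churning on evaluation noise, since both models are scored on the same mature-label slice.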
Audit & governance:
- Request-level audit logging
- Unique request IDs
- Model artifact path + immutable SHA256 hash
- Explainability linkage
- Size-based audit log rotation
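One audit record can be sketched as below: every scoring request gets a unique ID and is tied to the exact model artifact by its SHA-256 hash, so any past decision can be traced to the bytes that produced it. The record fields and the artifact path shown are illustrative, and the serialized model is a placeholder for the real joblib file.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

def sha256_of(data: bytes) -> str:
    """Immutable fingerprint of the model artifact bytes."""
    return hashlib.sha256(data).hexdigest()

model_bytes = b"serialized-model-placeholder"  # stands in for the joblib artifact

def audit_record(features, risk_score, decision):
    return {
        "request_id": str(uuid.uuid4()),       # unique per request
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_artifact": "models/registry/fraud_v3.joblib",  # illustrative path
        "model_sha256": sha256_of(model_bytes),
        "features": features,
        "risk_score": risk_score,
        "decision": decision,
    }

rec = audit_record({"amount": 120.0}, 0.07, "approve")
print(json.dumps(rec, indent=2))
```

For the size-based rotation bullet, the standard library's `logging.handlers.RotatingFileHandler` already provides size-triggered rollover, so no custom rotation code is needed.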
This project intentionally does not claim to replicate:
- Real chargeback or dispute pipelines
- Legal responsibility or regulatory approval
- Live customer friction costs
- Production SLAs or on-call operations
- Organization-specific fraud heuristics
These limitations are explicitly acknowledged, not ignored.
The repository is organized to reflect a real-world ML system rather than a model-centric workflow.
```text
fraud-detection-mlops/
├── docs/            # Design notes and diagrams
├── data/            # Ingestion, label delay simulation, time splits
├── features/        # Feature engineering pipelines
├── models/          # Training, evaluation, registry
├── decision/        # Cost-based decision policies
├── explainability/  # SHAP-based explanations
├── monitoring/      # Drift detection utilities
├── retraining/      # Retraining and promotion logic
├── api/             # FastAPI inference service
├── audit/           # Request-level audit logging
└── README.md
```
Development philosophy:
- Structure precedes implementation
- Each system phase is developed and committed independently
- Design changes are preserved in Git history
- No silent assumptions or hidden shortcuts
Tech stack:
- Python
- scikit-learn
- SHAP
- FastAPI
- Joblib
- Git + GitHub
- VS Code
Specific libraries may evolve as the system matures; architectural intent will not.
- v1: System design, end-to-end MLOps pipeline, real-time inference, retraining, drift monitoring, and governance completed