DAFU is a comprehensive fraud detection and e-commerce analytics platform designed for enterprise deployment. Currently in active development, it provides advanced machine learning-based fraud detection capabilities with a focus on anomaly detection and sequence analysis.
Built with modern technologies and following enterprise best practices, DAFU combines multiple machine learning algorithms into a comprehensive fraud detection and prevention solution. Core ML capabilities are fully implemented, with enterprise features in development.
- Unified CLI with API Integration: All-in-one command-line interface for authentication, logs, reports, products, and ML models
- Authentication & User Management: JWT-based auth with role-based access control (RBAC)
- Logging System: Structured logging with analytics and statistics
- Report Management: Fraud detection report generation and tracking
- Product Risk Management: E-commerce product management with fraud risk tracking
- Unified Model Interface: Single entry point for all fraud detection models
- Advanced ML Algorithms: Isolation Forest and LSTM/GRU sequence models, fully implemented
- Stream Processing: Real-time data stream processing with pre-trained models
- Model Persistence: Save and load trained models for production deployment
- Dual Prediction Modes: Both batch and stream prediction capabilities
- FastAPI Backend: Complete REST API with auth, logs, reports, and products endpoints
- Database Layer: PostgreSQL with SQLAlchemy ORM and a complete schema
- Docker Infrastructure: PostgreSQL containerization ready
- Dual Learning Modes: Both supervised and unsupervised learning approaches
- Comprehensive Analysis: 4-panel visualization with detailed performance metrics
- Production-Ready Core: Complete fraud detection pipeline with evaluation
- Flexible Detection: Classic and risk-score based detection methods
- Data Processing: Automatic preprocessing with missing value handling
- Fast Startup: Lazy loading for an instant model selection interface
- Real-time API: Sub-50ms fraud scoring endpoints for ultra-low-latency decisioning. Enables the system to detect fraud instantly in live payment flows, meeting real-time financial transaction requirements.
- Enterprise Security: OAuth2, JWT, and RBAC implementation. Adds enterprise-grade authentication, token-based access, and role-based authorization to secure deployments in regulated environments.
- Scalable Architecture: Kubernetes deployment with auto-scaling. Provides seamless horizontal scaling based on traffic load, supporting both small-scale PoCs and large enterprise production clusters.
- Advanced Monitoring: Prometheus, Grafana, and Jaeger integration. Full observability with metrics collection, real-time dashboards, and distributed tracing for faster issue detection and resolution.
- High-throughput Processing: 10,000+ TPS optimization. Optimized to handle extremely high transaction volumes, scaling to 10,000+ transactions per second to meet the demands of major banks and payment providers.
- Quick Start
- Interactive CLI Guide
- CLI Demo & Examples
- Supported Data Formats
- Use Cases and Scenarios
- Complete Documentation - All documentation organized by category
- CLI Documentation - Interactive CLI guides
- Docker Documentation - Docker setup and deployment
- General Guides - Quick start and implementation guides
- Python 3.8+ (Python 3.9+ recommended)
- Docker (for PostgreSQL database)
- Git (for cloning the repository)
- 8GB+ RAM (for ML model training)
- 2GB+ free disk space (for models and data)
For API Features (NEW!):
- PostgreSQL (Docker container recommended)
- Port 8000 (for API server)
- Port 5432 (for PostgreSQL)
Optional (for production):
- Kubernetes cluster
- Redis (for caching)
The fastest way to get started with DAFU!
# Clone the repository
git clone https://github.com/MasterFabric/dafu.git
cd dafu
# Make CLI executable
chmod +x dafu
# Start interactive CLI
./dafu
# You'll see:
# ╔══════════════════════════════════════════════════════════════╗
# ║  ____    _    _____ _   _                                    ║
# ║ |  _ \  / \  |  ___| | | |                                   ║
# ║ | | | |/ _ \ | |_   | | | |                                  ║
# ║ | |_| / ___ \|  _|  | |_| |                                  ║
# ║ |____/_/   \_\_|     \___/                                   ║
# ║                                                              ║
# ║   Data Analytics Functional Utilities - Interactive CLI      ║
# ║   Enterprise Fraud Detection & Analytics Platform            ║
# ╚══════════════════════════════════════════════════════════════╝
#
# Welcome to DAFU Interactive CLI!
# Type 'help' for available commands or 'exit' to quit
#
# dafu>
# Try different features
dafu> help # See all commands
dafu> auth login # Login to API (NEW!)
dafu> logs list # View system logs (NEW!)
dafu> reports list # View fraud reports (NEW!)
dafu> products stats # Product statistics (NEW!)
dafu> fraud-detection # Run ML models
dafu> docker status # Check Docker services
# The CLI will:
# - Auto-create virtual environment if needed
# - Auto-install dependencies
# - Manage authentication sessions
# - Provide unified access to all features
# - Return to CLI prompt after each command

Available CLI Commands:
Key Features:
- API Integration - Full authentication, logs, reports, products management
- Persistent Session - Login once, use everywhere with session management
- Auto-Setup - Automatically creates virtual environment and installs dependencies
- Error Resilient - CLI stays active even when commands fail
- User-Friendly - Color-coded output and helpful messages
- Scriptable - Use in automation with single command mode
- Role-Based Access - Support for viewer, user, analyst, admin roles
Documentation:
- Complete Usage Guide - Full platform usage
- CLI Guide - Interactive CLI reference
- API Guide - REST API documentation
- All Documentation - Complete documentation library
Complete platform with authentication, logging, reports, and product management
Step 1: Start PostgreSQL
docker run -d --name dafu-postgres \
-e POSTGRES_USER=dafu \
-e POSTGRES_PASSWORD=dafu_secure_password \
-e POSTGRES_DB=dafu \
-p 5432:5432 \
  postgres:15-alpine

Step 2: Start API Server (in a separate terminal)
cd dafu/core/features/fraud_detection
./start_api.sh
# Wait for:
# INFO: Uvicorn running on http://0.0.0.0:8000
# INFO: Application startup complete.

Step 3: Use DAFU CLI
./dafu
dafu> auth register # First time: register user
dafu> auth login # Login with credentials
dafu> auth whoami # Check your user info
dafu> logs list # View system logs
dafu> reports list # View fraud reports
dafu> products stats # Product statistics
dafu> fraud-detection    # Run ML models

Complete Guide: See docs/USAGE_GUIDE.md for detailed instructions
Features Available:
- JWT authentication with RBAC
- System logging and analytics
- Fraud detection report generation
- Product risk management
- All ML models
- RESTful API endpoints
- Database persistence
- Session management
API Documentation: http://localhost:8000/docs (Swagger UI)
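Since the backend is a standard FastAPI service, any HTTP client can talk to it outside the CLI. The sketch below builds requests with only the standard library; the `/auth/login` path and the `access_token` field are assumptions — check the Swagger UI at `/docs` for the actual routes and schemas exposed by your build.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default from the quick start above

def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a login request.
    NOTE: '/auth/login' is a hypothetical path -- verify against /docs."""
    payload = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{API_BASE}/auth/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_authorized_request(path: str, token: str) -> urllib.request.Request:
    """Attach the JWT returned by login as a Bearer token."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )

# Sending would then look like:
# with urllib.request.urlopen(build_login_request("analyst", "secret")) as resp:
#     token = json.load(resp)["access_token"]
```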
Step 1: Clone and Setup Environment
# Clone the repository
git clone https://github.com/MasterFabric/dafu.git
cd dafu
# Create virtual environment
python3 -m venv dafu_env
source dafu_env/bin/activate # On Windows: dafu_env\Scripts\activate
# Expected output:
# (dafu_env) masterfabric@machine:dafu$

Step 2: Install Dependencies
# Navigate to fraud detection module
cd core/features/fraud_detection
# Install minimal dependencies (recommended for first-time users)
pip install -r requirements-minimal.txt
# Expected output:
# Collecting numpy>=1.21.0
#   Downloading numpy-1.24.3-cp39-cp39-macosx_10_9_x86_64.whl (20.1 MB)
#      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.1/20.1 MB 2.1 MB/s eta 0:00:00
# Collecting pandas>=1.3.0
#   Downloading pandas-1.5.3-cp39-cp39-macosx_10_9_x86_64.whl (11.3 MB)
#      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 2.8 MB/s eta 0:00:00
# ...
# Successfully installed numpy-1.24.3 pandas-1.5.3 scikit-learn-1.3.0 ...

Step 3: Verify Installation
# Test the installation
python -c "from src.models.anomaly_detection import IsolationForestFraudDetector; print('Installation successful!')"
# Expected output:
# Installation successful!

Step 4: Run Unified Model Interface
# Run the unified model selection interface
cd core/features/fraud_detection/src/models
python main.py
# Expected terminal interaction:
# ========================================
# ENTERPRISE FRAUD DETECTION PLATFORM
# ========================================
# Advanced Machine Learning Models for Fraud Detection
# Version: 1.0.0
# ========================================
#
# This platform offers multiple fraud detection approaches:
#   • Traditional ML: Isolation Forest with Risk Score analysis
#   • Deep Learning: LSTM and GRU sequence-based models
#   • Both supervised and unsupervised learning modes
#   • Real-time streaming and batch processing capabilities
# ========================================
#
# Fast startup - models load only when selected!
#
# ============================================================
# SELECT FRAUD DETECTION MODEL
# ============================================================
# Choose the type of fraud detection model you want to use:
#
# 1. ISOLATION FOREST & RISK SCORE
#    • Traditional machine learning approach
#    • Excellent for tabular data with numerical features
#    • Supports both supervised and unsupervised learning
#    • Risk score based anomaly detection
#    • Fast training and prediction
#
# 2. SEQUENCE MODELS (LSTM & GRU)
#    • Deep learning approach for sequential data
#    • Captures temporal patterns and dependencies
#    • Autoencoder architecture for anomaly detection
#    • Best for time-series and transaction sequences
#    • More complex but potentially more accurate
#
# 3. MODEL COMPARISON
#    • Compare different models on the same dataset
#    • Get recommendations based on your data
#
# 4. HELP & INFORMATION
#    • Detailed information about each model
#    • Data requirements and recommendations
#
# 5. EXIT
#    • Exit the application
# ============================================================
#
# Enter your choice (1-5):

Alternative: Run Individual Model Tests
# Run individual model tests (legacy method)
cd core/features/fraud_detection
python test_anomaly_detection.py
python test_sequence_models_interactive.py

Status: Infrastructure prepared, services not integrated yet
What's Ready:
- Docker configuration files
- Database schemas
- Service definitions

Warning: ML models are NOT integrated with the API yet.
Current Limitation: Docker Compose services are commented out until API-ML integration is complete. For now, use Option 1 (Local Development) to run ML models.
Future Setup (when ready):
# Clone and navigate
git clone https://github.com/MasterFabric/dafu.git
cd dafu
# Uncomment services in docker-compose.yml
# Then start services
docker-compose up -d

Why Services Are Commented Out:
The ML models (Isolation Forest, LSTM/GRU) work perfectly standalone, but the FastAPI endpoints need ML integration. All infrastructure (database schemas, service configs, monitoring) is prepared and ready to be activated once the integration is complete.
What You Can Do Now:
- Use all ML models via Python (Option 1)
- Train and save models
- Stream and batch processing
- See Docker Status for the integration roadmap
Next Step:
Integrate ML models with FastAPI, then uncomment services in docker-compose.yml.
For testing individual components:
# Build the fraud detection service
cd core/features/fraud_detection
docker build -f deployment/Dockerfile -t dafu-fraud-detection .
# Run with sample data
docker run -it --rm \
-v $(pwd)/sample_fraud_data.csv:/app/data.csv \
dafu-fraud-detection \
  python test_anomaly_detection.py

Step 1: Deploy with Helm
# Deploy using Helm (when API is ready)
cd core/features/fraud_detection/deployment
helm install dafu-fraud-detection ./helm-charts/ \
--set image.tag=latest \
--set replicas=3 \
--set resources.requests.memory=512Mi
# Expected output:
# NAME: dafu-fraud-detection
# LAST DEPLOYED: Mon Jan 15 10:30:00 2024
# NAMESPACE: default
# STATUS: deployed
# REVISION: 1
# TEST SUITE: None

Step 2: Verify Deployment
# Check pod status
kubectl get pods -l app=dafu-fraud-detection
# Expected output:
# NAME                                READY   STATUS    RESTARTS   AGE
# dafu-fraud-detection-7d4b8c9f-abc   1/1     Running   0          2m
# dafu-fraud-detection-7d4b8c9f-def   1/1     Running   0          2m
# dafu-fraud-detection-7d4b8c9f-ghi   1/1     Running   0          2m

1. Run the Unified Model Interface
cd core/features/fraud_detection/src/models
python main.py

Expected Terminal Interface:
ENTERPRISE FRAUD DETECTION PLATFORM
========================================
Advanced Machine Learning Models for Fraud Detection
Version: 1.0.0
========================================
This platform offers multiple fraud detection approaches:
• Traditional ML: Isolation Forest with Risk Score analysis
• Deep Learning: LSTM and GRU sequence-based models
• Both supervised and unsupervised learning modes
• Real-time streaming and batch processing capabilities
========================================
Fast startup - models load only when selected!
============================================================
SELECT FRAUD DETECTION MODEL
============================================================
Choose the type of fraud detection model you want to use:
1. ISOLATION FOREST & RISK SCORE
2. SEQUENCE MODELS (LSTM & GRU)
3. MODEL COMPARISON
4. HELP & INFORMATION
5. EXIT
Enter your choice (1-5):
2. Select Your Model
Choose option 1 for Isolation Forest or option 2 for Sequence Models. The system will:
- Load the selected model (with progress indicator)
- Guide you through configuration
- Handle all setup automatically
3. View Results
After completion, you'll see comprehensive results with visualizations and exported data.
The new unified interface provides:
- Single Entry Point: One command to access all fraud detection models
- Smart Model Selection: Interactive guidance for choosing the right model
- Fast Startup: Lazy loading ensures instant interface response
- Model Comparison: Built-in comparison tools and recommendations
- Help System: Comprehensive information and decision trees
- Seamless Navigation: Easy switching between models and options
1. Train a Model First
python test_sequence_models_interactive.py

Expected Questions:
Prediction Mode Selection:
1. Batch Prediction (train and evaluate)
2. Stream Prediction (use pre-trained model)
Please select prediction mode (1 or 2): 1

Learning Mode Selection:
1. Supervised Learning
2. Unsupervised Learning
Please select learning mode (1 or 2): 1

Model Selection:
Available models: ['LSTM', 'GRU', 'Both']
Please select models (comma-separated): LSTM,GRU
2. Test Stream Processing
# Run stream prediction with pre-trained model
python test_sequence_models_interactive.py

Select Stream Mode:
Prediction Mode Selection:
1. Batch Prediction (train and evaluate)
2. Stream Prediction (use pre-trained model)
Please select prediction mode (1 or 2): 2

Model Package Selection:
Available models: ['my_fraud_model', 'production_model']
Please select model: my_fraud_model

Model loaded successfully!
Processing stream data...
Stream processing complete! Results saved to: stream_results/
1. One-Command Demo
# Run complete demo in Docker
docker run -it --rm \
-v $(pwd)/sample_fraud_data.csv:/app/data.csv \
-v $(pwd)/results:/app/results \
dafu-fraud-detection \
  python test_anomaly_detection.py

Expected Output:
DAFU Fraud Detection System - Docker Demo
========================================
Data Analysis Results:
- Dataset shape: (1000, 8)
- Missing values: 0
- Fraud rate: 5.0%
Running unsupervised anomaly detection...
Analysis complete! Results saved to: /app/results/
Common Issues and Solutions:
Issue 1: Import Error
ModuleNotFoundError: No module named 'src.models.anomaly_detection'

Solution:
# Make sure you're in the fraud_detection directory
cd core/features/fraud_detection
python -c "from src.models.anomaly_detection import IsolationForestFraudDetector; print('Fixed!')"

Issue 2: Memory Error
MemoryError: Unable to allocate array

Solution:
# Use smaller dataset or reduce model complexity
export PYTHONHASHSEED=0
python test_anomaly_detection.py

Issue 3: Docker Build Fails
ERROR: failed to solve: failed to resolve source

Solution:
# Clean Docker cache and rebuild
docker system prune -f
docker build --no-cache -f deployment/Dockerfile -t dafu-fraud-detection .

System Requirements:
- Minimum: 4GB RAM, 2 CPU cores
- Recommended: 8GB RAM, 4 CPU cores
- Production: 16GB+ RAM, 8+ CPU cores
Processing Times:
- Small dataset (1K records): 10-30 seconds
- Medium dataset (10K records): 2-5 minutes
- Large dataset (100K records): 10-20 minutes
- Stream processing: <1 second per record
Memory Usage:
- Training: 2-4GB RAM
- Prediction: 500MB-1GB RAM
- Stream mode: 200-500MB RAM
# Start the unified interface
cd core/features/fraud_detection/src/models
python main.py
# Follow the interactive prompts:
# 1. Choose your model (Isolation Forest or Sequence Models)
# 2. Select prediction mode (Batch or Stream)
# 3. Configure parameters
# 4. Run analysis

from fraud_detection.src.models.anomaly_detection import IsolationForestFraudDetector
# Initialize the detector
detector = IsolationForestFraudDetector(random_state=42)
# Load and analyze your data
detector.load_and_analyze_data('transaction_data.csv')
# Setup learning mode (supervised/unsupervised)
detector.setup_learning_mode()
# Choose detection method
# - Classic: Binary classification with contamination levels
# - Risk Score: Custom threshold-based detection
# Preprocess data
detector.preprocess_data()
# Train models
detector.train_models([0.01, 0.05, 0.1]) # Multiple contamination levels
# Evaluate and visualize
if detector.is_supervised:
    detector.evaluate_models()

detector.create_visualizations(save_plots=True)
detector.export_results('fraud_analysis_results')

# Run comprehensive anomaly detection tests
cd core/features/fraud_detection
python test_anomaly_detection.py
# Run sequence model tests
python test_sequence_models_interactive.py

Note: The unified interface (main.py) is now the recommended way to access all fraud detection capabilities. Individual model tests are still available for advanced users.
import pandas as pd

from fraud_detection.src.models.sequence_models import SequenceFraudDetector

# Initialize sequence detector
sequence_detector = SequenceFraudDetector()
# Setup prediction mode (NEW!)
sequence_detector.setup_prediction_mode()
# Choose: 1. Batch Prediction or 2. Stream Prediction
# For Batch Prediction Mode
if sequence_detector.prediction_mode == 'batch':
    # Load and analyze data
    sequence_detector.load_and_analyze_data('user_sequences.csv')
    # Setup learning mode
    sequence_detector.setup_learning_mode()
    # Preprocess data
    sequence_detector.preprocess_data()
    # Train models
    sequence_detector.train_models(['LSTM', 'GRU'])
    # Save trained models (NEW!)
    sequence_detector.save_model_package('my_fraud_model')
    # Evaluate and export
    sequence_detector.evaluate_models()
    sequence_detector.export_results('batch_results')
# For Stream Prediction Mode (NEW!)
elif sequence_detector.prediction_mode == 'stream':
    # Load pre-trained model
    sequence_detector.load_model_package('my_fraud_model')
    # Load new stream data
    stream_data = pd.read_csv('new_stream_data.csv')
    # Preprocess stream data
    processed_stream = sequence_detector.preprocess_stream_data(stream_data)
    # Make predictions on stream
    predictions = sequence_detector.predict_stream(processed_stream)
    # Export stream results (NEW!)
    sequence_detector.export_stream_results(stream_data, predictions)

import pandas as pd

from fraud_detection.src.models.anomaly_detection import IsolationForestFraudDetector
# Initialize detector
detector = IsolationForestFraudDetector()
# Setup prediction mode (NEW!)
detector.setup_prediction_mode()
# Choose: 1. Batch Prediction or 2. Stream Prediction
# For Stream Prediction Mode
if detector.prediction_mode == 'stream':
    # Load pre-trained model
    detector.load_model_package('trained_fraud_model')
    # Load new stream data
    stream_data = pd.read_csv('new_transactions.csv')
    # Preprocess stream data
    processed_stream = detector.preprocess_stream_data(stream_data)
    # Make predictions on stream
    results = detector.predict_stream(processed_stream, contamination=0.1)
    # Export stream results (NEW!)
    detector.export_stream_results(stream_data, results)

transaction_id,user_id,amount,merchant_id,timestamp,category,is_fraud
tx_001,user_123,150.00,merchant_456,2024-01-15 10:30:00,electronics,0
tx_002,user_124,2500.00,merchant_789,2024-01-15 11:45:00,jewelry,1
user_id,timestamp,action_type,device_id,location,amount
user_123,2024-01-15 10:30:00,login,mobile_device_001,location_A,0
user_123,2024-01-15 10:31:00,purchase,mobile_device_001,location_A,150.00
timestamp,user_id,transaction_count,daily_amount,risk_score
2024-01-15,user_123,5,750.00,0.2
2024-01-16,user_123,8,1200.00,0.4
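Before feeding a file to the models, it can help to sanity-check it against the transaction-level format shown above. This is a minimal stdlib sketch, not part of DAFU's own preprocessing; the `is_fraud` label is treated as optional since unsupervised mode does not need it.

```python
import csv
import io

# Column names from the transaction-level format shown above.
EXPECTED = ["transaction_id", "user_id", "amount", "merchant_id",
            "timestamp", "category", "is_fraud"]

def validate_transactions(fileobj):
    """Yield rows as dicts, failing fast on a malformed header.
    The trailing `is_fraud` label column is optional (unsupervised mode)."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED[:-1] if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for row in reader:
        row["amount"] = float(row["amount"])  # coerce amounts early
        yield row

# Exercise the validator against the sample row from the README.
sample = io.StringIO(
    "transaction_id,user_id,amount,merchant_id,timestamp,category,is_fraud\n"
    "tx_001,user_123,150.00,merchant_456,2024-01-15 10:30:00,electronics,0\n"
)
rows = list(validate_transactions(sample))
```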
Scenario: Choose the right fraud detection model for your data
Solution: Interactive model selection interface
# Start the unified interface
cd core/features/fraud_detection/src/models
python main.py
# Interactive model selection:
# 1. ISOLATION FOREST & RISK SCORE - For tabular data
# 2. SEQUENCE MODELS (LSTM & GRU) - For sequential data
# 3. MODEL COMPARISON - Compare different approaches
# 4. HELP & INFORMATION - Get detailed guidance
# 5. EXIT - Exit the application

Scenario: Detect fraudulent transactions in real-time during checkout
Solution: Risk Score API with sub-50ms response time
# Real-time scoring
import requests

response = requests.post('https://api.masterfabric.co/dafu/v1/score', json={
    'transaction_id': 'tx_123',
    'amount': 150.00,
    'user_id': 'user_456',
    'merchant_id': 'merchant_789',
    'device_fingerprint': 'fp_abc123',
    'ip_address': '192.168.1.1',
    'user_agent': 'Mozilla/5.0...'
})
fraud_score = response.json()['risk_score']
is_fraud = fraud_score > 0.7  # Custom threshold

Scenario: Analyze historical data for fraud patterns and model retraining
Solution: Batch Processing API with large-scale data handling
# Batch analysis
import requests

batch_request = {
    'data_source': 's3://fraud-data/transactions_2024.csv',
    'analysis_type': 'comprehensive',
    'models': ['isolation_forest', 'lstm', 'xgboost'],
    'output_format': 'detailed_report'
}
response = requests.post('https://api.masterfabric.co/dafu/v1/batch/analyze', json=batch_request)

Scenario: Detect anomalous user behavior patterns over time
Solution: Sequence models with LSTM/GRU for temporal pattern recognition
# User behavior analysis
from fraud_detection.src.models.sequence_models import UserBehaviorAnalyzer
analyzer = UserBehaviorAnalyzer()
analyzer.load_user_sequences('user_behavior_data.csv')
# Detect anomalies in user patterns
anomalies = analyzer.detect_behavioral_anomalies(
    sequence_length=30,
    threshold=0.8
)

Scenario: Evaluate merchant risk profiles for payment processing
Solution: Multi-model ensemble with business rules
# Merchant risk assessment
from fraud_detection.src.rules_engine.rule_processor import MerchantRiskProcessor
processor = MerchantRiskProcessor()
merchant_risk = processor.assess_merchant_risk(
    merchant_id='merchant_123',
    transaction_history='merchant_transactions.csv',
    risk_factors=['chargeback_rate', 'transaction_patterns', 'location_anomalies']
)

Scenario: Process incoming transactions in real-time using pre-trained models
Solution: Stream prediction mode with model persistence
# Real-time stream processing
from fraud_detection.src.models.anomaly_detection import IsolationForestFraudDetector
detector = IsolationForestFraudDetector()
detector.setup_prediction_mode()
# Choose: 2. Stream Prediction
# Load pre-trained model
detector.load_model_package('production_fraud_model')
# Process incoming transactions
while True:
    # Get new transaction from stream
    new_transaction = get_next_transaction()
    # Preprocess and predict
    processed = detector.preprocess_stream_data(new_transaction)
    result = detector.predict_stream(processed, contamination=0.1)
    # Take action based on prediction
    if result['predictions'][0] == 1:
        block_transaction(new_transaction)
    else:
        approve_transaction(new_transaction)

Scenario: Train models on historical data and deploy for production use
Solution: Batch training with model persistence
# Training and deployment pipeline
from fraud_detection.src.models.sequence_models import SequenceFraudDetector
# Phase 1: Train models
trainer = SequenceFraudDetector()
trainer.setup_prediction_mode()
# Choose: 1. Batch Prediction
trainer.load_and_analyze_data('historical_fraud_data.csv')
trainer.setup_learning_mode()
trainer.preprocess_data()
trainer.train_models(['LSTM', 'GRU'])
# Save trained models for production
trainer.save_model_package('production_lstm_model')
print("Models trained and saved for production use")
# Phase 2: Deploy for stream processing
deployer = SequenceFraudDetector()
deployer.setup_prediction_mode()
# Choose: 2. Stream Prediction
deployer.load_model_package('production_lstm_model')
print("Models loaded and ready for stream processing")

Scenario: Detect fraud rings and coordinated attacks
Solution: Network features with graph analysis
# Network fraud detection
from fraud_detection.src.feature_engineering.network_features import NetworkAnalyzer
analyzer = NetworkAnalyzer()
fraud_rings = analyzer.detect_fraud_networks(
    transaction_data='network_transactions.csv',
    similarity_threshold=0.8,
    min_ring_size=3
)

The main fraud detection microservice with end-to-end ML capabilities, exposed via FastAPI for real-time and batch use cases.
- Anomaly Detection: Isolation Forest for unsupervised fraud detection and anomaly scoring.
- Sequence Models: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) models for temporal/behavioral pattern recognition.
- Ensemble Methods: XGBoost and Random Forest for robust, tree-based predictions.
- Neural Networks: Deep learning models (via TensorFlow/Keras) for complex nonlinear fraud patterns.
- Rules Engine: Extensible business rules for configurable thresholds, velocity checks, and custom scoring strategies.
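To make the rules-engine idea concrete, here is a minimal sketch of a threshold/velocity pass. The rule shape loosely mirrors the config/rules.json entries shown later in this README, but the field names and the scoring policy (take the maximum triggered risk score) are illustrative assumptions, not the actual DAFU engine.

```python
# Illustrative rule set: field names and scores are assumptions, not DAFU's.
RULES = [
    {"name": "amount_threshold", "field": "amount",
     "value": 10000, "risk_score": 0.8},
    {"name": "velocity_check", "field": "transactions_per_hour",
     "value": 10, "risk_score": 0.6},
]

def apply_rules(txn):
    """Return (max risk score among triggered rules, names of triggered rules)."""
    triggered = [r for r in RULES if txn.get(r["field"], 0) > r["value"]]
    score = max((r["risk_score"] for r in triggered), default=0.0)
    return score, [r["name"] for r in triggered]

# A large transaction trips only the amount rule.
score, hits = apply_rules({"amount": 12000, "transactions_per_hour": 3})
```

A real engine would also attach the configured action (flag, verify, block) to each triggered rule rather than just a score.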
A modular preprocessing and feature extraction pipeline designed for real-time and offline analytics.
- Transaction Features: Amount distributions, frequency of transactions, merchant category profiling.
- User Features: Historical behavioral patterns, device fingerprints, account age/risk indicators.
- Network Features: Graph-based entity relationships (shared IPs, merchants, accounts).
- Temporal Features: Time-series analysis (sliding windows, session duration, peak-time anomalies).
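As one concrete example of a temporal feature, a sliding-window transactions-per-hour counter can be maintained per user. This is a sketch of the idea using only the stdlib, not DAFU's actual feature pipeline.

```python
from collections import deque
from datetime import datetime, timedelta

class VelocityTracker:
    """Sliding-window transactions-per-window count per user.
    A temporal-feature sketch; the real pipeline may compute this differently."""

    def __init__(self, window=timedelta(hours=1)):
        self.window = window
        self.events = {}  # user_id -> deque of event timestamps

    def add(self, user_id, ts):
        """Record an event and return the count inside the current window."""
        q = self.events.setdefault(user_id, deque())
        q.append(ts)
        # Evict events older than the window relative to the newest event.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q)
```

The returned count is exactly the `transactions_per_hour`-style signal a velocity rule or model feature would consume.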
Enterprise-grade RESTful APIs providing low-latency endpoints for real-time scoring and large-scale data ingestion.
All APIs are implemented using FastAPI, leveraging OpenAPI/Swagger for documentation and schema validation.
-
Fraud Scoring API: Real-time fraud detection endpoint.
- Built on FastAPI for async performance.
- Supports REST and optionally gRPC for low-latency scenarios.
- Designed for sub-50ms response times with Redis caching.
-
Batch Processing API: Bulk scoring and data ingestion.
- Optimized for large datasets with Dask / Apache Spark integration.
- Used for offline analysis, backfills, reporting, and model monitoring.
- Supports scheduled jobs (via Apache Airflow).
-
Model Management API: Centralized model lifecycle control.
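The README does not detail the Model Management API, so here is a hedged sketch of what centralized lifecycle control could look like behind it. The register/promote/rollback names are assumptions for illustration, not the real endpoints.

```python
class ModelRegistry:
    """Sketch of model lifecycle control: versions are registered in order,
    one version per model is active, and rollback steps back one version."""

    def __init__(self):
        self._versions = {}  # model name -> ordered list of version tags
        self._active = {}    # model name -> currently active version

    def register(self, name, version):
        self._versions.setdefault(name, []).append(version)

    def promote(self, name, version):
        if version not in self._versions.get(name, []):
            raise KeyError(f"unknown version {version!r} for model {name!r}")
        self._active[name] = version

    def active(self, name):
        return self._active.get(name)

    def rollback(self, name):
        """Re-activate the previously registered version, if one exists."""
        versions = self._versions.get(name, [])
        current = self._active.get(name)
        if current in versions and versions.index(current) > 0:
            self._active[name] = versions[versions.index(current) - 1]
```

In the platform, these operations would sit behind authenticated REST endpoints with the model artifacts themselves stored under MODEL_STORAGE_PATH.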
A cloud-native, microservices-based foundation, optimized for scalability and observability.
- Containerization: Docker multi-stage builds for lightweight, reproducible services (Podman is another option).
- Orchestration: Kubernetes with Helm charts for deployment, scaling, and service discovery.
- Monitoring: Prometheus (metrics), Grafana (dashboards), Jaeger (distributed tracing).
- Security: OAuth2 / JWT authentication, RBAC policies, and API key management for multi-tenant enterprise compliance.
# .env file
DATABASE_URL=postgresql://user:password@localhost:5432/dafu
REDIS_URL=redis://localhost:6379/0
MODEL_STORAGE_PATH=/models
LOG_LEVEL=INFO
API_RATE_LIMIT=1000
FRAUD_THRESHOLD=0.7

# config/models.json
{
  "isolation_forest": {
    "contamination": [0.01, 0.05, 0.1],
    "n_estimators": 100,
    "random_state": 42
  },
  "lstm": {
    "sequence_length": 10,
    "hidden_units": 64,
    "dropout": 0.2,
    "epochs": 50
  },
  "xgboost": {
    "n_estimators": 100,
    "max_depth": 6,
    "learning_rate": 0.1
  }
}

# config/rules.json
{
  "amount_threshold": {
    "condition": "amount > 10000",
    "risk_score": 0.8,
    "action": "flag_for_review"
  },
  "velocity_check": {
    "condition": "transactions_per_hour > 10",
    "risk_score": 0.6,
    "action": "additional_verification"
  },
  "location_anomaly": {
    "condition": "distance_from_home > 1000km",
    "risk_score": 0.5,
    "action": "location_verification"
  }
}

- Latency: <50ms for real-time scoring
- Throughput: 10,000+ transactions per second
- Accuracy: 95%+ fraud detection accuracy
- Availability: 99.9% uptime SLA
# Kubernetes HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dafu-fraud-detection
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dafu-fraud-detection
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

# Redis caching for model predictions
import json

import redis
from functools import wraps
redis_client = redis.Redis(host='localhost', port=6379, db=0)
def cache_prediction(expiration=3600):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = f"prediction:{hash(str(args) + str(kwargs))}"
            cached_result = redis_client.get(cache_key)
            if cached_result:
                return json.loads(cached_result)
            result = func(*args, **kwargs)
            redis_client.setex(cache_key, expiration, json.dumps(result))
            return result
        return wrapper
    return decorator

- Authentication: OAuth2/JWT token-based authentication
- Authorization: Role-based access control (RBAC)
- Data Encryption: TLS 1.3 for data in transit, AES-256 for data at rest
- Input Validation: Comprehensive data validation and sanitization
- Audit Logging: Complete audit trail for compliance
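To illustrate the JWT flow underpinning token-based authentication, here is a minimal HS256 sign/verify pair using only the standard library. A production deployment would use a vetted library (e.g. PyJWT or python-jose) rather than this sketch.

```python
import base64
import hashlib
import hmac
import json

def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(payload: dict, secret: bytes) -> str:
    """Produce a header.payload.signature HS256 token."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(payload).encode())
    mac = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64(mac)}"

def verify(token: str, secret: bytes) -> dict:
    """Recompute the MAC and return the payload, or raise on tampering."""
    header, body, sig = token.split(".")
    mac = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64(mac), sig):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Real tokens would also carry registered claims such as `exp` and `iat`, which the verifier must check in addition to the signature.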
- GDPR: Data processing transparency and user rights
- PCI DSS: Secure payment card data handling
- SOC 2: Security controls and monitoring
- ISO 27001: Information security management
# Security middleware
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware
app = FastAPI()
# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://dafu.masterfabric.co"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)

# Trusted hosts
app.add_middleware(
    TrustedHostMiddleware,
    allowed_hosts=["dafu.masterfabric.co", "*.masterfabric.co"]
)

# Run all tests
cd core/features/fraud_detection
pytest tests/ -v --cov=src --cov-report=html
# Run specific test categories
pytest tests/test_anomaly_detection.py -v
pytest tests/test_api_endpoints.py -v
pytest tests/test_feature_engineering.py -v
# Performance testing
pytest tests/test_performance.py -v --benchmark-only# Linting and formatting
black src/ tests/
flake8 src/ tests/
mypy src/
pylint src/
# Pre-commit hooks
pre-commit install
pre-commit run --all-files

# Prometheus metrics
from prometheus_client import Counter, Histogram, Gauge

fraud_predictions_total = Counter('fraud_predictions_total', 'Total fraud predictions', ['model', 'result'])
prediction_latency = Histogram('prediction_latency_seconds', 'Prediction latency')
active_models = Gauge('active_models_total', 'Number of active models')

# Structured logging
import structlog
logger = structlog.get_logger()
# Usage
logger.info(
    "fraud_prediction_completed",
    transaction_id="tx_123",
    model="isolation_forest",
    risk_score=0.85,
    processing_time_ms=45,
)

- Fraud Detection Metrics: Prediction accuracy, latency, throughput
- Model Performance: Model accuracy, drift detection, retraining triggers
- System Health: CPU, memory, disk usage, API response times
- Business Metrics: Fraud rates, false positives, cost analysis
Best for: Development, testing, ML model training

```bash
# Clone repository
git clone https://github.com/MasterFabric/dafu.git
cd dafu/core/features/fraud_detection

# Setup environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run ML models
cd src/models
python main.py  # Interactive model selection
```

Features Available:
- ✅ All ML models (Isolation Forest, LSTM, GRU)
- ✅ Training and prediction
- ✅ Stream/batch processing
- ✅ Model persistence
- ✅ Visualization and export
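Both detection modes in the feature set reduce to thresholding an anomaly score. A minimal plain-Python sketch of risk-score based detection at a given contamination level (function names are illustrative, not the project's API):

```python
def risk_threshold(scores, contamination=0.05):
    """Pick the cutoff so that roughly `contamination` of the points
    are flagged (highest scores treated as most anomalous)."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * contamination))
    return ranked[k - 1]

def detect(scores, contamination=0.05):
    cutoff = risk_threshold(scores, contamination)
    return [s >= cutoff for s in scores]

scores = [0.1, 0.2, 0.15, 0.9, 0.05, 0.12, 0.11, 0.3, 0.18, 0.22,
          0.09, 0.14, 0.16, 0.21, 0.08, 0.13, 0.17, 0.19, 0.07, 0.95]
flags = detect(scores, contamination=0.1)  # flags the two highest scores
```

This is the same idea behind the contamination parameter of Isolation Forest: the threshold is derived from the expected anomaly fraction rather than fixed by hand.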
Status: Configuration complete, services commented out until API-ML integration
The complete Docker Compose setup is prepared in docker-compose.yml but all services are currently commented out. See Docker Status for details.
What's Prepared:
- Complete service definitions (API, PostgreSQL, Redis, RabbitMQ, Celery, Prometheus, Grafana)
- Database schemas
- Network and volume configuration
- Health checks and monitoring
When Active (after integration):

```bash
# Uncomment services in docker-compose.yml
docker-compose up -d
```

```bash
# Deploy to Kubernetes
kubectl apply -f core/features/fraud_detection/deployment/k8s-manifests/

# Or using Helm
helm install dafu ./core/features/fraud_detection/deployment/helm-charts/ \
  --set image.tag=latest \
  --set replicas=3 \
  --set resources.requests.memory=512Mi \
  --set resources.requests.cpu=250m
```

```yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run tests
        run: |
          cd core/features/fraud_detection
          pip install -r requirements.txt
          pytest tests/
  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/dafu-fraud-detection \
            fraud-detection=masterfabric/dafu:latest
```

- Fork the repository
- Create a feature branch
  ```bash
  git checkout -b feature/your-feature-name
  ```
- Install development dependencies
  ```bash
  cd core/features/fraud_detection
  pip install -r requirements.txt
  pip install -r requirements-dev.txt  # If available
  ```
- Run tests and linting
  ```bash
  pytest tests/
  black src/ tests/
  flake8 src/ tests/
  ```
- Submit a pull request
- Python Style: PEP 8 compliance with Black formatting
- Type Hints: Comprehensive type annotations
- Documentation: Google-style docstrings
- Testing: 90%+ test coverage required
- ASCII Only: No non-ASCII characters in code (enforced by pre-commit hooks)
Follow the conventional commit format:

```
type(scope): description

[optional body]

[optional footer]
```

Types: feat, fix, docs, style, refactor, test, chore
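A commit-message hook could enforce this header format with a small check; the regex below is a sketch under our own assumptions about allowed scope characters, not an existing DAFU hook:

```python
import re

TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Header: type, optional (scope), colon-space, description.
HEADER_RE = re.compile(
    r"^(?P<type>%s)(\((?P<scope>[\w\-]+)\))?: (?P<desc>.+)$" % "|".join(TYPES)
)

def is_valid_header(line: str) -> bool:
    return HEADER_RE.match(line) is not None

print(is_valid_header("feat(api): add fraud scoring endpoint"))  # True
print(is_valid_header("update stuff"))                           # False
```

Wired into a `commit-msg` git hook, such a check rejects non-conforming messages before they enter history.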
Real-time fraud scoring endpoint.

Request:

```json
{
  "transaction_id": "tx_123",
  "amount": 150.00,
  "user_id": "user_456",
  "merchant_id": "merchant_789",
  "timestamp": "2024-01-15T10:30:00Z",
  "device_fingerprint": "fp_abc123",
  "ip_address": "192.168.1.1"
}
```

Response:

```json
{
  "transaction_id": "tx_123",
  "risk_score": 0.85,
  "is_fraud": true,
  "model_used": "isolation_forest",
  "processing_time_ms": 45,
  "confidence": 0.92,
  "explanations": {
    "amount_risk": 0.3,
    "user_behavior_risk": 0.7,
    "merchant_risk": 0.2
  }
}
```

Batch fraud analysis endpoint.

Request:

```json
{
  "data_source": "s3://bucket/data.csv",
  "analysis_type": "comprehensive",
  "models": ["isolation_forest", "lstm", "xgboost"],
  "output_format": "detailed_report"
}
```

List available models.
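A client for the scoring endpoint can be sketched with the standard library alone. The required-field list mirrors the sample request above, while the `/score` path is a placeholder, since the docs do not state the real route:

```python
import json
import urllib.request

# Fields inferred from the sample request payload above.
REQUIRED = {"transaction_id", "amount", "user_id", "merchant_id", "timestamp"}

def build_request(payload, base_url="http://localhost:8000", token=None):
    """Validate the payload and build a POST request for the scoring API."""
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # NOTE: '/score' is a placeholder path, not confirmed by the docs.
    req = urllib.request.Request(
        base_url + "/score",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req  # caller sends it with urllib.request.urlopen(req)

req = build_request({
    "transaction_id": "tx_123", "amount": 150.00, "user_id": "user_456",
    "merchant_id": "merchant_789", "timestamp": "2024-01-15T10:30:00Z",
}, token="abc")
```

Validating required fields client-side gives faster feedback than waiting for a 422 from the API.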
Deploy a new model version.
Get model performance metrics.
A complete Postman collection is available for testing all API endpoints:
📦 DAFU_API.postman_collection.json
What's Included:
| Category | Endpoints | Description |
|---|---|---|
| 1. Authentication | 7 endpoints | Register, Login, Logout, Token refresh, Password change, API keys |
| 2. Log Management | 6 endpoints | CRUD operations, Statistics, Filtering |
| 3. Report Management | 6 endpoints | Report generation, Tracking, Statistics |
| 4. Product Management | 7 endpoints | Product CRUD, High-risk detection, Statistics |
| 5. Health & System | 3 endpoints | Health check, API info, OpenAPI schema |
Features:
- ✅ Auto-save tokens: Login automatically saves access_token to environment
- ✅ Complete examples: All requests include sample data
- ✅ Test scripts: Automated token management
- ✅ Documentation: Each endpoint documented with descriptions
- ✅ Environment variables: Pre-configured base_url and tokens
How to Use:

1. Import into Postman
   - Option 1: Import file directly: File → Import → Select DAFU_API.postman_collection.json
   - Option 2: Import from URL (if hosted): File → Import → Link → Paste collection URL
2. Create Environment (optional but recommended)
   - Environment Name: DAFU Local
   - Variables:
     - base_url: http://localhost:8000
     - access_token: (set automatically after login)
     - refresh_token: (set automatically after login)
3. Start API Server
   ```bash
   cd core/features/fraud_detection
   ./start_api.sh
   ```
4. Test Workflow
   - Step 1: Health Check → Verify API is running
   - Step 2: Register → Create new user account
   - Step 3: Login → Get access token (auto-saved)
   - Step 4: Try any endpoint → Use authenticated requests

Quick Start with Postman:
1. Start API: `./start_api.sh`
2. Import collection: `DAFU_API.postman_collection.json`
3. Run "Register New User" → Create account
4. Run "Login" → Token saved automatically ✅
5. Try any authenticated endpoint!
Alternative: Swagger UI
If you prefer browser-based testing:
- Start API server
- Open http://localhost:8000/docs
- Interactive API documentation with "Try it out" buttons
| Feature | Description | Status | Implementation Level |
|---|---|---|---|
| Unified Model Interface | Single entry point for all models | ✅ NEW! Fully Implemented | Complete with interactive selection |
| Isolation Forest Detection | Core anomaly detection algorithm | ✅ Fully Implemented | Complete with evaluation & visualization |
| Sequence Models (LSTM/GRU) | Time-series fraud detection | ✅ Fully Implemented | Complete with TensorFlow implementation |
| Stream Prediction Mode | Real-time data stream processing | ✅ NEW! Fully Implemented | Complete with model persistence |
| Batch Prediction Mode | Batch data processing | ✅ NEW! Fully Implemented | Complete with training & prediction |
| Model Persistence | Save/load trained models | ✅ NEW! Fully Implemented | Complete with .joblib & .h5 support |
| Data Preprocessing | Automatic data analysis & feature engineering | ✅ Fully Implemented | Complete with missing value handling |
| Supervised/Unsupervised Modes | Dual learning approaches | ✅ Fully Implemented | Complete with mode selection |
| Risk Score Detection | Custom threshold-based detection | ✅ Fully Implemented | Complete with business interpretation |
| Comprehensive Evaluation | Performance metrics & visualization | ✅ Fully Implemented | Complete with 4-panel analysis |
| Enhanced Result Export | CSV, JSON output with stream support | ✅ Enhanced | Complete with stream & batch exports |
| Docker Infrastructure | Docker Compose configuration | 🔄 Prepared | All services configured, not integrated yet |
| FastAPI Basic Structure | REST API framework | 🔄 Prepared | Basic endpoints exist, ML integration pending |
| Database Schema | PostgreSQL schema design | 🔄 Prepared | Complete schema ready, not connected yet |
| Docker Support | Containerization | ✅ Fully Implemented | Dockerfile with multi-stage build |
| Fast Startup Interface | Lazy loading for instant response | ✅ NEW! Fully Implemented | Complete with optimized imports |
| Feature | Description | Status | Implementation Level |
|---|---|---|---|
| API-ML Integration | Connect ML models to FastAPI | 🚧 Next Priority | API structure ready, needs ML integration |
| Database Integration | PostgreSQL connection | 🚧 In Development | Schema ready, ORM integration pending |
| Redis Caching | Performance optimization | 🚧 In Development | Config ready, not implemented |
| Celery Tasks | Background job processing | 🚧 In Development | Not implemented yet |
| Feature Engineering Pipeline | Advanced feature extraction | 🚧 Basic Structure | Framework exists, needs implementation |
| Rules Engine | Business rule processing | 🚧 Basic Structure | Framework exists, needs implementation |
| Ensemble Models | XGBoost, Random Forest | 🚧 Basic Structure | Framework exists, needs implementation |
| Feature | Description | Status | Target Timeline |
|---|---|---|---|
| Real-time API | Sub-50ms fraud scoring API | 📋 Planned | In Development |
| Enterprise Security | OAuth2, JWT, RBAC | 📋 Planned | In Development |
| Monitoring & Observability | Prometheus, Grafana, Jaeger | 📋 Planned | In Development |
| Auto-scaling | Kubernetes HPA | 📋 Planned | In Development |
| Advanced Analytics | Graph-based fraud detection | 📋 Planned | In Development |
| Model Management | Versioning, A/B testing | 📋 Planned | In Development |
| Compliance Features | GDPR, PCI DSS compliance | 📋 Planned | In Development |
| High-throughput Processing | 10,000+ TPS optimization | 📋 Planned | In Development |
- Accuracy: 90%+ fraud detection accuracy (based on test results)
- Model Training: Complete end-to-end pipeline
- Data Processing: Handles large datasets efficiently
- Visualization: Comprehensive 4-panel analysis plots
- Export Capability: Structured results with timestamps
- Latency: <50ms for real-time scoring (planned)
- Throughput: 10,000+ transactions per second (planned)
- Availability: 99.9% uptime SLA (planned)
- Security: Zero data breaches (planned)
- Real-time API Implementation: Complete FastAPI endpoints for fraud scoring
- Authentication & Authorization: OAuth2/JWT implementation
- Input Validation: Comprehensive request validation
- API Documentation: OpenAPI/Swagger documentation
- Basic Security: HTTPS, CORS, rate limiting
- Kubernetes Production Deployment: Full K8s manifests and Helm charts
- Monitoring & Observability: Prometheus, Grafana, Jaeger integration
- Auto-scaling: Kubernetes HPA with custom metrics
- Message Queuing: RabbitMQ/Celery for async processing
- Advanced Feature Engineering: Complete pipeline implementation
- Model Management: Versioning, A/B testing, model registry
- Ensemble Methods: XGBoost, Random Forest implementation
- Graph-based Detection: Network analysis for fraud rings
- Business Rules Engine: Complete rule processing system
- Advanced Analytics: Dashboard and reporting system
- High-throughput Optimization: 10,000+ TPS processing
- Performance Tuning: Memory optimization, caching strategies
- Compliance Features: GDPR, PCI DSS compliance tools
- Machine Learning Pipeline: Automated model training and deployment
- Multi-tenant Architecture: Enterprise multi-tenancy support
- Documentation: Comprehensive guides and API documentation
- GitHub Issues: Bug reports and feature requests
- Community Forum: Discussions and Q&A
- Feedback & Support: dafu@masterfabric.co
- Enterprise Support: Contact the platform support team
- 📚 All Documentation: Complete Docs
- Complete Usage Guide - Full platform usage guide
- API Documentation:
- API Usage Guide - Complete API reference
- API Quick Start - 5-minute API setup
- Postman Collection - Ready-to-use API tests
- CLI Documentation:
- CLI Guide - Original CLI reference
- CLI Demo - Usage examples
- CLI with API - API integration guide
- CLI Step-by-Step - Detailed CLI usage
- Guides:
- Quick Start - ML models quick start
- Implementation Complete - Implementation status
- Docker:
- Docker Status - Docker deployment info
- Docker Setup - Docker configuration
- API Testing Tools:
- Postman Collection: DAFU_API.postman_collection.json - Import & test all endpoints
- Swagger UI: http://localhost:8000/docs (Interactive API docs when API running)
- Architecture: High-Level Architecture
- High Memory Usage: Configure chunked processing for large datasets
- Slow Predictions: Enable model caching and optimize feature engineering
- False Positives: Adjust risk score thresholds and retrain models
- API Rate Limiting: Configure appropriate rate limits for your use case
DAFU Enterprise Fraud Detection Platform v1.0.0
Built with ❤️ for secure, scalable, and intelligent fraud detection
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
- Commercial Use: ✅ Allowed with restrictions
- Modification: ✅ Allowed
- Distribution: ✅ Allowed with source code disclosure
- Patent Use: ✅ Allowed
- Private Use: ✅ Allowed
- Sublicensing: ❌ Not allowed
- Source Code Disclosure: Any distribution of the software must include the complete source code
- Network Interaction: If you run the software on a server and provide services over a network, you must make the source code available to users
- License Compatibility: Any derivative works must be licensed under the same AGPL-3.0 license
- Attribution: You must retain all copyright notices and license text
The complete license text is available in the LICENSE file in this repository.
For commercial enterprises requiring different licensing terms, please contact MasterFabric for enterprise licensing options.
MasterFabric - Enterprise-level fraud detection and e-commerce analytics solutions.
Contact: dafu@masterfabric.co
Based on the existing test results in the project:
- Accuracy: 90%+ on test datasets
- Detection Methods: Both classic and risk-score based detection working
- Contamination Levels: Multiple levels (0.01, 0.05, 0.1) tested successfully
- Visualization: 4-panel analysis plots generated successfully
- Stream Processing: 100,000 records processed successfully
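The accuracy figures above come from standard detection metrics; for reference, a self-contained precision/recall computation of the kind used to evaluate a fraud detector (plain Python, toy data):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary fraud labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy check: 3 true frauds, detector catches 2 and raises 1 false alarm
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)
```

In fraud detection these two numbers usually matter more than raw accuracy, since fraud is rare and a detector that flags nothing can still score high accuracy.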
- LSTM/GRU Models: Successfully trained and evaluated
- Time-series Analysis: User behavior patterns detected
- Model Architecture: Configurable sequence length and hidden units
- Training: TensorFlow-based implementation with early stopping
- Stream Prediction: 10,000 sequence records processed in stream mode
- Model Persistence: Models saved and loaded successfully
- Real-time Processing: Stream data processed with pre-trained models
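Sequence models consume fixed-length windows of user events. A minimal sketch of one common windowing scheme (the project's sequence length is configurable; this helper is ours, not the project's code):

```python
def make_sequences(events, seq_len=5):
    """Build overlapping windows: each sample is the last `seq_len`
    events and the target is the event that follows (next-step setup)."""
    X, y = [], []
    for i in range(len(events) - seq_len):
        X.append(events[i:i + seq_len])
        y.append(events[i + seq_len])
    return X, y

# Toy per-user transaction amounts; the spike at 500 is the kind of
# deviation from learned behavior an LSTM/GRU can surface.
amounts = [10, 12, 11, 13, 500, 12, 11, 10, 14, 13]
X, y = make_sequences(amounts, seq_len=3)
```

The resulting (X, y) pairs are what a TensorFlow LSTM/GRU would train on after scaling; sequence length trades context against data volume per user.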
- Model Loading: Pre-trained models loaded successfully for prediction
- Data Preprocessing: Stream data preprocessed using saved transformers
- Prediction Accuracy: High accuracy maintained in stream mode
- Export Capabilities: Stream results exported with timestamps
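The save-then-reload flow above follows the usual persistence pattern; a stdlib sketch with `pickle` standing in for the project's .joblib/.h5 formats, and a toy model in place of a trained detector:

```python
import os
import pickle
import tempfile

class ThresholdModel:
    """Toy stand-in for a trained detector: flags values above a cutoff."""
    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, xs):
        return [x > self.cutoff for x in xs]

# Save after "training"...
model = ThresholdModel(cutoff=100.0)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...reload later for stream prediction, without retraining
with open(path, "rb") as f:
    restored = pickle.load(f)
flags = restored.predict([10.0, 250.0])
```

The key property is the same one the results above rely on: the reloaded model, together with its saved preprocessing transformers, reproduces training-time behavior on new stream data.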
- Automatic Analysis: Column detection and data suitability assessment
- Preprocessing: Missing value handling, categorical encoding, scaling
- Export Formats: CSV and JSON outputs with timestamps
- Large Datasets: Efficient processing of substantial data volumes
- Batch vs Stream: Both processing modes working efficiently
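The preprocessing steps above (missing-value handling, categorical encoding) can be sketched in plain Python; this simplified helper is illustrative only, not the project's implementation:

```python
def preprocess(rows, numeric_cols, categorical_cols):
    """Impute numerics with the (upper) median and integer-encode
    categoricals with stable per-column codes."""
    medians = {}
    for col in numeric_cols:
        vals = sorted(r[col] for r in rows if r[col] is not None)
        medians[col] = vals[len(vals) // 2] if vals else 0.0
    codes = {col: {} for col in categorical_cols}
    out = []
    for r in rows:
        clean = dict(r)  # leave the input rows untouched
        for col in numeric_cols:
            if clean[col] is None:
                clean[col] = medians[col]
        for col in categorical_cols:
            mapping = codes[col]
            clean[col] = mapping.setdefault(clean[col], len(mapping))
        out.append(clean)
    return out

rows = [
    {"amount": 10.0, "country": "US"},
    {"amount": None, "country": "DE"},
    {"amount": 30.0, "country": "US"},
]
clean = preprocess(rows, ["amount"], ["country"])
```

For stream prediction the same medians and category codes learned at training time must be saved and reused, which is exactly what the saved-transformer point above refers to.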
