DAFU is a comprehensive fraud detection and e-commerce analytics platform designed for enterprise deployment. Currently in active development, it provides advanced machine learning-based fraud detection capabilities with a focus on anomaly detection and sequence analysis.
Built with modern technologies and following enterprise best practices, DAFU combines multiple machine learning algorithms into a comprehensive fraud detection and prevention solution. Core ML capabilities are fully implemented, with enterprise features in development.
- Unified CLI with API Integration: All-in-one command-line interface for authentication, logs, reports, products, and ML models
- Authentication & User Management: JWT-based auth with role-based access control (RBAC)
- Logging System: Structured logging with analytics and statistics
- Report Management: Fraud detection report generation and tracking
- Product Risk Management: E-commerce product management with fraud risk tracking
- Unified Model Interface: Single entry point for all fraud detection models
- Advanced ML Algorithms: Isolation Forest and LSTM/GRU sequence models, fully implemented
- Stream Processing: Real-time data stream processing with pre-trained models
- Model Persistence: Save and load trained models for production deployment
- Dual Prediction Modes: Both batch and stream prediction capabilities
- FastAPI Backend: Complete REST API with auth, logs, reports, and products endpoints
- Database Layer: PostgreSQL with SQLAlchemy ORM and a complete schema
- Docker Infrastructure: PostgreSQL containerization ready
- Dual Learning Modes: Both supervised and unsupervised learning approaches
- Comprehensive Analysis: 4-panel visualization with detailed performance metrics
- Production-Ready Core: Complete fraud detection pipeline with evaluation
- Flexible Detection: Classic and risk-score based detection methods
- Data Processing: Automatic preprocessing with missing value handling
- Fast Startup: Lazy loading for an instant model selection interface
- Real-time API: Sub-50ms fraud scoring endpoints for ultra-low-latency decisioning. Enables the system to detect fraud instantly in live payment flows, meeting real-time financial transaction requirements.
- Enterprise Security: OAuth2, JWT, and RBAC implementation. Adds enterprise-grade authentication, token-based access, and role-based authorization to secure deployments in regulated environments.
- Scalable Architecture: Kubernetes deployment with auto-scaling. Provides seamless horizontal scaling based on traffic load, supporting both small-scale PoCs and large enterprise production clusters.
- Advanced Monitoring: Prometheus, Grafana, and Jaeger integration. Full observability with metrics collection, real-time dashboards, and distributed tracing for faster issue detection and resolution.
- High-throughput Processing: 10,000+ TPS optimization. Optimized to handle extremely high transaction volumes, scaling to 10,000+ transactions per second to meet the demands of major banks and payment providers.
- Quick Start
- Interactive CLI Guide
- CLI Demo & Examples
- Supported Data Formats
- Use Cases and Scenarios
- Complete Documentation - All documentation organized by category
- CLI Documentation - Interactive CLI guides
- Docker Documentation - Docker setup and deployment
- General Guides - Quick start and implementation guides
- Python 3.8+ (Python 3.9+ recommended)
- Docker (for PostgreSQL database)
- Git (for cloning the repository)
- 8GB+ RAM (for ML model training)
- 2GB+ free disk space (for models and data)
For API Features (NEW!):
- PostgreSQL (Docker container recommended)
- Port 8000 (for API server)
- Port 5432 (for PostgreSQL)
Optional (for production):
- Kubernetes cluster
- Redis (for caching)
The fastest way to get started with DAFU!
# Clone the repository
git clone https://github.com/MasterFabric/dafu.git
cd dafu
# Make CLI executable
chmod +x dafu
# Start interactive CLI
./dafu
# You'll see:
# ╔══════════════════════════════════════════════════════════════╗
# ║  ____    _    _____ _   _                                    ║
# ║ |  _ \  / \  |  ___| | | |                                   ║
# ║ | | | |/ _ \ | |_   | | | |                                  ║
# ║ | |_| / ___ \|  _|  | |_| |                                  ║
# ║ |____/_/   \_\_|     \___/                                   ║
# ║                                                              ║
# ║   Data Analytics Functional Utilities - Interactive CLI      ║
# ║   Enterprise Fraud Detection & Analytics Platform            ║
# ╚══════════════════════════════════════════════════════════════╝
#
# Welcome to DAFU Interactive CLI!
# Type 'help' for available commands or 'exit' to quit
#
# dafu>
# Try different features
dafu> help # See all commands
dafu> auth login # Login to API (NEW!)
dafu> logs list # View system logs (NEW!)
dafu> reports list # View fraud reports (NEW!)
dafu> products stats # Product statistics (NEW!)
dafu> fraud-detection # Run ML models
dafu> docker status # Check Docker services
# The CLI will:
# - Auto-create virtual environment if needed
# - Auto-install dependencies
# - Manage authentication sessions
# - Provide unified access to all features
# - Return to CLI prompt after each command

Available CLI Commands:
Key Features:
- API Integration - Full authentication, logs, reports, products management
- Persistent Session - Login once, use everywhere with session management
- Auto-Setup - Automatically creates virtual environment and installs dependencies
- Error Resilient - CLI stays active even when commands fail
- User-Friendly - Color-coded output and helpful messages
- Scriptable - Use in automation with single command mode
- Role-Based Access - Support for viewer, user, analyst, admin roles
Documentation:
- Complete Usage Guide - Full platform usage
- CLI Guide - Interactive CLI reference
- API Guide - REST API documentation
- All Documentation - Complete documentation library
Complete platform with authentication, logging, reports, and product management
Step 1: Start PostgreSQL
docker run -d --name dafu-postgres \
-e POSTGRES_USER=dafu \
-e POSTGRES_PASSWORD=dafu_secure_password \
-e POSTGRES_DB=dafu \
-p 5432:5432 \
  postgres:15-alpine

Step 2: Start API Server (in a separate terminal)
cd dafu/core/features/fraud_detection
./start_api.sh
# Wait for:
# INFO: Uvicorn running on http://0.0.0.0:8000
# INFO: Application startup complete.

Step 3: Use DAFU CLI
./dafu
dafu> auth register # First time: register user
dafu> auth login # Login with credentials
dafu> auth whoami # Check your user info
dafu> logs list # View system logs
dafu> reports list # View fraud reports
dafu> products stats # Product statistics
dafu> fraud-detection    # Run ML models

Complete Guide: See docs/USAGE_GUIDE.md for detailed instructions
Features Available:
- JWT authentication with RBAC
- System logging and analytics
- Fraud detection report generation
- Product risk management
- All ML models
- RESTful API endpoints
- Database persistence
- Session management
API Documentation: http://localhost:8000/docs (Swagger UI)
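Since the backend is a standard FastAPI service, any HTTP client can talk to it outside the CLI. The sketch below builds requests with only the standard library; the `/auth/login` path and the `access_token` field are assumptions — check the Swagger UI at `/docs` for the actual routes and schemas exposed by your build.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default from the quick start above

def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a login request.
    NOTE: '/auth/login' is a hypothetical path -- verify against /docs."""
    payload = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{API_BASE}/auth/login",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_authorized_request(path: str, token: str) -> urllib.request.Request:
    """Attach the JWT returned by login as a Bearer token."""
    return urllib.request.Request(
        f"{API_BASE}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )

# Sending would then look like:
# with urllib.request.urlopen(build_login_request("analyst", "secret")) as resp:
#     token = json.load(resp)["access_token"]
```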
Step 1: Clone and Setup Environment
# Clone the repository
git clone https://github.com/MasterFabric/dafu.git
cd dafu
# Create virtual environment
python3 -m venv dafu_env
source dafu_env/bin/activate # On Windows: dafu_env\Scripts\activate
# Expected output:
# (dafu_env) masterfabric@machine:dafu$

Step 2: Install Dependencies
# Navigate to fraud detection module
cd core/features/fraud_detection
# Install minimal dependencies (recommended for first-time users)
pip install -r requirements-minimal.txt
# Expected output:
# Collecting numpy>=1.21.0
#   Downloading numpy-1.24.3-cp39-cp39-macosx_10_9_x86_64.whl (20.1 MB)
#      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.1/20.1 MB 2.1 MB/s eta 0:00:00
# Collecting pandas>=1.3.0
#   Downloading pandas-1.5.3-cp39-cp39-macosx_10_9_x86_64.whl (11.3 MB)
#      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 2.8 MB/s eta 0:00:00
# ...
# Successfully installed numpy-1.24.3 pandas-1.5.3 scikit-learn-1.3.0 ...

Step 3: Verify Installation
# Test the installation
python -c "from src.models.anomaly_detection import IsolationForestFraudDetector; print('Installation successful!')"
# Expected output:
# Installation successful!

Step 4: Run Unified Model Interface
# Run the unified model selection interface
cd core/features/fraud_detection/src/models
python main.py
# Expected terminal interaction:
# ========================================
# ENTERPRISE FRAUD DETECTION PLATFORM
# ========================================
# Advanced Machine Learning Models for Fraud Detection
# Version: 1.0.0
# ========================================
#
# This platform offers multiple fraud detection approaches:
#   • Traditional ML: Isolation Forest with Risk Score analysis
#   • Deep Learning: LSTM and GRU sequence-based models
#   • Both supervised and unsupervised learning modes
#   • Real-time streaming and batch processing capabilities
# ========================================
#
# Fast startup - models load only when selected!
#
# ============================================================
# SELECT FRAUD DETECTION MODEL
# ============================================================
# Choose the type of fraud detection model you want to use:
#
# 1. ISOLATION FOREST & RISK SCORE
#    • Traditional machine learning approach
#    • Excellent for tabular data with numerical features
#    • Supports both supervised and unsupervised learning
#    • Risk score based anomaly detection
#    • Fast training and prediction
#
# 2. SEQUENCE MODELS (LSTM & GRU)
#    • Deep learning approach for sequential data
#    • Captures temporal patterns and dependencies
#    • Autoencoder architecture for anomaly detection
#    • Best for time-series and transaction sequences
#    • More complex but potentially more accurate
#
# 3. MODEL COMPARISON
#    • Compare different models on the same dataset
#    • Get recommendations based on your data
#
# 4. HELP & INFORMATION
#    • Detailed information about each model
#    • Data requirements and recommendations
#
# 5. EXIT
#    • Exit the application
# ============================================================
#
# Enter your choice (1-5):

Alternative: Run Individual Model Tests
# Run individual model tests (legacy method)
cd core/features/fraud_detection
python test_anomaly_detection.py
python test_sequence_models_interactive.py

Status: Infrastructure prepared, services not integrated yet
What's Ready:
- Docker configuration files
- Database schemas
- Service definitions

Warning: ML models are NOT integrated with the API yet.
Current Limitation: Docker Compose services are commented out until API-ML integration is complete. For now, use Option 1 (Local Development) to run ML models.
Future Setup (when ready):
# Clone and navigate
git clone https://github.com/MasterFabric/dafu.git
cd dafu
# Uncomment services in docker-compose.yml
# Then start services
docker-compose up -d

Why Services Are Commented Out:
The ML models (Isolation Forest, LSTM/GRU) work perfectly standalone, but the FastAPI endpoints need ML integration. All infrastructure (database schemas, service configs, monitoring) is prepared and ready to be activated once the integration is complete.
What You Can Do Now:
- Use all ML models via Python (Option 1)
- Train and save models
- Stream and batch processing
- See Docker Status for the integration roadmap
Next Step:
Integrate ML models with FastAPI, then uncomment services in docker-compose.yml.
For testing individual components:
# Build the fraud detection service
cd core/features/fraud_detection
docker build -f deployment/Dockerfile -t dafu-fraud-detection .
# Run with sample data
docker run -it --rm \
-v $(pwd)/sample_fraud_data.csv:/app/data.csv \
dafu-fraud-detection \
  python test_anomaly_detection.py

Step 1: Deploy with Helm
# Deploy using Helm (when API is ready)
cd core/features/fraud_detection/deployment
helm install dafu-fraud-detection ./helm-charts/ \
--set image.tag=latest \
--set replicas=3 \
--set resources.requests.memory=512Mi
# Expected output:
# NAME: dafu-fraud-detection
# LAST DEPLOYED: Mon Jan 15 10:30:00 2024
# NAMESPACE: default
# STATUS: deployed
# REVISION: 1
# TEST SUITE: None

Step 2: Verify Deployment
# Check pod status
kubectl get pods -l app=dafu-fraud-detection
# Expected output:
# NAME                                READY   STATUS    RESTARTS   AGE
# dafu-fraud-detection-7d4b8c9f-abc   1/1     Running   0          2m
# dafu-fraud-detection-7d4b8c9f-def   1/1     Running   0          2m
# dafu-fraud-detection-7d4b8c9f-ghi   1/1     Running   0          2m

1. Run the Unified Model Interface
cd core/features/fraud_detection/src/models
python main.py

Expected Terminal Interface:
ENTERPRISE FRAUD DETECTION PLATFORM
========================================
Advanced Machine Learning Models for Fraud Detection
Version: 1.0.0
========================================
This platform offers multiple fraud detection approaches:
• Traditional ML: Isolation Forest with Risk Score analysis
• Deep Learning: LSTM and GRU sequence-based models
• Both supervised and unsupervised learning modes
• Real-time streaming and batch processing capabilities
========================================
Fast startup - models load only when selected!
============================================================
SELECT FRAUD DETECTION MODEL
============================================================
Choose the type of fraud detection model you want to use:
1. ISOLATION FOREST & RISK SCORE
2. SEQUENCE MODELS (LSTM & GRU)
3. MODEL COMPARISON
4. HELP & INFORMATION
5. EXIT
Enter your choice (1-5):
2. Select Your Model
Choose option 1 for Isolation Forest or option 2 for Sequence Models. The system will:
- Load the selected model (with progress indicator)
- Guide you through configuration
- Handle all setup automatically
3. View Results
After completion, you'll see comprehensive results with visualizations and exported data.
The new unified interface provides:
- Single Entry Point: One command to access all fraud detection models
- Smart Model Selection: Interactive guidance for choosing the right model
- Fast Startup: Lazy loading ensures instant interface response
- Model Comparison: Built-in comparison tools and recommendations
- Help System: Comprehensive information and decision trees
- Seamless Navigation: Easy switching between models and options
1. Train a Model First
python test_sequence_models_interactive.py

Expected Questions:
Prediction Mode Selection:
1. Batch Prediction (train and evaluate)
2. Stream Prediction (use pre-trained model)
Please select prediction mode (1 or 2): 1

Learning Mode Selection:
1. Supervised Learning
2. Unsupervised Learning
Please select learning mode (1 or 2): 1

Model Selection:
Available models: ['LSTM', 'GRU', 'Both']
Please select models (comma-separated): LSTM,GRU
2. Test Stream Processing
# Run stream prediction with pre-trained model
python test_sequence_models_interactive.py

Select Stream Mode:
Prediction Mode Selection:
1. Batch Prediction (train and evaluate)
2. Stream Prediction (use pre-trained model)
Please select prediction mode (1 or 2): 2

Model Package Selection:
Available models: ['my_fraud_model', 'production_model']
Please select model: my_fraud_model

Model loaded successfully!
Processing stream data...
Stream processing complete! Results saved to: stream_results/
1. One-Command Demo
# Run complete demo in Docker
docker run -it --rm \
-v $(pwd)/sample_fraud_data.csv:/app/data.csv \
-v $(pwd)/results:/app/results \
dafu-fraud-detection \
  python test_anomaly_detection.py

Expected Output:
DAFU Fraud Detection System - Docker Demo
========================================
Data Analysis Results:
- Dataset shape: (1000, 8)
- Missing values: 0
- Fraud rate: 5.0%
Running unsupervised anomaly detection...
Analysis complete! Results saved to: /app/results/
Common Issues and Solutions:
Issue 1: Import Error
ModuleNotFoundError: No module named 'src.models.anomaly_detection'

Solution:
# Make sure you're in the fraud_detection directory
cd core/features/fraud_detection
python -c "from src.models.anomaly_detection import IsolationForestFraudDetector; print('Fixed!')"

Issue 2: Memory Error
MemoryError: Unable to allocate array

Solution:
# Use smaller dataset or reduce model complexity
export PYTHONHASHSEED=0
python test_anomaly_detection.py

Issue 3: Docker Build Fails
ERROR: failed to solve: failed to resolve source

Solution:
# Clean Docker cache and rebuild
docker system prune -f
docker build --no-cache -f deployment/Dockerfile -t dafu-fraud-detection .

System Requirements:
- Minimum: 4GB RAM, 2 CPU cores
- Recommended: 8GB RAM, 4 CPU cores
- Production: 16GB+ RAM, 8+ CPU cores
Processing Times:
- Small dataset (1K records): 10-30 seconds
- Medium dataset (10K records): 2-5 minutes
- Large dataset (100K records): 10-20 minutes
- Stream processing: <1 second per record
Memory Usage:
- Training: 2-4GB RAM
- Prediction: 500MB-1GB RAM
- Stream mode: 200-500MB RAM
# Start the unified interface
cd core/features/fraud_detection/src/models
python main.py
# Follow the interactive prompts:
# 1. Choose your model (Isolation Forest or Sequence Models)
# 2. Select prediction mode (Batch or Stream)
# 3. Configure parameters
# 4. Run analysis

from fraud_detection.src.models.anomaly_detection import IsolationForestFraudDetector
# Initialize the detector
detector = IsolationForestFraudDetector(random_state=42)
# Load and analyze your data
detector.load_and_analyze_data('transaction_data.csv')
# Setup learning mode (supervised/unsupervised)
detector.setup_learning_mode()
# Choose detection method
# - Classic: Binary classification with contamination levels
# - Risk Score: Custom threshold-based detection
# Preprocess data
detector.preprocess_data()
# Train models
detector.train_models([0.01, 0.05, 0.1]) # Multiple contamination levels
# Evaluate and visualize
if detector.is_supervised:
    detector.evaluate_models()

detector.create_visualizations(save_plots=True)
detector.export_results('fraud_analysis_results')

# Run comprehensive anomaly detection tests
cd core/features/fraud_detection
python test_anomaly_detection.py
# Run sequence model tests
python test_sequence_models_interactive.py

Note: The unified interface (main.py) is now the recommended way to access all fraud detection capabilities. Individual model tests are still available for advanced users.
import pandas as pd

from fraud_detection.src.models.sequence_models import SequenceFraudDetector

# Initialize sequence detector
sequence_detector = SequenceFraudDetector()
# Setup prediction mode (NEW!)
sequence_detector.setup_prediction_mode()
# Choose: 1. Batch Prediction or 2. Stream Prediction
# For Batch Prediction Mode
if sequence_detector.prediction_mode == 'batch':
    # Load and analyze data
    sequence_detector.load_and_analyze_data('user_sequences.csv')
    # Setup learning mode
    sequence_detector.setup_learning_mode()
    # Preprocess data
    sequence_detector.preprocess_data()
    # Train models
    sequence_detector.train_models(['LSTM', 'GRU'])
    # Save trained models (NEW!)
    sequence_detector.save_model_package('my_fraud_model')
    # Evaluate and export
    sequence_detector.evaluate_models()
    sequence_detector.export_results('batch_results')
# For Stream Prediction Mode (NEW!)
elif sequence_detector.prediction_mode == 'stream':
    # Load pre-trained model
    sequence_detector.load_model_package('my_fraud_model')
    # Load new stream data
    stream_data = pd.read_csv('new_stream_data.csv')
    # Preprocess stream data
    processed_stream = sequence_detector.preprocess_stream_data(stream_data)
    # Make predictions on stream
    predictions = sequence_detector.predict_stream(processed_stream)
    # Export stream results (NEW!)
    sequence_detector.export_stream_results(stream_data, predictions)

import pandas as pd

from fraud_detection.src.models.anomaly_detection import IsolationForestFraudDetector
# Initialize detector
detector = IsolationForestFraudDetector()
# Setup prediction mode (NEW!)
detector.setup_prediction_mode()
# Choose: 1. Batch Prediction or 2. Stream Prediction
# For Stream Prediction Mode
if detector.prediction_mode == 'stream':
    # Load pre-trained model
    detector.load_model_package('trained_fraud_model')
    # Load new stream data
    stream_data = pd.read_csv('new_transactions.csv')
    # Preprocess stream data
    processed_stream = detector.preprocess_stream_data(stream_data)
    # Make predictions on stream
    results = detector.predict_stream(processed_stream, contamination=0.1)
    # Export stream results (NEW!)
    detector.export_stream_results(stream_data, results)

transaction_id,user_id,amount,merchant_id,timestamp,category,is_fraud
tx_001,user_123,150.00,merchant_456,2024-01-15 10:30:00,electronics,0
tx_002,user_124,2500.00,merchant_789,2024-01-15 11:45:00,jewelry,1
user_id,timestamp,action_type,device_id,location,amount
user_123,2024-01-15 10:30:00,login,mobile_device_001,location_A,0
user_123,2024-01-15 10:31:00,purchase,mobile_device_001,location_A,150.00
timestamp,user_id,transaction_count,daily_amount,risk_score
2024-01-15,user_123,5,750.00,0.2
2024-01-16,user_123,8,1200.00,0.4
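Before feeding a file to the models, it can help to sanity-check it against the transaction-level format shown above. This is a minimal stdlib sketch, not part of DAFU's own preprocessing; the `is_fraud` label is treated as optional since unsupervised mode does not need it.

```python
import csv
import io

# Column names from the transaction-level format shown above.
EXPECTED = ["transaction_id", "user_id", "amount", "merchant_id",
            "timestamp", "category", "is_fraud"]

def validate_transactions(fileobj):
    """Yield rows as dicts, failing fast on a malformed header.
    The trailing `is_fraud` label column is optional (unsupervised mode)."""
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED[:-1] if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for row in reader:
        row["amount"] = float(row["amount"])  # coerce amounts early
        yield row

# Exercise the validator against the sample row from the README.
sample = io.StringIO(
    "transaction_id,user_id,amount,merchant_id,timestamp,category,is_fraud\n"
    "tx_001,user_123,150.00,merchant_456,2024-01-15 10:30:00,electronics,0\n"
)
rows = list(validate_transactions(sample))
```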
Scenario: Choose the right fraud detection model for your data
Solution: Interactive model selection interface
# Start the unified interface
cd core/features/fraud_detection/src/models
python main.py
# Interactive model selection:
# 1. ISOLATION FOREST & RISK SCORE - For tabular data
# 2. SEQUENCE MODELS (LSTM & GRU) - For sequential data
# 3. MODEL COMPARISON - Compare different approaches
# 4. HELP & INFORMATION - Get detailed guidance
# 5. EXIT - Exit the application

Scenario: Detect fraudulent transactions in real-time during checkout
Solution: Risk Score API with sub-50ms response time
# Real-time scoring
import requests

response = requests.post('https://api.masterfabric.co/dafu/v1/score', json={
    'transaction_id': 'tx_123',
    'amount': 150.00,
    'user_id': 'user_456',
    'merchant_id': 'merchant_789',
    'device_fingerprint': 'fp_abc123',
    'ip_address': '192.168.1.1',
    'user_agent': 'Mozilla/5.0...'
})
fraud_score = response.json()['risk_score']
is_fraud = fraud_score > 0.7  # Custom threshold

Scenario: Analyze historical data for fraud patterns and model retraining
Solution: Batch Processing API with large-scale data handling
# Batch analysis
import requests

batch_request = {
    'data_source': 's3://fraud-data/transactions_2024.csv',
    'analysis_type': 'comprehensive',
    'models': ['isolation_forest', 'lstm', 'xgboost'],
    'output_format': 'detailed_report'
}
response = requests.post('https://api.masterfabric.co/dafu/v1/batch/analyze', json=batch_request)

Scenario: Detect anomalous user behavior patterns over time
Solution: Sequence models with LSTM/GRU for temporal pattern recognition
# User behavior analysis
from fraud_detection.src.models.sequence_models import UserBehaviorAnalyzer
analyzer = UserBehaviorAnalyzer()
analyzer.load_user_sequences('user_behavior_data.csv')
# Detect anomalies in user patterns
anomalies = analyzer.detect_behavioral_anomalies(
    sequence_length=30,
    threshold=0.8
)

Scenario: Evaluate merchant risk profiles for payment processing
Solution: Multi-model ensemble with business rules
# Merchant risk assessment
from fraud_detection.src.rules_engine.rule_processor import MerchantRiskProcessor
processor = MerchantRiskProcessor()
merchant_risk = processor.assess_merchant_risk(
    merchant_id='merchant_123',
    transaction_history='merchant_transactions.csv',
    risk_factors=['chargeback_rate', 'transaction_patterns', 'location_anomalies']
)

Scenario: Process incoming transactions in real-time using pre-trained models
Solution: Stream prediction mode with model persistence
# Real-time stream processing
from fraud_detection.src.models.anomaly_detection import IsolationForestFraudDetector
detector = IsolationForestFraudDetector()
detector.setup_prediction_mode()
# Choose: 2. Stream Prediction
# Load pre-trained model
detector.load_model_package('production_fraud_model')
# Process incoming transactions
while True:
    # Get new transaction from stream
    new_transaction = get_next_transaction()
    # Preprocess and predict
    processed = detector.preprocess_stream_data(new_transaction)
    result = detector.predict_stream(processed, contamination=0.1)
    # Take action based on prediction
    if result['predictions'][0] == 1:
        block_transaction(new_transaction)
    else:
        approve_transaction(new_transaction)

Scenario: Train models on historical data and deploy for production use
Solution: Batch training with model persistence
# Training and deployment pipeline
from fraud_detection.src.models.sequence_models import SequenceFraudDetector
# Phase 1: Train models
trainer = SequenceFraudDetector()
trainer.setup_prediction_mode()
# Choose: 1. Batch Prediction
trainer.load_and_analyze_data('historical_fraud_data.csv')
trainer.setup_learning_mode()
trainer.preprocess_data()
trainer.train_models(['LSTM', 'GRU'])
# Save trained models for production
trainer.save_model_package('production_lstm_model')
print("Models trained and saved for production use")
# Phase 2: Deploy for stream processing
deployer = SequenceFraudDetector()
deployer.setup_prediction_mode()
# Choose: 2. Stream Prediction
deployer.load_model_package('production_lstm_model')
print("Models loaded and ready for stream processing")

Scenario: Detect fraud rings and coordinated attacks
Solution: Network features with graph analysis
# Network fraud detection
from fraud_detection.src.feature_engineering.network_features import NetworkAnalyzer
analyzer = NetworkAnalyzer()
fraud_rings = analyzer.detect_fraud_networks(
    transaction_data='network_transactions.csv',
    similarity_threshold=0.8,
    min_ring_size=3
)

The main fraud detection microservice with end-to-end ML capabilities, exposed via FastAPI for real-time and batch use cases.
- Anomaly Detection: Isolation Forest for unsupervised fraud detection and anomaly scoring.
- Sequence Models: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) models for temporal/behavioral pattern recognition.
- Ensemble Methods: XGBoost and Random Forest for robust, tree-based predictions.
- Neural Networks: Deep learning models (via TensorFlow/Keras) for complex nonlinear fraud patterns.
- Rules Engine: Extensible business rules for configurable thresholds, velocity checks, and custom scoring strategies.
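To make the rules-engine idea concrete, here is a minimal sketch of a threshold/velocity pass. The rule shape loosely mirrors the config/rules.json entries shown later in this README, but the field names and the scoring policy (take the maximum triggered risk score) are illustrative assumptions, not the actual DAFU engine.

```python
# Illustrative rule set: field names and scores are assumptions, not DAFU's.
RULES = [
    {"name": "amount_threshold", "field": "amount",
     "value": 10000, "risk_score": 0.8},
    {"name": "velocity_check", "field": "transactions_per_hour",
     "value": 10, "risk_score": 0.6},
]

def apply_rules(txn):
    """Return (max risk score among triggered rules, names of triggered rules)."""
    triggered = [r for r in RULES if txn.get(r["field"], 0) > r["value"]]
    score = max((r["risk_score"] for r in triggered), default=0.0)
    return score, [r["name"] for r in triggered]

# A large transaction trips only the amount rule.
score, hits = apply_rules({"amount": 12000, "transactions_per_hour": 3})
```

A real engine would also attach the configured action (flag, verify, block) to each triggered rule rather than just a score.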
A modular preprocessing and feature extraction pipeline designed for real-time and offline analytics.
- Transaction Features: Amount distributions, frequency of transactions, merchant category profiling.
- User Features: Historical behavioral patterns, device fingerprints, account age/risk indicators.
- Network Features: Graph-based entity relationships (shared IPs, merchants, accounts).
- Temporal Features: Time-series analysis (sliding windows, session duration, peak-time anomalies).
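As one concrete example of a temporal feature, a sliding-window transactions-per-hour counter can be maintained per user. This is a sketch of the idea using only the stdlib, not DAFU's actual feature pipeline.

```python
from collections import deque
from datetime import datetime, timedelta

class VelocityTracker:
    """Sliding-window transactions-per-window count per user.
    A temporal-feature sketch; the real pipeline may compute this differently."""

    def __init__(self, window=timedelta(hours=1)):
        self.window = window
        self.events = {}  # user_id -> deque of event timestamps

    def add(self, user_id, ts):
        """Record an event and return the count inside the current window."""
        q = self.events.setdefault(user_id, deque())
        q.append(ts)
        # Evict events older than the window relative to the newest event.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q)
```

The returned count is exactly the `transactions_per_hour`-style signal a velocity rule or model feature would consume.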
Enterprise-grade RESTful APIs providing low-latency endpoints for real-time scoring and large-scale data ingestion.
All APIs are implemented using FastAPI, leveraging OpenAPI/Swagger for documentation and schema validation.
-
Fraud Scoring API: Real-time fraud detection endpoint.
- Built on FastAPI for async performance.
- Supports REST and optionally gRPC for low-latency scenarios.
- Designed for sub-50ms response times with Redis caching.
-
Batch Processing API: Bulk scoring and data ingestion.
- Optimized for large datasets with Dask / Apache Spark integration.
- Used for offline analysis, backfills, reporting, and model monitoring.
- Supports scheduled jobs (via Apache Airflow).
-
Model Management API: Centralized model lifecycle control.
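The README does not detail the Model Management API, so here is a hedged sketch of what centralized lifecycle control could look like behind it. The register/promote/rollback names are assumptions for illustration, not the real endpoints.

```python
class ModelRegistry:
    """Sketch of model lifecycle control: versions are registered in order,
    one version per model is active, and rollback steps back one version."""

    def __init__(self):
        self._versions = {}  # model name -> ordered list of version tags
        self._active = {}    # model name -> currently active version

    def register(self, name, version):
        self._versions.setdefault(name, []).append(version)

    def promote(self, name, version):
        if version not in self._versions.get(name, []):
            raise KeyError(f"unknown version {version!r} for model {name!r}")
        self._active[name] = version

    def active(self, name):
        return self._active.get(name)

    def rollback(self, name):
        """Re-activate the previously registered version, if one exists."""
        versions = self._versions.get(name, [])
        current = self._active.get(name)
        if current in versions and versions.index(current) > 0:
            self._active[name] = versions[versions.index(current) - 1]
```

In the platform, these operations would sit behind authenticated REST endpoints with the model artifacts themselves stored under MODEL_STORAGE_PATH.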
A cloud-native, microservices-based foundation, optimized for scalability and observability.
- Containerization: Docker multi-stage builds for lightweight, reproducible services (Podman is another option).
- Orchestration: Kubernetes with Helm charts for deployment, scaling, and service discovery.
- Monitoring: Prometheus (metrics), Grafana (dashboards), Jaeger (distributed tracing).
- Security: OAuth2 / JWT authentication, RBAC policies, and API key management for multi-tenant enterprise compliance.
# .env file
DATABASE_URL=postgresql://user:password@localhost:5432/dafu
REDIS_URL=redis://localhost:6379/0
MODEL_STORAGE_PATH=/models
LOG_LEVEL=INFO
API_RATE_LIMIT=1000
FRAUD_THRESHOLD=0.7

# config/models.json
{
  "isolation_forest": {
    "contamination": [0.01, 0.05, 0.1],
    "n_estimators": 100,
    "random_state": 42
  },
  "lstm": {
    "sequence_length": 10,
    "hidden_units": 64,
    "dropout": 0.2,
    "epochs": 50
  },
  "xgboost": {
    "n_estimators": 100,
    "max_depth": 6,
    "learning_rate": 0.1
  }
}

# config/rules.json
{
  "amount_threshold": {
    "condition": "amount > 10000",
    "risk_score": 0.8,
    "action": "flag_for_review"
  },
  "velocity_check": {
    "condition": "transactions_per_hour > 10",
    "risk_score": 0.6,
    "action": "additional_verification"
  },
  "location_anomaly": {
    "condition": "distance_from_home > 1000km",
    "risk_score": 0.5,
    "action": "location_verification"
  }
}

- Latency: <50ms for real-time scoring
- Throughput: 10,000+ transactions per second
- Accuracy: 95%+ fraud detection accuracy
- Availability: 99.9% uptime SLA
# Kubernetes HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: dafu-fraud-detection
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dafu-fraud-detection
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

# Redis caching for model predictions
import json

import redis
from functools import wraps
redis_client = redis.Redis(host='localhost', port=6379, db=0)
def cache_prediction(expiration=3600):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = f"prediction:{hash(str(args) + str(kwargs))}"
            cached_result = redis_client.get(cache_key)
            if cached_result:
                return json.loads(cached_result)
            result = func(*args, **kwargs)
            redis_client.setex(cache_key, expiration, json.dumps(result))
            return result
        return wrapper
    return decorator

- Authentication: OAuth2/JWT token-based authentication
- Authorization: Role-based access control (RBAC)
- Data Encryption: TLS 1.3 for data in transit, AES-256 for data at rest
- Input Validation: Comprehensive data validation and sanitization
- Audit Logging: Complete audit trail for compliance
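To illustrate the JWT flow underpinning token-based authentication, here is a minimal HS256 sign/verify pair using only the standard library. A production deployment would use a vetted library (e.g. PyJWT or python-jose) rather than this sketch.

```python
import base64
import hashlib
import hmac
import json

def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(payload: dict, secret: bytes) -> str:
    """Produce a header.payload.signature HS256 token."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(payload).encode())
    mac = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64(mac)}"

def verify(token: str, secret: bytes) -> dict:
    """Recompute the MAC and return the payload, or raise on tampering."""
    header, body, sig = token.split(".")
    mac = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64(mac), sig):
        raise ValueError("bad signature")
    padded = body + "=" * (-len(body) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Real tokens would also carry registered claims such as `exp` and `iat`, which the verifier must check in addition to the signature.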
- GDPR: Data processing transparency and user rights
- PCI DSS: Secure payment card data handling
- SOC 2: Security controls and monitoring
- ISO 27001: Information security management
# Security middleware
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware
app = FastAPI()
# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://dafu.masterfabric.co"],
    allow_credentials=True,
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)

# Trusted hosts
app.add_middleware(
    TrustedHostMiddleware,
    allowed_hosts=["dafu.masterfabric.co", "*.masterfabric.co"]
)

# Run all tests
cd core/features/fraud_detection
pytest tests/ -v --cov=src --cov-report=html
# Run specific test categories
pytest tests/test_anomaly_detection.py -v
pytest tests/test_api_endpoints.py -v
pytest tests/test_feature_engineering.py -v
# Performance testing
pytest tests/test_performance.py -v --benchmark-only# Linting and formatting
black src/ tests/
flake8 src/ tests/
mypy src/
pylint src/
# Pre-commit hooks
pre-commit install
pre-commit run --all-files

# Prometheus metrics
from prometheus_client import Counter, Histogram, Gauge

fraud_predictions_total = Counter('fraud_predictions_total', 'Total fraud predictions', ['model', 'result'])
prediction_latency = Histogram('prediction_latency_seconds', 'Prediction latency')
active_models = Gauge('active_models_total', 'Number of active models')

# Structured logging
import structlog
logger = structlog.get_logger()
# Usage
logger.info(
    "fraud_prediction_completed",
    transaction_id="tx_123",
    model="isolation_forest",
    risk_score=0.85,
    processing_time_ms=45,
)

- Fraud Detection Metrics: Prediction accuracy, latency, throughput
- Model Performance: Model accuracy, drift detection, retraining triggers
- System Health: CPU, memory, disk usage, API response times
- Business Metrics: Fraud rates, false positives, cost analysis
Best for: Development, testing, ML model training

```bash
# Clone repository
git clone https://github.com/MasterFabric/dafu.git
cd dafu/core/features/fraud_detection

# Setup environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run ML models
cd src/models
python main.py  # Interactive model selection
```

Features Available:
- ✅ All ML models (Isolation Forest, LSTM, GRU)
- ✅ Training and prediction
- ✅ Stream/batch processing
- ✅ Model persistence
- ✅ Visualization and export
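Both detection modes in the feature set reduce to thresholding an anomaly score. A minimal plain-Python sketch of risk-score based detection at a given contamination level (function names are illustrative, not the project's API):

```python
def risk_threshold(scores, contamination=0.05):
    """Pick the cutoff so that roughly `contamination` of the points
    are flagged (highest scores treated as most anomalous)."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(len(ranked) * contamination))
    return ranked[k - 1]

def detect(scores, contamination=0.05):
    cutoff = risk_threshold(scores, contamination)
    return [s >= cutoff for s in scores]

scores = [0.1, 0.2, 0.15, 0.9, 0.05, 0.12, 0.11, 0.3, 0.18, 0.22,
          0.09, 0.14, 0.16, 0.21, 0.08, 0.13, 0.17, 0.19, 0.07, 0.95]
flags = detect(scores, contamination=0.1)  # flags the two highest scores
```

This is the same idea behind the contamination parameter of Isolation Forest: the threshold is derived from the expected anomaly fraction rather than fixed by hand.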
Status: Configuration complete, services commented out until API-ML integration
The complete Docker Compose setup is prepared in docker-compose.yml but all services are currently commented out. See Docker Status for details.
What's Prepared:
- Complete service definitions (API, PostgreSQL, Redis, RabbitMQ, Celery, Prometheus, Grafana)
- Database schemas
- Network and volume configuration
- Health checks and monitoring
When Active (after integration):

```bash
# Uncomment services in docker-compose.yml
docker-compose up -d
```

```bash
# Deploy to Kubernetes
kubectl apply -f core/features/fraud_detection/deployment/k8s-manifests/

# Or using Helm
helm install dafu ./core/features/fraud_detection/deployment/helm-charts/ \
  --set image.tag=latest \
  --set replicas=3 \
  --set resources.requests.memory=512Mi \
  --set resources.requests.cpu=250m
```

```yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Run tests
        run: |
          cd core/features/fraud_detection
          pip install -r requirements.txt
          pytest tests/
  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/dafu-fraud-detection \
            fraud-detection=masterfabric/dafu:latest
```

- Fork the repository
- Create a feature branch
  ```bash
  git checkout -b feature/your-feature-name
  ```
- Install development dependencies
  ```bash
  cd core/features/fraud_detection
  pip install -r requirements.txt
  pip install -r requirements-dev.txt  # If available
  ```
- Run tests and linting
  ```bash
  pytest tests/
  black src/ tests/
  flake8 src/ tests/
  ```
- Submit a pull request
- Python Style: PEP 8 compliance with Black formatting
- Type Hints: Comprehensive type annotations
- Documentation: Google-style docstrings
- Testing: 90%+ test coverage required
- ASCII Only: No non-ASCII characters in code (enforced by pre-commit hooks)
Follow the conventional commit format:

```
type(scope): description

[optional body]

[optional footer]
```

Types: feat, fix, docs, style, refactor, test, chore
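A commit-message hook could enforce this header format with a small check; the regex below is a sketch under our own assumptions about allowed scope characters, not an existing DAFU hook:

```python
import re

TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Header: type, optional (scope), colon-space, description.
HEADER_RE = re.compile(
    r"^(?P<type>%s)(\((?P<scope>[\w\-]+)\))?: (?P<desc>.+)$" % "|".join(TYPES)
)

def is_valid_header(line: str) -> bool:
    return HEADER_RE.match(line) is not None

print(is_valid_header("feat(api): add fraud scoring endpoint"))  # True
print(is_valid_header("update stuff"))                           # False
```

Wired into a `commit-msg` git hook, such a check rejects non-conforming messages before they enter history.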
Real-time fraud scoring endpoint.

Request:

```json
{
  "transaction_id": "tx_123",
  "amount": 150.00,
  "user_id": "user_456",
  "merchant_id": "merchant_789",
  "timestamp": "2024-01-15T10:30:00Z",
  "device_fingerprint": "fp_abc123",
  "ip_address": "192.168.1.1"
}
```

Response:

```json
{
  "transaction_id": "tx_123",
  "risk_score": 0.85,
  "is_fraud": true,
  "model_used": "isolation_forest",
  "processing_time_ms": 45,
  "confidence": 0.92,
  "explanations": {
    "amount_risk": 0.3,
    "user_behavior_risk": 0.7,
    "merchant_risk": 0.2
  }
}
```

Batch fraud analysis endpoint.

Request:

```json
{
  "data_source": "s3://bucket/data.csv",
  "analysis_type": "comprehensive",
  "models": ["isolation_forest", "lstm", "xgboost"],
  "output_format": "detailed_report"
}
```

List available models.
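A client for the scoring endpoint can be sketched with the standard library alone. The required-field list mirrors the sample request above, while the `/score` path is a placeholder, since the docs do not state the real route:

```python
import json
import urllib.request

# Fields inferred from the sample request payload above.
REQUIRED = {"transaction_id", "amount", "user_id", "merchant_id", "timestamp"}

def build_request(payload, base_url="http://localhost:8000", token=None):
    """Validate the payload and build a POST request for the scoring API."""
    missing = REQUIRED - payload.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # NOTE: '/score' is a placeholder path, not confirmed by the docs.
    req = urllib.request.Request(
        base_url + "/score",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req  # caller sends it with urllib.request.urlopen(req)

req = build_request({
    "transaction_id": "tx_123", "amount": 150.00, "user_id": "user_456",
    "merchant_id": "merchant_789", "timestamp": "2024-01-15T10:30:00Z",
}, token="abc")
```

Validating required fields client-side gives faster feedback than waiting for a 422 from the API.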
Deploy a new model version.
Get model performance metrics.
A complete Postman collection is available for testing all API endpoints:
📦 DAFU_API.postman_collection.json
What's Included:
| Category | Endpoints | Description |
|---|---|---|
| 1. Authentication | 7 endpoints | Register, Login, Logout, Token refresh, Password change, API keys |
| 2. Log Management | 6 endpoints | CRUD operations, Statistics, Filtering |
| 3. Report Management | 6 endpoints | Report generation, Tracking, Statistics |
| 4. Product Management | 7 endpoints | Product CRUD, High-risk detection, Statistics |
| 5. Health & System | 3 endpoints | Health check, API info, OpenAPI schema |
Features:
- ✅ Auto-save tokens: Login automatically saves access_token to environment
- ✅ Complete examples: All requests include sample data
- ✅ Test scripts: Automated token management
- ✅ Documentation: Each endpoint documented with descriptions
- ✅ Environment variables: Pre-configured base_url and tokens
How to Use:

1. Import into Postman
   - Option 1: Import file directly: File → Import → Select DAFU_API.postman_collection.json
   - Option 2: Import from URL (if hosted): File → Import → Link → Paste collection URL
2. Create Environment (optional but recommended)
   - Environment Name: DAFU Local
   - Variables:
     - base_url: http://localhost:8000
     - access_token: (set automatically after login)
     - refresh_token: (set automatically after login)
3. Start API Server
   ```bash
   cd core/features/fraud_detection
   ./start_api.sh
   ```
4. Test Workflow
   - Step 1: Health Check → Verify API is running
   - Step 2: Register → Create new user account
   - Step 3: Login → Get access token (auto-saved)
   - Step 4: Try any endpoint → Use authenticated requests

Quick Start with Postman:
1. Start API: `./start_api.sh`
2. Import collection: `DAFU_API.postman_collection.json`
3. Run "Register New User" → Create account
4. Run "Login" → Token saved automatically ✅
5. Try any authenticated endpoint!
Alternative: Swagger UI
If you prefer browser-based testing:
- Start API server
- Open http://localhost:8000/docs
- Interactive API documentation with "Try it out" buttons
| Feature | Description | Status | Implementation Level |
|---|---|---|---|
| Unified Model Interface | Single entry point for all models | ✅ NEW! Fully Implemented | Complete with interactive selection |
| Isolation Forest Detection | Core anomaly detection algorithm | ✅ Fully Implemented | Complete with evaluation & visualization |
| Sequence Models (LSTM/GRU) | Time-series fraud detection | ✅ Fully Implemented | Complete with TensorFlow implementation |
| Stream Prediction Mode | Real-time data stream processing | ✅ NEW! Fully Implemented | Complete with model persistence |
| Batch Prediction Mode | Batch data processing | ✅ NEW! Fully Implemented | Complete with training & prediction |
| Model Persistence | Save/load trained models | ✅ NEW! Fully Implemented | Complete with .joblib & .h5 support |
| Data Preprocessing | Automatic data analysis & feature engineering | ✅ Fully Implemented | Complete with missing value handling |
| Supervised/Unsupervised Modes | Dual learning approaches | ✅ Fully Implemented | Complete with mode selection |
| Risk Score Detection | Custom threshold-based detection | ✅ Fully Implemented | Complete with business interpretation |
| Comprehensive Evaluation | Performance metrics & visualization | ✅ Fully Implemented | Complete with 4-panel analysis |
| Enhanced Result Export | CSV, JSON output with stream support | ✅ Enhanced | Complete with stream & batch exports |
| Docker Infrastructure | Docker Compose configuration | 🔄 Prepared | All services configured, not integrated yet |
| FastAPI Basic Structure | REST API framework | 🔄 Prepared | Basic endpoints exist, ML integration pending |
| Database Schema | PostgreSQL schema design | 🔄 Prepared | Complete schema ready, not connected yet |
| Docker Support | Containerization | ✅ Fully Implemented | Dockerfile with multi-stage build |
| Fast Startup Interface | Lazy loading for instant response | ✅ NEW! Fully Implemented | Complete with optimized imports |
| Feature | Description | Status | Implementation Level |
|---|---|---|---|
| API-ML Integration | Connect ML models to FastAPI | 🚧 Next Priority | API structure ready, needs ML integration |
| Database Integration | PostgreSQL connection | 🚧 In Development | Schema ready, ORM integration pending |
| Redis Caching | Performance optimization | 🚧 In Development | Config ready, not implemented |
| Celery Tasks | Background job processing | 🚧 In Development | Not implemented yet |
| Feature Engineering Pipeline | Advanced feature extraction | 🚧 Basic Structure | Framework exists, needs implementation |
| Rules Engine | Business rule processing | 🚧 Basic Structure | Framework exists, needs implementation |
| Ensemble Models | XGBoost, Random Forest | 🚧 Basic Structure | Framework exists, needs implementation |
| Feature | Description | Status | Target Timeline |
|---|---|---|---|
| Real-time API | Sub-50ms fraud scoring API | 📋 Planned | In Development |
| Enterprise Security | OAuth2, JWT, RBAC | 📋 Planned | In Development |
| Monitoring & Observability | Prometheus, Grafana, Jaeger | 📋 Planned | In Development |
| Auto-scaling | Kubernetes HPA | 📋 Planned | In Development |
| Advanced Analytics | Graph-based fraud detection | 📋 Planned | In Development |
| Model Management | Versioning, A/B testing | 📋 Planned | In Development |
| Compliance Features | GDPR, PCI DSS compliance | 📋 Planned | In Development |
| High-throughput Processing | 10,000+ TPS optimization | 📋 Planned | In Development |
- Accuracy: 90%+ fraud detection accuracy (based on test results)
- Model Training: Complete end-to-end pipeline
- Data Processing: Handles large datasets efficiently
- Visualization: Comprehensive 4-panel analysis plots
- Export Capability: Structured results with timestamps
- Latency: <50ms for real-time scoring (planned)
- Throughput: 10,000+ transactions per second (planned)
- Availability: 99.9% uptime SLA (planned)
- Security: Zero data breaches (planned)
- Real-time API Implementation: Complete FastAPI endpoints for fraud scoring
- Authentication & Authorization: OAuth2/JWT implementation
- Input Validation: Comprehensive request validation
- API Documentation: OpenAPI/Swagger documentation
- Basic Security: HTTPS, CORS, rate limiting
- Kubernetes Production Deployment: Full K8s manifests and Helm charts
- Monitoring & Observability: Prometheus, Grafana, Jaeger integration
- Auto-scaling: Kubernetes HPA with custom metrics
- Message Queuing: RabbitMQ/Celery for async processing
- Advanced Feature Engineering: Complete pipeline implementation
- Model Management: Versioning, A/B testing, model registry
- Ensemble Methods: XGBoost, Random Forest implementation
- Graph-based Detection: Network analysis for fraud rings
- Business Rules Engine: Complete rule processing system
- Advanced Analytics: Dashboard and reporting system
- High-throughput Optimization: 10,000+ TPS processing
- Performance Tuning: Memory optimization, caching strategies
- Compliance Features: GDPR, PCI DSS compliance tools
- Machine Learning Pipeline: Automated model training and deployment
- Multi-tenant Architecture: Enterprise multi-tenancy support
- Documentation: Comprehensive guides and API documentation
- GitHub Issues: Bug reports and feature requests
- Community Forum: Discussions and Q&A
- Feedback & Support: dafu@masterfabric.co
- Enterprise Support: Contact the platform support team
- 📚 All Documentation: Complete Docs
- Complete Usage Guide - Full platform usage guide
- API Documentation:
- API Usage Guide - Complete API reference
- API Quick Start - 5-minute API setup
- Postman Collection - Ready-to-use API tests
- CLI Documentation:
- CLI Guide - Original CLI reference
- CLI Demo - Usage examples
- CLI with API - API integration guide
- CLI Step-by-Step - Detailed CLI usage
- Guides:
- Quick Start - ML models quick start
- Implementation Complete - Implementation status
- Docker:
- Docker Status - Docker deployment info
- Docker Setup - Docker configuration
- API Testing Tools:
- Postman Collection: DAFU_API.postman_collection.json - Import & test all endpoints
- Swagger UI: http://localhost:8000/docs (Interactive API docs when API running)
- Architecture: High-Level Architecture
- High Memory Usage: Configure chunked processing for large datasets
- Slow Predictions: Enable model caching and optimize feature engineering
- False Positives: Adjust risk score thresholds and retrain models
- API Rate Limiting: Configure appropriate rate limits for your use case
DAFU Enterprise Fraud Detection Platform v1.0.0
Built with ❤️ for secure, scalable, and intelligent fraud detection
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
- Commercial Use: ✅ Allowed with restrictions
- Modification: ✅ Allowed
- Distribution: ✅ Allowed with source code disclosure
- Patent Use: ✅ Allowed
- Private Use: ✅ Allowed
- Sublicensing: ❌ Not allowed
- Source Code Disclosure: Any distribution of the software must include the complete source code
- Network Interaction: If you run the software on a server and provide services over a network, you must make the source code available to users
- License Compatibility: Any derivative works must be licensed under the same AGPL-3.0 license
- Attribution: You must retain all copyright notices and license text
The complete license text is available in the LICENSE file in this repository.
For commercial enterprises requiring different licensing terms, please contact MasterFabric for enterprise licensing options.
MasterFabric - Enterprise-level fraud detection and e-commerce analytics solutions.
Contact: dafu@masterfabric.co
Based on the existing test results in the project:
- Accuracy: 90%+ on test datasets
- Detection Methods: Both classic and risk-score based detection working
- Contamination Levels: Multiple levels (0.01, 0.05, 0.1) tested successfully
- Visualization: 4-panel analysis plots generated successfully
- Stream Processing: 100,000 records processed successfully
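The accuracy figures above come from standard detection metrics; for reference, a self-contained precision/recall computation of the kind used to evaluate a fraud detector (plain Python, toy data):

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary fraud labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy check: 3 true frauds, detector catches 2 and raises 1 false alarm
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)
```

In fraud detection these two numbers usually matter more than raw accuracy, since fraud is rare and a detector that flags nothing can still score high accuracy.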
- LSTM/GRU Models: Successfully trained and evaluated
- Time-series Analysis: User behavior patterns detected
- Model Architecture: Configurable sequence length and hidden units
- Training: TensorFlow-based implementation with early stopping
- Stream Prediction: 10,000 sequence records processed in stream mode
- Model Persistence: Models saved and loaded successfully
- Real-time Processing: Stream data processed with pre-trained models
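Sequence models consume fixed-length windows of user events. A minimal sketch of one common windowing scheme (the project's sequence length is configurable; this helper is ours, not the project's code):

```python
def make_sequences(events, seq_len=5):
    """Build overlapping windows: each sample is the last `seq_len`
    events and the target is the event that follows (next-step setup)."""
    X, y = [], []
    for i in range(len(events) - seq_len):
        X.append(events[i:i + seq_len])
        y.append(events[i + seq_len])
    return X, y

# Toy per-user transaction amounts; the spike at 500 is the kind of
# deviation from learned behavior an LSTM/GRU can surface.
amounts = [10, 12, 11, 13, 500, 12, 11, 10, 14, 13]
X, y = make_sequences(amounts, seq_len=3)
```

The resulting (X, y) pairs are what a TensorFlow LSTM/GRU would train on after scaling; sequence length trades context against data volume per user.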
- Model Loading: Pre-trained models loaded successfully for prediction
- Data Preprocessing: Stream data preprocessed using saved transformers
- Prediction Accuracy: High accuracy maintained in stream mode
- Export Capabilities: Stream results exported with timestamps
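The save-then-reload flow above follows the usual persistence pattern; a stdlib sketch with `pickle` standing in for the project's .joblib/.h5 formats, and a toy model in place of a trained detector:

```python
import os
import pickle
import tempfile

class ThresholdModel:
    """Toy stand-in for a trained detector: flags values above a cutoff."""
    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, xs):
        return [x > self.cutoff for x in xs]

# Save after "training"...
model = ThresholdModel(cutoff=100.0)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# ...reload later for stream prediction, without retraining
with open(path, "rb") as f:
    restored = pickle.load(f)
flags = restored.predict([10.0, 250.0])
```

The key property is the same one the results above rely on: the reloaded model, together with its saved preprocessing transformers, reproduces training-time behavior on new stream data.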
- Automatic Analysis: Column detection and data suitability assessment
- Preprocessing: Missing value handling, categorical encoding, scaling
- Export Formats: CSV and JSON outputs with timestamps
- Large Datasets: Efficient processing of substantial data volumes
- Batch vs Stream: Both processing modes working efficiently
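The preprocessing steps above (missing-value handling, categorical encoding) can be sketched in plain Python; this simplified helper is illustrative only, not the project's implementation:

```python
def preprocess(rows, numeric_cols, categorical_cols):
    """Impute numerics with the (upper) median and integer-encode
    categoricals with stable per-column codes."""
    medians = {}
    for col in numeric_cols:
        vals = sorted(r[col] for r in rows if r[col] is not None)
        medians[col] = vals[len(vals) // 2] if vals else 0.0
    codes = {col: {} for col in categorical_cols}
    out = []
    for r in rows:
        clean = dict(r)  # leave the input rows untouched
        for col in numeric_cols:
            if clean[col] is None:
                clean[col] = medians[col]
        for col in categorical_cols:
            mapping = codes[col]
            clean[col] = mapping.setdefault(clean[col], len(mapping))
        out.append(clean)
    return out

rows = [
    {"amount": 10.0, "country": "US"},
    {"amount": None, "country": "DE"},
    {"amount": 30.0, "country": "US"},
]
clean = preprocess(rows, ["amount"], ["country"])
```

For stream prediction the same medians and category codes learned at training time must be saved and reused, which is exactly what the saved-transformer point above refers to.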
