Skip to content

โšก Real-time fraud & anomaly detection system for streaming transactions. Built with Kafka Streams + Isolation Forest ML. Low-latency processing, online learning, and scalable architecture for detecting fraud patterns in transaction data. ๐Ÿšจ๐Ÿ”

License

Notifications You must be signed in to change notification settings

wesleyscholl/SlipStream

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

8 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

โšก๏ธ SlipStream - Kafka-Based Real-Time Anomaly Detector

Status: Enterprise-grade fraud detection system with real-time streaming analytics - production-ready Java application for financial security workflows.

๐Ÿ” A high-performance, real-time anomaly detection system built in Java for detecting fraud and anomalies in streaming transaction data using Apache Kafka and statistical machine learning.

Java Kafka License Build

๐ŸŽฅ Demo

SlipStreamDemo

โœจ Features

  • ๐Ÿš€ Real-time Processing: Built on Apache Kafka Streams for low-latency stream processing
  • ๐Ÿค– Statistical ML: Uses advanced statistical algorithms for anomaly detection
  • ๐Ÿ“ˆ Scalable Architecture: Horizontally scalable with Kafka's distributed processing
  • ๐Ÿง  Adaptive Learning: Adapts to new patterns with continuous model updates
  • ๐ŸŽฏ Multiple Anomaly Types: Detects various types of anomalies (fraud, unusual amounts, time patterns, etc.)
  • โšก High Throughput: Optimized for processing thousands of transactions per second
  • ๐Ÿ“Š Built-in Monitoring: Real-time metrics and comprehensive logging for observability
  • ๐Ÿ”ง Easy Configuration: Environment variables and properties-based configuration
  • ๐Ÿ›ก๏ธ Production Ready: Comprehensive error handling and graceful shutdown
  • ๐ŸŽจ Visual Demo: Beautiful colored output for presentations and demos

๐Ÿ—๏ธ Architecture

๐Ÿ“ก [Transaction Stream] โ†’ ๐Ÿ“ฅ [Kafka Input Topic] โ†’ ๐Ÿ”„ [SlipStream Processor] โ†’ ๐Ÿ“ค [Kafka Output Topics]
                                                            โ†“
                                                   ๐Ÿง  [Statistical ML Engine]
                                                            โ†“
                                                   ๐ŸŽฏ [Anomaly Classification]
                                                            โ†“
                                            โœ… [Normal Results]    ๐Ÿšจ [Anomaly Alerts]

๐Ÿ”„ Processing Flow

  1. ๐Ÿ“จ Data Ingestion: Transactions stream into Kafka topics in real-time
  2. ๐Ÿ” Feature Extraction: Extract relevant features (amount, location, time, velocity)
  3. ๐Ÿงฎ Statistical Analysis: Apply Z-score analysis and composite scoring
  4. ๐ŸŽฏ Anomaly Detection: Identify outliers using configurable thresholds
  5. ๐Ÿ“Š Result Classification: Route normal vs anomalous transactions
  6. ๐Ÿšจ Alert Generation: Send high-confidence anomalies to alert topics

๐Ÿ› ๏ธ Technology Stack

  • โ˜• Java 17: Modern Java with performance optimizations and latest features
  • ๐ŸŒŠ Apache Kafka Streams: Stream processing framework for real-time data
  • ๐Ÿ“Š Apache Commons Math: Statistical functions for anomaly detection
  • ๐Ÿ”„ Jackson: High-performance JSON serialization/deserialization
  • ๐Ÿ“ SLF4J + Logback: Comprehensive logging framework with file rotation
  • โœ… JUnit 5: Modern testing framework with comprehensive test coverage
  • ๐Ÿ”ง Maven: Build automation and dependency management
  • ๐Ÿณ Docker Compose: Containerized infrastructure setup
  • ๐ŸŽจ ANSI Colors: Beautiful terminal output for demos

๐ŸŽฌ Visual Demo

Perfect for presentations and recording!

Experience SlipStream's real-time anomaly detection with our interactive visual demo:

# Quick visual demo (no Kafka required)
./visual-demo.sh

# Full interactive demo (with Kafka)
./demo.sh

This launches a complete demonstration with:

  • ๐ŸŒˆ Colorful real-time transaction streams
  • ๐Ÿšจ Live anomaly alerts with visual highlighting
  • ๐Ÿ“Š Anomaly scoring and confidence levels
  • ๐ŸŽฏ Multiple anomaly types (high amount, velocity, location, time)

Demo Resources:

๐Ÿš€ Quick Start

๐Ÿ“‹ Prerequisites

  • โ˜• Java 17 or higher
  • ๐ŸŒŠ Apache Kafka 3.6+ running on localhost:9092
  • ๐Ÿ”ง Maven 3.6+
  • ๐Ÿณ Docker & Docker Compose (for easy Kafka setup)

1. ๐Ÿ”จ Build the Project

mvn clean compile

2. โœ… Run Tests

mvn test

3. ๐Ÿณ Start Kafka (Docker Compose - Recommended)

# Start all services (Kafka, Zookeeper, UI)
docker compose up -d

# Or use Podman
podman compose up -d

4. ๐Ÿ“ฅ Create Required Topics

# The init-kafka container automatically creates these topics:
# - transactions (input)
# - anomalies (all results) 
# - alerts (anomalies only)

# Verify topics were created
docker compose exec kafka kafka-topics --list --bootstrap-server localhost:9092

5. ๐ŸŽฏ Run SlipStream

mvn exec:java -Dexec.mainClass="com.slipstream.SlipStreamApplication"

Or build and run the JAR:

mvn package
java -jar target/slipstream-anomaly-detector-1.0.0-SNAPSHOT.jar

6. ๐ŸŽฌ Try the Demo

# Quick visual demo (no Kafka needed)
./visual-demo.sh

# Full interactive demo (with real Kafka)
./demo.sh

โš™๏ธ Configuration

๐ŸŒ Environment Variables

Variable Description Default Example
KAFKA_BOOTSTRAP_SERVERS ๐ŸŒŠ Kafka bootstrap servers localhost:9092 kafka1:9092,kafka2:9092
KAFKA_INPUT_TOPIC ๐Ÿ“ฅ Input topic for transactions transactions payment-events
KAFKA_OUTPUT_TOPIC ๐Ÿ“ค Output topic for all results anomalies fraud-results
KAFKA_ALERTS_TOPIC ๐Ÿšจ Alerts topic for anomalies alerts fraud-alerts
KAFKA_NUM_THREADS ๐Ÿ”„ Number of stream threads 1 4
KAFKA_STATE_DIR ๐Ÿ’พ Directory for state stores /tmp/kafka-streams /data/streams

๐Ÿ“ Application Properties

Configuration can also be set in src/main/resources/application.properties:

# ๐ŸŒŠ Kafka Configuration
kafka.bootstrap.servers=localhost:9092
kafka.input.topic=transactions
kafka.output.topic=anomalies
kafka.alerts.topic=alerts

# ๐Ÿง  Statistical Detector Settings
detector.statistical.zscore_threshold=2.5
detector.statistical.anomaly_threshold=0.7
detector.statistical.min_samples=20

# ๐Ÿ”„ Stream Processing
kafka.num.stream.threads=1
kafka.commit.interval.ms=10000

Data Format

Input Transaction Format

{
  "transaction_id": "tx_12345",
  "user_id": "user_67890",
  "merchant_id": "merchant_abc",
  "amount": 150.75,
  "currency": "USD",
  "timestamp": "2024-01-15T14:30:00",
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "country": "USA",
    "city": "New York"
  },
  "payment_method": "credit_card",
  "merchant_category": "grocery",
  "metadata": {
    "device_id": "mobile_123",
    "session_id": "sess_456"
  }
}

Output Anomaly Result Format

{
  "transaction_id": "tx_12345",
  "is_anomaly": true,
  "anomaly_score": 0.85,
  "confidence": 0.92,
  "anomaly_type": "unusual_amount",
  "detected_at": "2024-01-15T14:30:05",
  "original_transaction": { ... },
  "features_used": {
    "amount": 150.75,
    "hour_of_day": 14,
    "amount_ratio": 3.2
  },
  "reason": "Anomaly score: 0.850, Type: unusual_amount, Large transaction amount: $150.75"
}

๐Ÿ”ง Testing

๐Ÿ“จ Send Test Transactions

You can use the Kafka console producer to send test transactions:

# Start the console producer
docker compose exec kafka kafka-console-producer --topic transactions --bootstrap-server localhost:9092

# Or use our demo transaction generator
mvn exec:java -Dexec.mainClass='com.slipstream.demo.TransactionGenerator' -Dexec.args="30"

๐Ÿ“‹ Sample Transaction

{
  "transaction_id": "tx_001",
  "user_id": "user_123", 
  "merchant_id": "merchant_grocery",
  "amount": 50.0,
  "currency": "USD",
  "timestamp": "2024-01-15T14:30:00",
  "location": {
    "latitude": 40.7128,
    "longitude": -74.0060,
    "country": "USA", 
    "city": "New York"
  },
  "payment_method": "credit_card",
  "merchant_category": "grocery",
  "metadata": {}
}

๐Ÿ‘€ Monitor Results

# ๐Ÿ“Š Monitor all results
docker compose exec kafka kafka-console-consumer --topic anomalies --bootstrap-server localhost:9092 --from-beginning

# ๐Ÿšจ Monitor alerts only
docker compose exec kafka kafka-console-consumer --topic alerts --bootstrap-server localhost:9092 --from-beginning

# ๐ŸŽจ Use our beautiful anomaly monitor
mvn exec:java -Dexec.mainClass='com.slipstream.demo.AnomalyResultConsumer'

๐ŸŽฏ Anomaly Types

SlipStream detects several types of anomalies using statistical analysis:

  • ๐Ÿšจ FRAUD: General fraud patterns and suspicious behavior
  • ๐Ÿ’ฐ UNUSUAL_AMOUNT: Transactions with abnormally high amounts (>$5,000)
  • โšก VELOCITY: High frequency of transactions from same user (>3 in 5 minutes)
  • ๐ŸŒ LOCATION: Transactions from unusual or suspicious locations
  • ๐Ÿ• TIME_PATTERN: Transactions at unusual times (late night/early morning)
  • ๐Ÿช MERCHANT_PATTERN: Unusual merchant interaction patterns
  • ๐Ÿ“Š STATISTICAL_OUTLIER: General statistical anomalies using Z-score analysis

๐Ÿงฎ Detection Algorithms

  • Z-Score Analysis: Statistical outlier detection based on standard deviations
  • Composite Scoring: Combines multiple factors for accurate detection
  • Velocity Detection: Tracks transaction frequency per user
  • Location Analysis: Identifies geographically suspicious transactions
  • Time Pattern Recognition: Detects unusual timing patterns

โšก Performance Tuning

๐ŸŒŠ Kafka Streams Configuration

# ๐Ÿš€ Increase processing threads for higher throughput
KAFKA_NUM_THREADS=4

# โฑ๏ธ Adjust commit interval for latency vs. throughput trade-off
kafka.commit.interval.ms=10000

# ๐Ÿ—„๏ธ State store caching for better performance
kafka.cache.max.bytes.buffering=10485760

# ๐Ÿ”„ Processing guarantee
kafka.processing.guarantee=at_least_once

# ๐Ÿ“ฆ Batch size optimization
kafka.batch.size=16384

โ˜• JVM Tuning

# ๐Ÿš€ Production JVM settings
java -Xmx4g -Xms2g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:+UseStringDeduplication \
     -jar slipstream.jar

# ๐Ÿ› ๏ธ Development settings
java -Xmx1g -Xms512m \
     -XX:+UseG1GC \
     -jar slipstream.jar

๐Ÿ“Š Performance Metrics

Expected performance on modern hardware:

  • Throughput: 10,000+ transactions/second
  • Latency: <50ms for anomaly detection
  • Memory: 1-4GB depending on state store size
  • CPU: 2-8 cores for optimal performance

๐Ÿ“Š Monitoring and Metrics

SlipStream provides comprehensive observability:

๐Ÿ“ˆ Built-in Metrics (logged every 30 seconds)

  • ๐Ÿ”„ Stream Processing State: Current topology status
  • ๐Ÿ“š Training Data Size: Number of samples in the statistical model
  • ๐Ÿง  Model Status: Training completion and health
  • ๐Ÿ“Š Transaction Counters: Total processed, normal, anomalous
  • ๐Ÿ’พ JVM Memory Usage: Heap and non-heap memory statistics
  • โฑ๏ธ Processing Latency: Average and 99th percentile latencies

๐Ÿ“ Logging Levels

# ๐Ÿ› Debug mode for development
logging.level.com.slipstream=DEBUG

# ๐Ÿ“Š Info mode for production (default)
logging.level.com.slipstream=INFO

# ๐Ÿ”‡ Reduce Kafka noise
logging.level.org.apache.kafka=WARN

๐ŸŽฏ Health Checks

The application provides several health indicators:

  • โœ… Kafka Connectivity: Connection to Kafka brokers
  • ๐Ÿ”„ Stream State: Kafka Streams topology health
  • ๐Ÿง  Model Health: Statistical model training status
  • ๐Ÿ’พ Memory Usage: JVM memory consumption
  • ๐Ÿ“Š Processing Rate: Transactions per second

๐Ÿ‘จโ€๐Ÿ’ป Development

๐Ÿ“ Project Structure

src/
โ”œโ”€โ”€ ๐Ÿ“ฑ main/java/com/slipstream/
โ”‚   โ”œโ”€โ”€ ๐Ÿš€ SlipStreamApplication.java          # Main application entry point
โ”‚   โ”œโ”€โ”€ โš™๏ธ config/
โ”‚   โ”‚   โ””โ”€โ”€ KafkaConfig.java                   # Kafka configuration management
โ”‚   โ”œโ”€โ”€ ๐Ÿง  detector/
โ”‚   โ”‚   โ”œโ”€โ”€ AnomalyDetector.java               # Detector interface
โ”‚   โ”‚   โ””โ”€โ”€ StatisticalAnomalyDetector.java    # Statistical ML detector
โ”‚   โ”œโ”€โ”€ ๐Ÿ“Š model/
โ”‚   โ”‚   โ”œโ”€โ”€ TransactionEvent.java              # Input transaction data model
โ”‚   โ”‚   โ””โ”€โ”€ AnomalyResult.java                 # Output anomaly result model
โ”‚   โ”œโ”€โ”€ ๐ŸŽจ demo/
โ”‚   โ”‚   โ”œโ”€โ”€ TransactionGenerator.java          # Demo transaction generator
โ”‚   โ”‚   โ””โ”€โ”€ AnomalyResultConsumer.java         # Visual anomaly monitor
โ”‚   โ””โ”€โ”€ ๐ŸŒŠ stream/
โ”‚       โ””โ”€โ”€ AnomalyDetectionStreams.java       # Kafka Streams topology
โ”œโ”€โ”€ ๐Ÿ“ main/resources/
โ”‚   โ”œโ”€โ”€ application.properties                 # Application configuration
โ”‚   โ””โ”€โ”€ logback.xml                           # Logging configuration
โ””โ”€โ”€ โœ… test/java/                              # Comprehensive unit tests
    โ”œโ”€โ”€ StatisticalAnomalyDetectorTest.java
    โ””โ”€โ”€ TransactionEventTest.java

๐Ÿ› ๏ธ Building and Testing

# ๐Ÿ”ง Compile only
mvn compile

# โœ… Run all tests
mvn test

# ๐Ÿ“Š Run tests with coverage
mvn test jacoco:report

# ๐Ÿ“ฆ Create executable JAR
mvn package

# ๐Ÿš€ Run integration tests (requires running Kafka)
mvn verify

# ๐Ÿงน Clean build artifacts
mvn clean

๐ŸŽฏ IDE Setup

IntelliJ IDEA:

# Import as Maven project
# Enable annotation processing
# Set Java 17 as project SDK

VS Code:

# Install Java Extension Pack
# Install Maven for Java extension
# Configure Java 17 runtime

๐Ÿค Contributing

We welcome contributions! Here's how you can help make SlipStream better:

๐Ÿš€ Quick Start for Contributors

  1. ๐Ÿด Fork the repository
  2. ๐ŸŒฟ Create a feature branch: git checkout -b feature/amazing-feature
  3. ๐Ÿ’พ Commit your changes: git commit -m 'Add amazing feature'
  4. ๐Ÿ“ค Push to branch: git push origin feature/amazing-feature
  5. ๐Ÿ”ƒ Open a Pull Request

๐Ÿ“‹ Development Guidelines

  • โœ… Follow Java coding standards and conventions
  • ๐Ÿ“ Add comprehensive unit tests for new features
  • ๐Ÿ“– Update documentation for any API changes
  • ๐ŸŽฏ Ensure all tests pass before submitting
  • ๐Ÿšจ Include integration tests for stream processing changes
  • ๐Ÿ” Use meaningful commit messages

๐Ÿ› Bug Reports

Found a bug? Please open an issue with:

  • ๐Ÿ“ Clear description of the problem
  • ๐Ÿ”„ Steps to reproduce the issue
  • ๐Ÿ’ป Your environment details (Java version, OS, etc.)
  • ๐Ÿ“Š Expected vs actual behavior
  • ๐Ÿ“‹ Any error logs or stack traces

๐Ÿ’ก Feature Requests

Have an idea? We'd love to hear it! Include:

  • ๐ŸŽฏ Clear description of the proposed feature
  • ๐Ÿค” Explanation of why it would be useful
  • ๐Ÿ“ˆ Examples of how it would be used
  • ๐Ÿ› ๏ธ Any implementation suggestions

๐Ÿงช Testing Guidelines

# Run all tests before submitting
mvn clean test

# Check code coverage
mvn jacoco:report

# Run integration tests
mvn verify

# Performance tests
mvn test -Dtest=*PerformanceTest

๐Ÿ“œ Code Review Process

  1. All submissions require code review
  2. Maintainers will review and provide feedback
  3. Address any requested changes
  4. Once approved, your PR will be merged!

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™‹โ€โ™€๏ธ Support & Questions

  • ๐Ÿ“– Documentation: Check our comprehensive docs above
  • ๐Ÿ› Issues: GitHub Issues
  • ๐Ÿ’ฌ Discussions: GitHub Discussions
  • ๐Ÿ“ง Email: For security issues, please email directly
  • ๐Ÿ“บ Demo Videos: See our DEMO.md for recording tips

๐ŸŒŸ Acknowledgments

  • โ˜• Apache Kafka - For the incredible streaming platform
  • ๐ŸŒŠ Kafka Streams - For making stream processing accessible
  • ๐Ÿ“Š Apache Commons Math - For robust statistical functions
  • ๐Ÿณ Docker - For containerization magic
  • ๐ŸŽฏ Maven - For dependency management
  • ๐Ÿงช JUnit - For comprehensive testing framework

๐Ÿš€ Ready to detect anomalies in real-time? Let's get started! ๐Ÿš€

Made with โค๏ธ by the SlipStream team

โญ Don't forget to star this repo if you found it helpful! โญ

About

โšก Real-time fraud & anomaly detection system for streaming transactions. Built with Kafka Streams + Isolation Forest ML. Low-latency processing, online learning, and scalable architecture for detecting fraud patterns in transaction data. ๐Ÿšจ๐Ÿ”

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published