A comprehensive cybersecurity solution that integrates log-based anomaly detection with real-time Kafka event streaming, all visualized through an interactive dashboard.
It enables scalable, real-time threat monitoring and response by unifying log anomaly detection, severity analysis, type classification, and centralized security management.
- Hybrid Attention LSTM Autoencoder with ensemble learning for superior anomaly detection
- Multi-modal Analysis: Sequential pattern detection + individual log content analysis
- Dynamic Severity Assessment with adaptive threshold learning
- Real-time Processing with sub-100ms inference latency
- Multi-format Compatibility: Linux, Apache, Windows, Syslog, and custom log formats
- Intelligent Parsing: Brain Log Parser with automated structure recognition
- Feature Engineering: TF-IDF vectorization with semantic analysis
- Temporal Analysis: Sliding window approach for attack sequence detection (see the sketch after this list)
- Apache Kafka Integration for real-time log streaming and event processing
- Filebeat Collection for automated log harvesting from multiple sources
- Interactive Dashboard with real-time visualization and monitoring
- RESTful API for seamless integration with existing security infrastructure
- Scalable Design: Enterprise-level deployment capabilities
- Flexible Configuration: YAML-based configuration management
- Docker Support: Containerized deployment for easy scaling
- Monitoring & Alerting: Built-in performance metrics and health monitoring
Component | Technology | Purpose |
---|---|---|
AI Detection Engine | PyTorch, LSTM Autoencoders | Advanced anomaly detection with ensemble learning |
Log Streaming | Apache Kafka, Filebeat | Real-time log collection and event streaming |
Dashboard | Next.js, React, Express.js | Interactive visualization and monitoring interface |
Log Parser | Python (BRAIN) | Intelligent log structure recognition and parsing |
API Gateway | Express.js, REST | Communication bridge between components |
- Python 3.8+ with pip
- Node.js 14+ with npm
- Docker & Docker Compose (recommended for Kafka)
- CUDA-compatible GPU (optional, for faster training)
# Clone the repository
git clone https://github.com/AK11105/AI-driven-SIEM-System.git
cd AI-driven-SIEM-System
# Install Python dependencies for the AI model
cd model
pip install -r requirements.txt
# Install dashboard dependencies
cd ../dashboard/frontend
npm install
cd ../backend
npm install
# Start Kafka infrastructure (for real-time streaming)
cd ../../logs-collection
docker-compose up -d
# Run anomaly detection on sample Linux logs
cd model
python src/logAnomalyDetection/run.py --input data/logs/raw/Linux_test.log
# Start the dashboard (in separate terminals)
cd dashboard/backend
npm start
cd dashboard/frontend
npm run dev
Open your browser and navigate to:
- Frontend Dashboard: http://localhost:3000
- Backend API: http://localhost:5000
- Kafka UI: http://localhost:8080 (if using kafka-ui)
# Full pipeline (parsing + detection)
python src/logAnomalyDetection/run.py --input Linux_test.log
# Parse logs only
python src/logAnomalyDetection/run.py --mode parse --input Linux_test.log
# Detection only (use existing parsed data)
python src/logAnomalyDetection/run.py --mode detect --use-existing-csv data/logs/processed/Linux.log_structured.csv
# Sequential anomaly detection (temporal patterns)
python src/logAnomalyDetection/run.py --processing-mode sequential --input Linux_test.log
# Individual log analysis (content-based)
python src/logAnomalyDetection/run.py --processing-mode single --input Linux_test.log
# Hybrid analysis (recommended - combines both approaches)
python src/logAnomalyDetection/run.py --processing-mode both --input Linux_test.log
# Apache access logs
python src/logAnomalyDetection/run.py --dataset Apache --input apache_access.log
# Windows event logs
python src/logAnomalyDetection/run.py --dataset Windows --input windows_event.log
# Custom log formats with custom paths
python src/logAnomalyDetection/run.py \
--dataset Custom \
--input-dir /var/log/application/ \
--output-dir /data/processed/ \
--input app.log
# Export results to dashboard (default: localhost:5000)
python src/logAnomalyDetection/run.py --input Linux_test.log --export-to-express
# Custom dashboard URL
python src/logAnomalyDetection/run.py \
--input Linux_test.log \
--export-to-express \
--express-url http://your-dashboard:3000
# Test dashboard connectivity
python src/logAnomalyDetection/run.py --test-express-connection
# Production configuration with custom settings
python src/logAnomalyDetection/run.py \
--config production.yml \
--input-dir /var/log/security/ \
--reports-dir /opt/siem/reports/ \
--export-to-express \
--express-url http://siem-dashboard:5000 \
--verbose
# Start Kafka infrastructure
cd logs-collection
docker-compose up -d
# Configure Filebeat for log collection
sudo vim /etc/filebeat/filebeat.yml
sudo systemctl start filebeat
# Test log streaming
python3 log_consumer.py
# Access Kafka container
docker exec -it <kafka_container_name> bash
# Check incoming logs
kafka-console-consumer --bootstrap-server localhost:9092 --topic system-logs --from-beginning
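To push a hand-crafted test event into the `system-logs` topic without waiting for Filebeat, a small producer sketch can help; it assumes the `kafka-python` package and the sample message content is made up.

from kafka import KafkaProducer
import json

# Serialize test events as JSON, matching what log_consumer.py expects
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("system-logs", {"message": "Oct 10 12:00:00 host sshd[42]: Failed password for root"})
producer.flush()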
# Neural Network Architecture
model:
hidden_dims: [16, 24, 32]
sequence_length: 8
attention_heads: 4
dropout_rate: 0.3
learning_rate: 0.001
# Training Parameters
training:
batch_size: 32
epochs: 50
validation_split: 0.15
early_stopping_patience: 5
# Severity Thresholds
severity:
low_percentile: 85
medium_percentile: 95
high_percentile: 99
critical_percentile: 99.9
# Processing Options
processing:
enable_gpu: true
parallel_workers: 4
cache_preprocessed: true
# Output Configuration
output:
save_reports: true
export_format: ["json", "csv"]
include_visualizations: true
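As a rough illustration, the severity percentiles configured above can be mapped onto a validation-set error distribution; the function names and `cfg` shape below are assumptions for exposition, not the project's actual API.

import numpy as np

def severity_thresholds(errors, cfg):
    # Map each configured percentile onto the reconstruction-error distribution
    return {
        level: np.percentile(errors, cfg[f"{level}_percentile"])
        for level in ("low", "medium", "high", "critical")
    }

def assign_severity(error, thresholds):
    # Check from most to least severe so the highest matching band wins
    for level in ("critical", "high", "medium", "low"):
        if error >= thresholds[level]:
            return level
    return "normal"

# Example with the percentiles from the config above:
# thresholds = severity_thresholds(validation_errors,
#     {"low_percentile": 85, "medium_percentile": 95,
#      "high_percentile": 99, "critical_percentile": 99.9})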
version: '3.8'
services:
zookeeper:
image: confluentinc/cp-zookeeper:7.0.1
environment:
ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.0.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: host clients use localhost:9092, in-network
      # services (e.g. kafka-ui) use kafka:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
kafka-ui:
image: provectuslabs/kafka-ui:latest
ports:
- "8080:8080"
environment:
KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
The core AI engine uses a hybrid neural architecture that combines several components (a minimal sketch follows this list):
- Bidirectional LSTM Encoder: Processes log sequences in both directions for comprehensive temporal understanding
- Multi-Head Self-Attention: 4 attention heads focus on different aspects of log relationships
- LSTM Decoder: Reconstructs sequences with batch normalization for stable training
- Multi-Layer Perceptron: Analyzes individual log content without temporal context
- Progressive Dimensionality Reduction: Efficiently compresses log features
- Content-Based Feature Extraction: Semantic analysis of log messages
- Learned Combination: Automatically balances sequential and individual analysis
- Consistency Loss: Ensures both pathways produce compatible representations
- Ensemble Strategy: Multiple model configurations for robust detection
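Below is a minimal PyTorch sketch of how these pieces can fit together; the layer sizes, names, and wiring are illustrative assumptions rather than the exact production model.

import torch
import torch.nn as nn

class HybridAttentionAutoencoder(nn.Module):
    """Illustrative hybrid model: sequential + individual pathways."""

    def __init__(self, n_features=256, hidden=32, heads=4, dropout=0.3):
        super().__init__()
        # Sequential pathway: BiLSTM encoder -> self-attention -> LSTM decoder
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True,
                               bidirectional=True)
        self.attention = nn.MultiheadAttention(2 * hidden, num_heads=heads,
                                               dropout=dropout, batch_first=True)
        self.decoder = nn.LSTM(2 * hidden, n_features, batch_first=True)
        # Individual pathway: MLP with progressive dimensionality reduction
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        encoded, _ = self.encoder(x)           # (batch, seq_len, 2 * hidden)
        attended, _ = self.attention(encoded, encoded, encoded)
        seq_recon, _ = self.decoder(attended)  # sequence reconstruction
        log_recon = self.mlp(x)                # per-log reconstruction
        return seq_recon, log_recon

# Reconstruction error from both pathways drives the anomaly score; a learned
# combination (plus a consistency loss) balances the two, as described above.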
Anomaly Type | Examples | Severity Levels |
---|---|---|
Memory Errors | Out of memory, memory leaks | Low → Critical |
Authentication | Failed logins, privilege escalation | Medium → Critical |
Filesystem | Disk full, permission errors | Low → High |
Network | Connection timeouts, DNS failures | Low → High |
System Critical | Kernel panics, service crashes | High → Critical |
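As a rough illustration of how log messages might be bucketed into these categories, a keyword-based sketch follows; the patterns are assumptions for exposition, not the system's actual classifier.

# Illustrative keyword buckets only; the real type classification is part of
# the detection pipeline and is not reproduced here.
ANOMALY_KEYWORDS = {
    "memory": ["out of memory", "oom-killer", "memory leak"],
    "authentication": ["authentication failure", "failed password"],
    "filesystem": ["no space left on device", "permission denied"],
    "network": ["connection timed out", "name resolution", "unreachable"],
    "system_critical": ["kernel panic", "segfault", "service crashed"],
}

def classify_anomaly(message: str) -> str:
    text = message.lower()
    for anomaly_type, keywords in ANOMALY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return anomaly_type
    return "unknown"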
- Precision: 94.2% (validated on Linux system logs)
- Recall: 91.8% (comprehensive anomaly coverage)
- F1-Score: 93.0% (balanced performance)
- False Positive Rate: <5% (production-ready accuracy)
- Training Time: ~30 minutes (GPU), ~2 hours (CPU)
- Inference Latency: <100ms per log batch
- Memory Usage: 4GB RAM (typical dataset)
- Throughput: 1000+ logs/second (real-time processing)
- Horizontal Scaling: Kafka partitioning for distributed processing
- Vertical Scaling: GPU acceleration for intensive workloads
- Storage: Configurable retention policies for log management
- Monitoring: Built-in performance metrics and health checks
# Build all services
docker-compose up --build
# Scale specific services
docker-compose up --scale ai-model=2 --scale kafka=3
# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# AI Model Container
FROM python:3.9-slim
WORKDIR /app
COPY model/ .
RUN pip install -r requirements.txt
CMD ["python", "src/logAnomalyDetection/run.py", "--config", "production.yml"]
# Dashboard Container
FROM node:16-alpine
WORKDIR /app
COPY dashboard/ .
RUN npm install && npm run build
CMD ["npm", "start"]
from model import HybridEnsembleDetector
# Initialize detector
detector = HybridEnsembleDetector(config="config.yml")
# Train on historical data
detector.train('data/logs/processed/Linux.csv')
# Real-time detection
results = detector.predict('new_logs.csv', mode='hybrid')
# Save deployment package
detector.save_deployment_package('production_model.pkl')
Endpoint | Method | Description | Example |
---|---|---|---|
`/api/detect` | POST | Submit logs for analysis | `{"logs": ["log1", "log2"]}` |
`/api/status` | GET | System health check | `{"status": "healthy"}` |
`/api/metrics` | GET | Performance metrics | `{"accuracy": 0.942}` |
`/api/config` | GET/PUT | Configuration management | Configuration JSON |
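For example, the endpoints above can be exercised from Python with the `requests` library (a sketch, assuming the default backend at localhost:5000):

import requests

BASE = "http://localhost:5000"

# Submit a batch of raw logs for analysis
resp = requests.post(f"{BASE}/api/detect", json={"logs": ["log1", "log2"]})
resp.raise_for_status()
print(resp.json())

# Health check and performance metrics
print(requests.get(f"{BASE}/api/status").json())
print(requests.get(f"{BASE}/api/metrics").json())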
// Real-time anomaly updates
const ws = new WebSocket('ws://localhost:5000/ws');
ws.onmessage = (event) => {
const anomaly = JSON.parse(event.data);
updateDashboard(anomaly);
};
// Fetch historical data
const response = await fetch('/api/anomalies?timerange=24h');
const data = await response.json();
# Verify PyTorch installation
python -c "import torch; print(torch.__version__)"
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# Verify model file permissions
ls -la model.pkl
# Test API connectivity
curl -X GET http://localhost:5000/health
# Check service status
docker-compose ps
# View service logs
docker-compose logs dashboard
# Enable GPU acceleration
export CUDA_VISIBLE_DEVICES=0
# Increase batch size for better throughput
python src/logAnomalyDetection/run.py --config high-performance.yml
# Monitor system resources
nvidia-smi -l 1 # GPU usage
htop # CPU/Memory usage
# Check Kafka cluster status
docker exec kafka kafka-topics --bootstrap-server localhost:9092 --list
# Monitor consumer lag
docker exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group log-consumers
# Reset consumer group (if needed)
docker exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --group log-consumers --reset-offsets --to-earliest --all-topics --execute
# Enable verbose logging
python src/logAnomalyDetection/run.py --input logs.txt --verbose
# Check system logs
tail -f /var/log/ai-siem-system.log
# Monitor real-time performance
watch -n 1 'docker stats --no-stream'
# Train with custom hyperparameters
detector = HybridEnsembleDetector({
'hidden_dims': [32, 48, 64],
'learning_rate': 0.0005,
'ensemble_size': 5
})
# Implement custom loss functions
import torch.nn.functional as F

def custom_anomaly_loss(reconstruction, original, severity_weights):
    # Per-element loss so each log can carry its own severity weight
    base_loss = F.mse_loss(reconstruction, original, reduction='none')
    weighted_loss = base_loss * severity_weights
    return weighted_loss.mean()
# Kafka consumer for real-time processing
from kafka import KafkaConsumer
import json
consumer = KafkaConsumer(
'system-logs',
bootstrap_servers=['localhost:9092'],
value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)
# `detector`, `threshold`, and `send_alert` are assumed to be defined
# elsewhere (e.g., the trained detector and an alerting hook)
for message in consumer:
    log_data = message.value
    anomaly_score = detector.predict_single(log_data)
    if anomaly_score > threshold:
        send_alert(log_data, anomaly_score)
# Fork and clone the repository
git clone https://github.com/your-username/AI-driven-SIEM-System.git
cd AI-driven-SIEM-System
# Create development environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/ --cov=src/
# Code formatting
black src/
flake8 src/
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PyTorch Team for the deep learning framework
- Apache Software Foundation for Kafka streaming platform
- BRAIN for log parsing inspiration
- Security Research Community for datasets and benchmarks
- GitHub Issues: Create an Issue
- Security Vulnerabilities: Please email directly to maintainers
- GitHub Discussions: Join the Conversation
- Documentation: Detailed Guides
- Atharva Kulkarni - @AK11105
- Darshan Atkari - @atkaridarshan04
⭐ If you find this project useful, please consider giving it a star! ⭐
Built with ❤️ for cybersecurity professionals and researchers