A comprehensive cybersecurity solution that integrates log-based anomaly detection with real-time Kafka event streaming, all visualized through an interactive dashboard.
It enables scalable, real-time threat monitoring and response by unifying log anomaly detection, severity analysis, type classification, and centralized security management.
- Hybrid Attention LSTM Autoencoder with ensemble learning for superior anomaly detection
- Multi-modal Analysis: Sequential pattern detection + individual log content analysis
- Dynamic Severity Assessment with adaptive threshold learning
- Real-time Processing with sub-100ms inference latency
- Multi-format Compatibility: Linux, Apache, Windows, Syslog, and custom log formats
- Intelligent Parsing: Brain Log Parser with automated structure recognition
- Feature Engineering: TF-IDF vectorization with semantic analysis
- Temporal Analysis: Sliding window approach for attack sequence detection (see the sketch after this list)
- Apache Kafka Integration for real-time log streaming and event processing
- Filebeat Collection for automated log harvesting from multiple sources
- Interactive Dashboard with real-time visualization and monitoring
- RESTful API for seamless integration with existing security infrastructure
- Scalable Design: Enterprise-level deployment capabilities
- Flexible Configuration: YAML-based configuration management
- Docker Support: Containerized deployment for easy scaling
- Monitoring & Alerting: Built-in performance metrics and health monitoring
Component | Technology | Purpose |
---|---|---|
AI Detection Engine | PyTorch, LSTM Autoencoders | Advanced anomaly detection with ensemble learning |
Log Streaming | Apache Kafka, Filebeat | Real-time log collection and event streaming |
Dashboard | Next.js, React, Express.js | Interactive visualization and monitoring interface |
Log Parser | Python (BRAIN) | Intelligent log structure recognition and parsing |
API Gateway | Express.js, REST | Communication bridge between components |
- Python 3.8+ with pip
- Node.js 14+ with npm
- Docker & Docker Compose (recommended for Kafka)
- CUDA-compatible GPU (optional, for faster training)
# Clone the repository
git clone https://github.com/AK11105/AI-driven-SIEM-System.git
cd AI-driven-SIEM-System
# Install Python dependencies for the AI model
cd model
pip install -r requirements.txt
# Install dashboard dependencies
cd ../dashboard/frontend
npm install
cd ../backend
npm install
# Start Kafka infrastructure (for real-time streaming)
cd ../../logs-collection
docker-compose up -d
# Run anomaly detection on sample Linux logs
cd model
python src/logAnomalyDetection/run.py --input data/logs/raw/Linux_test.log
# Start the dashboard (in separate terminals)
cd dashboard/backend
npm start
cd dashboard/frontend
npm run dev
Open your browser and navigate to:
- Frontend Dashboard: http://localhost:3000
- Backend API: http://localhost:5000
- Kafka UI: http://localhost:8080 (if using kafka-ui)
# Full pipeline (parsing + detection)
python src/logAnomalyDetection/run.py --input Linux_test.log
# Parse logs only
python src/logAnomalyDetection/run.py --mode parse --input Linux_test.log
# Detection only (use existing parsed data)
python src/logAnomalyDetection/run.py --mode detect --use-existing-csv data/logs/processed/Linux.log_structured.csv
# Sequential anomaly detection (temporal patterns)
python src/logAnomalyDetection/run.py --processing-mode sequential --input Linux_test.log
# Individual log analysis (content-based)
python src/logAnomalyDetection/run.py --processing-mode single --input Linux_test.log
# Hybrid analysis (recommended - combines both approaches)
python src/logAnomalyDetection/run.py --processing-mode both --input Linux_test.log
# Apache access logs
python src/logAnomalyDetection/run.py --dataset Apache --input apache_access.log
# Windows event logs
python src/logAnomalyDetection/run.py --dataset Windows --input windows_event.log
# Custom log formats with custom paths
python src/logAnomalyDetection/run.py \
--dataset Custom \
--input-dir /var/log/application/ \
--output-dir /data/processed/ \
--input app.log
# Export results to dashboard (default: localhost:5000)
python src/logAnomalyDetection/run.py --input Linux_test.log --export-to-express
# Custom dashboard URL
python src/logAnomalyDetection/run.py \
--input Linux_test.log \
--export-to-express \
--express-url http://your-dashboard:3000
# Test dashboard connectivity
python src/logAnomalyDetection/run.py --test-express-connection
# Production configuration with custom settings
python src/logAnomalyDetection/run.py \
--config production.yml \
--input-dir /var/log/security/ \
--reports-dir /opt/siem/reports/ \
--export-to-express \
--express-url http://siem-dashboard:5000 \
--verbose
# Start Kafka infrastructure
cd logs-collection
docker-compose up -d
# Configure Filebeat for log collection
sudo vim /etc/filebeat/filebeat.yml
sudo systemctl start filebeat
# Test log streaming
python3 log_consumer.py
# Access Kafka container
docker exec -it <kafka_container_name> bash
# Check incoming logs
kafka-console-consumer --bootstrap-server localhost:9092 --topic system-logs --from-beginning
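To push a hand-crafted test event into the `system-logs` topic without waiting for Filebeat, a small producer sketch can help; it assumes the `kafka-python` package and the sample message content is made up.

from kafka import KafkaProducer
import json

# Serialize test events as JSON, matching what log_consumer.py expects
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("system-logs", {"message": "Oct 10 12:00:00 host sshd[42]: Failed password for root"})
producer.flush()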
# Neural Network Architecture
model:
hidden_dims: [16, 24, 32]
sequence_length: 8
attention_heads: 4
dropout_rate: 0.3
learning_rate: 0.001
# Training Parameters
training:
batch_size: 32
epochs: 50
validation_split: 0.15
early_stopping_patience: 5
# Severity Thresholds
severity:
low_percentile: 85
medium_percentile: 95
high_percentile: 99
critical_percentile: 99.9
# Processing Options
processing:
enable_gpu: true
parallel_workers: 4
cache_preprocessed: true
# Output Configuration
output:
save_reports: true
export_format: ["json", "csv"]
include_visualizations: true
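As a rough illustration, the severity percentiles configured above can be mapped onto a validation-set error distribution; the function names and `cfg` shape below are assumptions for exposition, not the project's actual API.

import numpy as np

def severity_thresholds(errors, cfg):
    # Map each configured percentile onto the reconstruction-error distribution
    return {
        level: np.percentile(errors, cfg[f"{level}_percentile"])
        for level in ("low", "medium", "high", "critical")
    }

def assign_severity(error, thresholds):
    # Check from most to least severe so the highest matching band wins
    for level in ("critical", "high", "medium", "low"):
        if error >= thresholds[level]:
            return level
    return "normal"

# Example with the percentiles from the config above:
# thresholds = severity_thresholds(validation_errors,
#     {"low_percentile": 85, "medium_percentile": 95,
#      "high_percentile": 99, "critical_percentile": 99.9})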
version: '3.8'
services:
zookeeper:
image: confluentinc/cp-zookeeper:7.0.1
environment:
ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.0.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: host clients use localhost:9092, in-network
      # services (e.g. kafka-ui) use kafka:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
kafka-ui:
image: provectuslabs/kafka-ui:latest
ports:
- "8080:8080"
environment:
KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
The core AI engine uses a hybrid neural architecture that combines several components (a minimal sketch follows this list):
- Bidirectional LSTM Encoder: Processes log sequences in both directions for comprehensive temporal understanding
- Multi-Head Self-Attention: 4 attention heads focus on different aspects of log relationships
- LSTM Decoder: Reconstructs sequences with batch normalization for stable training
- Multi-Layer Perceptron: Analyzes individual log content without temporal context
- Progressive Dimensionality Reduction: Efficiently compresses log features
- Content-Based Feature Extraction: Semantic analysis of log messages
- Learned Combination: Automatically balances sequential and individual analysis
- Consistency Loss: Ensures both pathways produce compatible representations
- Ensemble Strategy: Multiple model configurations for robust detection
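Below is a minimal PyTorch sketch of how these pieces can fit together; the layer sizes, names, and wiring are illustrative assumptions rather than the exact production model.

import torch
import torch.nn as nn

class HybridAttentionAutoencoder(nn.Module):
    """Illustrative hybrid model: sequential + individual pathways."""

    def __init__(self, n_features=256, hidden=32, heads=4, dropout=0.3):
        super().__init__()
        # Sequential pathway: BiLSTM encoder -> self-attention -> LSTM decoder
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True,
                               bidirectional=True)
        self.attention = nn.MultiheadAttention(2 * hidden, num_heads=heads,
                                               dropout=dropout, batch_first=True)
        self.decoder = nn.LSTM(2 * hidden, n_features, batch_first=True)
        # Individual pathway: MLP with progressive dimensionality reduction
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, n_features),
        )

    def forward(self, x):                      # x: (batch, seq_len, n_features)
        encoded, _ = self.encoder(x)           # (batch, seq_len, 2 * hidden)
        attended, _ = self.attention(encoded, encoded, encoded)
        seq_recon, _ = self.decoder(attended)  # sequence reconstruction
        log_recon = self.mlp(x)                # per-log reconstruction
        return seq_recon, log_recon

# Reconstruction error from both pathways drives the anomaly score; a learned
# combination (plus a consistency loss) balances the two, as described above.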
Anomaly Type | Examples | Severity Levels |
---|---|---|
Memory Errors | Out of memory, memory leaks | Low → Critical |
Authentication | Failed logins, privilege escalation | Medium → Critical |
Filesystem | Disk full, permission errors | Low → High |
Network | Connection timeouts, DNS failures | Low → High |
System Critical | Kernel panics, service crashes | High → Critical |
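As a rough illustration of how log messages might be bucketed into these categories, a keyword-based sketch follows; the patterns are assumptions for exposition, not the system's actual classifier.

# Illustrative keyword buckets only; the real type classification is part of
# the detection pipeline and is not reproduced here.
ANOMALY_KEYWORDS = {
    "memory": ["out of memory", "oom-killer", "memory leak"],
    "authentication": ["authentication failure", "failed password"],
    "filesystem": ["no space left on device", "permission denied"],
    "network": ["connection timed out", "name resolution", "unreachable"],
    "system_critical": ["kernel panic", "segfault", "service crashed"],
}

def classify_anomaly(message: str) -> str:
    text = message.lower()
    for anomaly_type, keywords in ANOMALY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return anomaly_type
    return "unknown"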
- Precision: 94.2% (validated on Linux system logs)
- Recall: 91.8% (comprehensive anomaly coverage)
- F1-Score: 93.0% (balanced performance)
- False Positive Rate: <5% (production-ready accuracy)
- Training Time: ~30 minutes (GPU), ~2 hours (CPU)
- Inference Latency: <100ms per log batch
- Memory Usage: 4GB RAM (typical dataset)
- Throughput: 1000+ logs/second (real-time processing)
- Horizontal Scaling: Kafka partitioning for distributed processing
- Vertical Scaling: GPU acceleration for intensive workloads
- Storage: Configurable retention policies for log management
- Monitoring: Built-in performance metrics and health checks
# Build all services
docker-compose up --build
# Scale specific services
docker-compose up --scale ai-model=2 --scale kafka=3
# Production deployment
docker-compose -f docker-compose.prod.yml up -d
# AI Model Container
FROM python:3.9-slim
WORKDIR /app
COPY model/ .
RUN pip install -r requirements.txt
CMD ["python", "src/logAnomalyDetection/run.py", "--config", "production.yml"]
# Dashboard Container
FROM node:16-alpine
WORKDIR /app
COPY dashboard/ .
RUN npm install && npm run build
CMD ["npm", "start"]
from model import HybridEnsembleDetector
# Initialize detector
detector = HybridEnsembleDetector(config="config.yml")
# Train on historical data
detector.train('data/logs/processed/Linux.csv')
# Real-time detection
results = detector.predict('new_logs.csv', mode='hybrid')
# Save deployment package
detector.save_deployment_package('production_model.pkl')
Endpoint | Method | Description | Example |
---|---|---|---|
`/api/detect` | POST | Submit logs for analysis | `{"logs": ["log1", "log2"]}` |
`/api/status` | GET | System health check | `{"status": "healthy"}` |
`/api/metrics` | GET | Performance metrics | `{"accuracy": 0.942}` |
`/api/config` | GET/PUT | Configuration management | Configuration JSON |
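For example, the endpoints above can be exercised from Python with the `requests` library (a sketch, assuming the default backend at localhost:5000):

import requests

BASE = "http://localhost:5000"

# Submit a batch of raw logs for analysis
resp = requests.post(f"{BASE}/api/detect", json={"logs": ["log1", "log2"]})
resp.raise_for_status()
print(resp.json())

# Health check and performance metrics
print(requests.get(f"{BASE}/api/status").json())
print(requests.get(f"{BASE}/api/metrics").json())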
// Real-time anomaly updates
const ws = new WebSocket('ws://localhost:5000/ws');
ws.onmessage = (event) => {
const anomaly = JSON.parse(event.data);
updateDashboard(anomaly);
};
// Fetch historical data
const response = await fetch('/api/anomalies?timerange=24h');
const data = await response.json();
# Verify PyTorch installation
python -c "import torch; print(torch.__version__)"
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"
# Verify model file permissions
ls -la model.pkl
# Test API connectivity
curl -X GET http://localhost:5000/health
# Check service status
docker-compose ps
# View service logs
docker-compose logs dashboard
# Enable GPU acceleration
export CUDA_VISIBLE_DEVICES=0
# Increase batch size for better throughput
python src/logAnomalyDetection/run.py --config high-performance.yml
# Monitor system resources
nvidia-smi -l 1 # GPU usage
htop # CPU/Memory usage
# Check Kafka cluster status
docker exec kafka kafka-topics --bootstrap-server localhost:9092 --list
# Monitor consumer lag
docker exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group log-consumers
# Reset consumer group (if needed)
docker exec kafka kafka-consumer-groups --bootstrap-server localhost:9092 --group log-consumers --reset-offsets --to-earliest --all-topics --execute
# Enable verbose logging
python src/logAnomalyDetection/run.py --input logs.txt --verbose
# Check system logs
tail -f /var/log/ai-siem-system.log
# Monitor real-time performance
watch -n 1 'docker stats --no-stream'
# Train with custom hyperparameters
detector = HybridEnsembleDetector({
'hidden_dims': [32, 48, 64],
'learning_rate': 0.0005,
'ensemble_size': 5
})
# Implement custom loss functions
import torch.nn.functional as F

def custom_anomaly_loss(reconstruction, original, severity_weights):
    # Per-element loss so each log can carry its own severity weight
    base_loss = F.mse_loss(reconstruction, original, reduction='none')
    weighted_loss = base_loss * severity_weights
    return weighted_loss.mean()
# Kafka consumer for real-time processing
from kafka import KafkaConsumer
import json
consumer = KafkaConsumer(
'system-logs',
bootstrap_servers=['localhost:9092'],
value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)
# `detector`, `threshold`, and `send_alert` are assumed to be defined
# elsewhere (e.g., the trained detector and an alerting hook)
for message in consumer:
    log_data = message.value
    anomaly_score = detector.predict_single(log_data)
    if anomaly_score > threshold:
        send_alert(log_data, anomaly_score)
# Fork and clone the repository
git clone https://github.com/your-username/AI-driven-SIEM-System.git
cd AI-driven-SIEM-System
# Create development environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
pytest tests/ --cov=src/
# Code formatting
black src/
flake8 src/
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- PyTorch Team for the deep learning framework
- Apache Software Foundation for Kafka streaming platform
- BRAIN for log parsing inspiration
- Security Research Community for datasets and benchmarks
- GitHub Issues: Create an Issue
- Security Vulnerabilities: Please email directly to maintainers
- GitHub Discussions: Join the Conversation
- Documentation: Detailed Guides
- Atharva Kulkarni - @AK11105
- Darshan Atkari - @atkaridarshan04
⭐ If you find this project useful, please consider giving it a star! ⭐
Built with ❤️ for cybersecurity professionals and researchers