Skip to content

Real-time analytics of Internet traffic flow data, sourced from MAWI traffic archives, with Kafka, Clickhouse and Grafana, orchestrated with Docker Swarm

Notifications You must be signed in to change notification settings

20kaushik02/real-time-traffic-analysis-clickhouse

Repository files navigation

Real-time analytics of Internet traffic flow data

Architecture diagram

Download the dataset

  • The full preprocessed dataset is hosted here - 1.4GB
  • Place this file in the preprocessing directory
  • For testing purposes, you can use the sample CSV that has 10k records from each day instead, change the bind path in the Compose file

To run the project

  • From the scripts directory:
    • Run deploy.ps1 -M for Windows
    • Run deploy.sh -M for Linux/macOS (add -S if sudo needed for docker)
    • See the README in scripts for more
  • This sets up the whole stack

Access the UI

  • The Grafana web interface is located at http://localhost:7602
  • Login:
    • Username: thewebfarm
    • Password: mrafbeweht
  • Go to Dashboards > Internet traffic capture analysis

To run the shard creation and scaling script

  • From the scripts directory:
    • Install dependencies: python3 -r ../clickhouse/update_config_scripts/requirements.txt
    • Run python3 ../clickhouse/update_config_scripts/update_trigger.py
  • This checks every 2 minutes and creates a new shard and two server nodes for it based on resource utilization

Limitations

  • For multi-node deployments using Docker Swarm, the manager node needs to be running on Linux (outside Docker Desktop i.e. standalone Docker installation) due to limitations in the Docker Swarm engine

About

Real-time analytics of Internet traffic flow data, sourced from MAWI traffic archives, with Kafka, Clickhouse and Grafana, orchestrated with Docker Swarm

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •