This project provides a comprehensive framework for analyzing RIPE Atlas traceroute data to detect network performance and routing anomalies. It can process large datasets from local files or a ClickHouse database, establish performance baselines, and compare current data against those baselines to identify significant deviations.
The main goal of this project is to provide a powerful and flexible tool for network operators, researchers, and enthusiasts to gain insights from RIPE Atlas traceroute data. By analyzing large-scale measurements, users can:
- Monitor Network Health: Track key performance indicators (KPIs) like Round-Trip Time (RTT), path length, and packet loss over time.
- Detect Anomalies: Automatically identify significant changes in network performance or routing behavior that could indicate outages, congestion, or rerouting events.
- Understand Routing Behavior: Analyze dominant paths and common network segments to understand how traffic flows across the internet.
- Establish Baselines: Create a statistical snapshot of "normal" network behavior to use as a benchmark for future comparisons.
The analysis pipeline follows these high-level steps:
- Data Ingestion: Reads traceroute data from local JSON Lines files (plain or compressed) or streams it directly from a ClickHouse database (a minimal reading sketch follows this list).
- Filtering: Narrows down the dataset based on a rich set of user-defined criteria (e.g., source/destination country, IP range, probe tags).
- Parsing & Cleaning: Parses the raw data in parallel, cleans it, and structures it for analysis.
- Analysis:
  - In `baseline` mode, it calculates a comprehensive statistical summary of the data to define "normal" behavior.
  - In `analyze` mode, it compares a new dataset against a previously generated baseline, using statistical tests and thresholding to find anomalies.
- Reporting: Generates a detailed JSON report containing all metadata, statistics, and a list of detected anomalies.
- Visualization: Creates a variety of plots (histograms, time-series, boxplots) to help visualize the data and the detected anomalies.
- Notification: Sends a summary of the run, along with the results file and plots, to a configured Matrix chat room.
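To make the ingestion step concrete, here is a minimal sketch of how JSON Lines traceroute files (plain or `.bz2`) can be streamed record by record with `orjson`, one of the project's dependencies. The helper name, file path, and field accesses are illustrative, not the project's actual reader:

```python
import bz2
import orjson

def iter_traceroutes(path):
    """Yield one traceroute result per line from a plain or bz2-compressed
    JSON Lines file (hypothetical helper, for illustration only)."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield orjson.loads(line)

# Example usage with a hypothetical file path:
for record in iter_traceroutes("data/archive/january-week-1/sample.json.bz2"):
    print(record.get("dst_addr"), record.get("proto"))
    break
```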
- Flexible Data Ingestion: Process traceroute data from JSON Lines files (`.json`, `.bz2`) or stream directly from a ClickHouse database.
- Two-Phase Analysis:
  - `baseline` mode: Establishes a statistical baseline of "normal" network behavior from a historical dataset.
  - `analyze` mode: Compares a current dataset against a previously generated baseline to detect anomalies.
- Advanced Anomaly Detection:
- Individual Indicators: Detects changes in performance metrics (RTT, path length, jitter), success/timeout rates, and network topology (path changes, core segment changes).
- Statistical Distribution Analysis: Uses Kolmogorov-Smirnov and Anderson-Darling tests to identify subtle shifts in the RTT distribution profile (see the sketch after this list).
- Composite Events: Correlates individual indicators to diagnose higher-level events like "Major Rerouting Event," "Path Instability," and "Performance Profile Shift."
- Powerful Filtering: Filter measurements by a wide range of source or destination criteria, including country, ASN, IP range, RIPE Atlas probe tags, and geographic location.
- Comprehensive Output: Generates detailed JSON reports, statistical plots (histograms, scatter plots, etc.), and a full log of the analysis run.
- Automated Notifications: Integrates with Matrix to send detailed success or failure notifications upon task completion, including attached results and plots.
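Both distribution tests are available in SciPy; the sketch below compares a synthetic baseline RTT sample against a shifted current sample. This illustrates the technique under the assumption that SciPy is used, and is not the project's actual code path:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
baseline_rtt = rng.normal(loc=30.0, scale=5.0, size=5000)  # "normal" RTTs (ms)
current_rtt = rng.normal(loc=34.0, scale=5.0, size=5000)   # shifted profile

# Two-sample Kolmogorov-Smirnov test: sensitive to any difference in the CDFs.
ks = stats.ks_2samp(baseline_rtt, current_rtt)
print(f"K-S: statistic={ks.statistic:.3f}, p-value={ks.pvalue:.2e}")

# k-sample Anderson-Darling test: puts more weight on the distribution tails.
ad = stats.anderson_ksamp([baseline_rtt, current_rtt])
print(f"A-D: statistic={ad.statistic:.3f}, significance={ad.significance_level:.3f}")

# A tiny p-value / significance level indicates the RTT profile has shifted.
```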
- Clone the repository:

  ```bash
  git clone https://github.com/Gozzim/RIPE-Atlas-Traceroute-Analysis.git
  cd RIPE-Atlas-Traceroute-Analysis
  ```

- Create a virtual environment (recommended):

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required dependencies. The `requirements.txt` file contains all necessary Python packages. Key dependencies include:
  - `pandas` and `pyarrow` for data manipulation.
  - `matplotlib` and `seaborn` for plotting.
  - `clickhouse-driver` for connecting to ClickHouse.
  - `tqdm` for progress bars.
  - `orjson` for fast JSON processing.
  - `coloredlogs` for enhanced console logging.

  ```bash
  pip install -r requirements.txt
  ```
To enable notifications, you need a dedicated bot account on a Matrix homeserver.
- Copy the configuration template:

  ```bash
  cp config.ini.example config.ini
  ```

- Edit `config.ini` and fill in your bot's details:

  ```ini
  [matrix_bot]
  homeserver = https://matrix-client.matrix.org
  user_id = @my-bot:matrix.org
  password = your_bot_password_here
  room_id = !yourRoomId:matrix.org
  ```
The scripts can be configured to connect to a ClickHouse database using environment variables:
```bash
export CLICKHOUSE_HOST='127.0.0.1'
export CLICKHOUSE_PORT='9000'
export CLICKHOUSE_DB='atlas'
export CLICKHOUSE_USER='default'
export CLICKHOUSE_PASSWORD='password123'
```

Alternatively, you can provide these details as command-line arguments.
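For illustration, this is roughly how those variables map onto a `clickhouse-driver` connection (the driver is listed in `requirements.txt`; the scripts' actual wiring may differ):

```python
import os
from clickhouse_driver import Client

# Hypothetical sketch: build a client from the environment variables above.
client = Client(
    host=os.environ.get("CLICKHOUSE_HOST", "127.0.0.1"),
    port=int(os.environ.get("CLICKHOUSE_PORT", "9000")),
    database=os.environ.get("CLICKHOUSE_DB", "default"),
    user=os.environ.get("CLICKHOUSE_USER", "default"),
    password=os.environ.get("CLICKHOUSE_PASSWORD", ""),
)
print(client.execute("SELECT version()"))
```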
The project contains three main executable scripts: `main.py`, `scripts/import.py`, and `scripts/create_schema.py`.
This is the primary script for running an analysis.
First, run the script in `baseline` mode on a large, representative dataset to establish a baseline for later comparison. A good baseline covers a period of "normal", stable network behavior.
Example (from files):
```bash
# Create a baseline for ICMP traffic to the US, calculating path statistics.
python main.py --mode baseline \
    --dest-country US \
    --protocol ICMP \
    --path-stats --analyze-core-paths \
    data/archive/january-week-1/*.json.bz2
```

This will create an output directory (by default in `out/`) containing:
- `*_results_baseline.json`: The detailed statistical baseline.
- `*_baseline_data.parquet`: A Parquet file with the raw data used, for future distribution comparison.
- `*.log`: A log file.
- Optional plots.
Next, run the script in `analyze` mode on a new dataset, pointing it at the baseline file you just created. This compares the new data against the established baseline and reports any detected anomalies.
Example (from ClickHouse):
```bash
# Analyze the last hour of data from ClickHouse against the baseline.
python main.py --mode analyze \
    --baseline-file out/example/example_results_baseline.json \
    --clickhouse \
    --ch-where-clause "timestamp >= now() - interval '1 hour'" \
    --dest-country US \
    --protocol ICMP \
    --path-stats --analyze-core-paths \
    --plot-metrics rtt observed_pathlen \
    --plot-aggregations daily hourly
```

This will generate a new output directory containing:
- `*_results_analyze.json`: A full report including any detected individual indicators and composite events.
- `*.log`: A log file.
- Optional plots comparing the current data to the baseline.
This script sets up the required database, tables, and materialized views in ClickHouse. It only needs to be run once.
Example:
```bash
python scripts/create_schema.py --host 127.0.0.1 --db atlas
```

Use this script to parse traceroute JSON files and import them into a ClickHouse database.
Prerequisite: You must first create the database schema using `create_schema.py`.
Example:
```bash
# Import all traceroutes from July 2025 destined for Germany or France.
python scripts/import.py \
    --host your-ch-host \
    --db atlas \
    --workers 8 \
    --optimize-final \
    --dest-country DE FR \
    data/archive/2025-07-*.json.bz2
```

This command will parse all files matching the pattern, filter them, and import the data into the `atlas` database using 8 parallel processes.
| Argument | Description |
|---|---|
| `INPUT_FILENAME` | Positional argument. Paths or patterns for input JSON Lines files. Required if not using `--clickhouse`. |
| `--clickhouse` | Load data from ClickHouse instead of files. |
| Argument | Description |
|---|---|
| `--ch-host` | ClickHouse server host. |
| `--ch-port` | ClickHouse server port. |
| `--ch-database` | ClickHouse database name. |
| `--ch-user` | ClickHouse username. |
| `--ch-password` | ClickHouse password. |
| `--ch-where-clause` | Optional custom SQL `WHERE` clause for ClickHouse queries. |
| Argument | Description |
|---|---|
| `-o`, `--output-dir` | Directory to save all output files. Default: auto-generated in the `out/` folder. |
| `--log-file` | Path to the log file. Default: `<output_dir>/<base>.log`. |
| Argument | Description |
|---|---|
| `-m`, `--mode` | Required. Operation mode: `baseline` to generate stats, `analyze` to compare against a baseline. |
| `-i`, `--baseline-file` | Path to the baseline JSON file. Required for `analyze` mode. |
| Argument | Description |
|---|---|
| `-n`, `--limit` | Process only the first N measurements/records. |
| `--probe-stats` | Enable calculation and reporting of per-probe statistics. |
| `--path-stats` | Enable calculation of path statistics (dominant path, unique count). |
| `-p`, `--protocol` | Filter measurements by protocols (ICMP, UDP, TCP). |
| `--include-private-ips` | Include measurements to/through private/special IP addresses. |
These arguments filter the data based on probe properties. Most require the RIPE Atlas API.
| Argument | Description |
|---|---|
| `--source-country`, `--dest-country` | Filter by ISO country codes. |
| `--source-id`, `--dest-id` | Filter by specific probe IDs. |
| `--source-ip-range`, `--dest-ip-range` | Filter by IP address ranges in CIDR notation (see the sketch below). |
| `--source-tags`, `--dest-tags` | Filter by probe tags. |
| `--source-type`, `--dest-type` | Filter by probe type (probe, anchor, software). |
| `--source-lat-range`, `--dest-lat-range` | Filter by latitude range (MIN MAX). |
| `--source-lon-range`, `--dest-lon-range` | Filter by longitude range (MIN MAX). |
| `--source-radius`, `--dest-radius` | Filter within a radius: `"LAT,LON:DIST_KM"` (see the sketch below). |
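To clarify the semantics of the CIDR and radius filters, here is a self-contained sketch of the underlying checks using only the standard library. It illustrates the filter formats, not the project's actual filter code:

```python
import ipaddress
import math

def in_ip_range(ip: str, cidr: str) -> bool:
    """--source-ip-range / --dest-ip-range style check."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)

def in_radius(lat: float, lon: float, spec: str) -> bool:
    """--source-radius / --dest-radius style check, spec = "LAT,LON:DIST_KM"."""
    center, dist_km = spec.split(":")
    clat, clon = (float(x) for x in center.split(","))
    # Haversine great-circle distance on a mean Earth radius of 6371 km.
    p1, p2 = math.radians(lat), math.radians(clat)
    dphi = math.radians(clat - lat)
    dlmb = math.radians(clon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 6371.0 * 2 * math.asin(math.sqrt(a)) <= float(dist_km)

print(in_ip_range("193.0.14.129", "193.0.14.0/24"))  # True
print(in_radius(52.52, 13.40, "52.37,4.90:600"))     # Berlin within 600 km of Amsterdam -> True
```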
| Argument | Description |
|---|---|
| `--ignore-start-hops` | Ignore the first N hops in path analysis. |
| `--ignore-end-hops` | Ignore the last N hops in path analysis. |
| `--analyze-core-paths` | Enable analysis of common core path segments (see the sketch below). |
| `--core-path-min-len` | Minimum length of a segment to be a core path candidate. |
| `--core-path-max-len` | Maximum length of a segment to be a core path candidate. |
| `--core-path-min-support-abs` | Minimum absolute number of traceroutes a segment must appear in. |
| `--core-path-min-support-rel` | Minimum relative frequency with which a segment must appear. |
| `--core-path-top-n` | Report statistics for the top N most frequent core path segments. |
| `--core-path-uninformative-threshold` | Maximum ratio of `*` or `PRIVATE` hops allowed in a path for core analysis. |
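The core-path flags describe a sliding-window mining step: every contiguous hop segment between the minimum and maximum length is counted across traceroutes, and segments meeting both support thresholds are reported. A hedged sketch of that idea (not the project's actual implementation):

```python
from collections import Counter

def core_segments(paths, min_len=3, max_len=5, min_support_abs=2, min_support_rel=0.5):
    """Count contiguous hop segments across traceroutes (illustrative only)."""
    support = Counter()
    for path in paths:
        seen = set()
        for length in range(min_len, max_len + 1):
            for i in range(len(path) - length + 1):
                seen.add(tuple(path[i:i + length]))
        support.update(seen)  # each segment counts once per traceroute
    n = len(paths)
    return {seg: cnt for seg, cnt in support.items()
            if cnt >= min_support_abs and cnt / n >= min_support_rel}

paths = [
    ["10.0.0.1", "a", "b", "c", "d"],
    ["10.0.9.9", "a", "b", "c", "e"],
    ["10.1.1.1", "a", "b", "c", "d"],
]
for seg, cnt in sorted(core_segments(paths).items(), key=lambda kv: -kv[1]):
    print(cnt, " -> ".join(seg))
```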
| Argument | Description |
|---|---|
| `--plot-metrics` | Required for plotting. One or more metrics to plot. Choices: `rtt`, `rtt_std`, `first_hop_rtt`, `observed_pathlen`, `successful_pathlen`, `success_rate`, `timeout_rate`. |
| `--plot-aggregations` | Aggregation levels for plots (see the sketch below). Default: `none`. Choices: `daily`, `hourly`, `dayofweek`, `hourofday`. |
| `--plot-types` | Plot types to generate. Choices: `hist`, `scatter`, `rolling`, `boxplot`, `path_perf`. |
| `--formats` | Output image formats for plots (`png`, `pdf`, `svg`, ...). |
| `--highlight-outliers` | Highlight RTT outliers on the scatter plot. |
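For intuition: `daily`/`hourly` aggregate along the timeline, while `dayofweek`/`hourofday` fold the timeline into a periodic profile. A small pandas sketch of the two kinds of aggregation on a synthetic RTT series (assumed semantics; the plotting code may aggregate differently):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2025-01-01", periods=96, freq="h"),
    "rtt": range(96),
}).set_index("timestamp")

hourly = df["rtt"].resample("h").mean()              # time series, one point per hour
hourofday = df["rtt"].groupby(df.index.hour).mean()  # profile over the 24 hours of a day

print(hourly.head(3))
print(hourofday.head(3))
```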
| Argument | Description |
|---|---|
| `-w`, `--workers` | Number of worker processes for processing. |
| `--chunk-size` | Number of records per processing chunk. |
| `--no-line-count` | Disable the initial line count used for progress bar estimation. |
| `-l`, `--log-level` | Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). |
| `-q`, `--quiet` | Suppress console log output. |
| Argument | Description |
|---|---|
| `FILE_PATTERN` | Positional argument. Paths or patterns for input JSON files. |
| `--host`, `--port`, `--db`, `--user`, `--password` | ClickHouse connection details. |
| `--batch-size` | Number of rows per ClickHouse insert batch. |
| `-w`, `--workers` | Number of worker processes for parsing files. |
| `--db-insert-workers` | Number of worker threads for database inserts. |
| `--optimize-final` | Run `OPTIMIZE TABLE FINAL` after the import completes. |
| `--limit` | Process only the first N input lines in total. |
| `--imap-chunksize` | Chunk size for the multiprocessing pool. |
| `--source-*`, `--dest-*`, `--protocol` | All filtering options from `main.py` are also available here. |
| Argument | Description |
|---|---|
| `--host`, `--port`, `--user`, `--password` | ClickHouse connection details. |
| `--db` | The name of the database to create and/or apply the schema to. |
| `--schema` | Path to the `.sql` schema file to execute. |
- JSON Results File (`*_results_<mode>.json`): A comprehensive JSON file containing all metadata, processing summaries, statistical aggregations, path analyses, and a list of detected anomalies.
- Parquet Data File (`*_baseline_data.parquet`): Created in `baseline` mode, this file stores the processed DataFrame. It is used by the `analyze` mode to perform the K-S and A-D distribution tests (see the sketch below).
- Plots: Visualizations of key metrics, saved as `.png` files (or other specified formats).
- Log File (`*.log`): A detailed log of the entire run, useful for debugging.
- Matrix Notifications: Real-time alerts sent to your configured chat room, summarizing the run and attaching the results file and any plots.
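Since the Parquet file is a plain pandas DataFrame on disk, it can also be inspected ad hoc. A short sketch, where the path follows the example output above and the column names are assumptions:

```python
import pandas as pd  # pyarrow from requirements.txt backs read_parquet

df = pd.read_parquet("out/example/example_baseline_data.parquet")
print(df.shape)
print(df.columns.tolist())
if "rtt" in df.columns:  # assumed column name, for illustration
    print(df["rtt"].describe())
```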
Problem: When running an analysis from ClickHouse with a very large number of filter values (thousands of `--source-id` or `--dest-ip` entries), you may encounter an error similar to this:
```
DB::Exception: Max query size exceeded (can be increased with the `max_query_size` setting)
```
This happens because the script constructs a single, very long SQL query string that exceeds the default safety limit on the ClickHouse server.
Solution: Increase this limit on the ClickHouse server for the user profile you are connecting with.
Instructions:
- Locate your ClickHouse user configuration file. This is typically `/etc/clickhouse-server/users.xml` or a file inside `/etc/clickhouse-server/users.d/`.

- Edit the file (e.g., `sudo nano /etc/clickhouse-server/users.xml`).

- Inside the profile of your user, add or modify the `max_query_size` setting:

  ```xml
  <!-- Inside /etc/clickhouse-server/users.xml -->
  <clickhouse>
      <profiles>
          <default>
              <!-- ... -->
              <!-- Increase the max query size from the default (262144 = 256 KiB) -->
              <!-- This value (2097152) is 2 MiB -->
              <max_query_size>2097152</max_query_size>
          </default>
      </profiles>
      <!-- ... -->
  </clickhouse>
  ```

- Save the file and restart the ClickHouse server to apply the changes:

  ```bash
  sudo systemctl restart clickhouse-server
  ```
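If you control how the client is constructed, `clickhouse-driver` can also send per-connection settings, which may let you raise the limit without editing `users.xml`. This is an assumption about the setup (and requires the server to permit the override), not a documented feature of these scripts:

```python
from clickhouse_driver import Client

# Hypothetical client-side override matching the 2 MiB value from the XML above.
client = Client(
    host="127.0.0.1",
    settings={"max_query_size": 2097152},
)
```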
```
.
├── analyzer_lib/          # Core analysis library
│   ├── analysis/          # Main analysis logic
│   ├── common/            # Shared components (constants, utils, bot)
│   └── data_source/       # Data readers (files, ClickHouse)
├── config.ini.example     # Template for Matrix bot configuration
├── data/                  # Example data files
├── main.py                # Main analysis script
├── out/                   # Default directory for all outputs
├── requirements.txt       # Project dependencies
├── schema/                # SQL schema for ClickHouse
└── scripts/               # Helper scripts for import and setup
```
- RIPE Atlas: This work relies on the open and extensive data provided by the RIPE Atlas global measurement network.
- Google Gemini: Significant help with analysis and code.
This project is licensed under the GNU AGPL-3.0 license.