A high-performance simulation tool for modeling epidemic spread on complex networks using the SIR (Susceptible-Infected-Recovered) model.
SPKMC implements the Shortest Path Kinetic Monte Carlo algorithm, a method for simulating how diseases spread through networks of connected individuals based on the framework introduced by Tolić, Kleineberg & Antulov-Fantulin (2018). Instead of simulating each infection event one at a time (which can be slow), SPKMC uses graph theory to compute when each person in the network will become infected, based on the shortest weighted paths from initially infected individuals.
This approach maps SIR dynamics to edge weights representing propagation times, then uses Dijkstra's algorithm to efficiently determine infection arrival times. This makes SPKMC significantly faster than traditional Monte Carlo methods, especially for large networks with thousands or millions of nodes.
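The mapping described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of the idea, not SPKMC's implementation: the function names `sample_edge_weights` and `infection_times` are hypothetical, and the sketch assumes exponential transmission times with rate `lam` and Gamma-distributed recovery times. An edge whose transmission time exceeds the source node's recovery time is treated as blocked (infinite weight).

```python
import heapq
import math
import random


def sample_edge_weights(adj, lam, shape, scale, rng):
    """Draw one realization: a propagation time for every directed edge.

    Each node gets a Gamma(shape, scale) recovery time. Each directed edge
    (u, v) gets an exponential transmission time with rate `lam`; if that
    time exceeds u's recovery time, u recovers first and the edge is blocked.
    """
    recovery = {u: rng.gammavariate(shape, scale) for u in adj}
    weights = {}
    for u, neighbors in adj.items():
        for v in neighbors:
            t = rng.expovariate(lam)
            weights[(u, v)] = t if t <= recovery[u] else math.inf
    return weights, recovery


def infection_times(adj, weights, sources):
    """Multi-source Dijkstra: earliest infection time for every node."""
    dist = {u: math.inf for u in adj}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v in adj[u]:
            nd = d + weights[(u, v)]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Tiny path graph 0 - 1 - 2 - 3 with node 0 initially infected.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = random.Random(42)
weights, recovery = sample_edge_weights(adj, lam=1.0, shape=2.0, scale=1.0, rng=rng)
print(infection_times(adj, weights, sources=[0]))
```

Nodes that end up with an infinite distance are never reached in that realization. Averaging many such realizations gives the epidemic curves.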
In the SIR model, each individual (node) in the network can be in one of three states:
- Susceptible (S): Healthy individuals who can become infected
- Infected (I): Currently infected individuals who can spread the disease
- Recovered (R): Individuals who have recovered and are now immune
The simulation tracks how the proportions of S, I, and R change over time as the epidemic progresses through the network.
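As a rough illustration of how such proportions can be derived, the sketch below classifies each node at each time point from its infection time and the length of its infectious period. The helper `sir_fractions` and the sample numbers are hypothetical and only show the bookkeeping. They are not taken from SPKMC's code.

```python
import math


def sir_fractions(infection_time, recovery_duration, time_points):
    """Return (S, I, R) lists of population fractions at each time point.

    A node is susceptible before its infection time, infected for
    `recovery_duration` afterwards, and recovered from then on.
    Nodes with an infinite infection time are never infected.
    """
    n = len(infection_time)
    S, I, R = [], [], []
    for t in time_points:
        s = i = r = 0
        for node, t_inf in infection_time.items():
            if t < t_inf:
                s += 1
            elif t < t_inf + recovery_duration[node]:
                i += 1
            else:
                r += 1
        S.append(s / n)
        I.append(i / n)
        R.append(r / n)
    return S, I, R


infection_time = {0: 0.0, 1: 1.0, 2: 3.0, 3: math.inf}
recovery_duration = {0: 2.0, 1: 1.5, 2: 1.0, 3: 1.0}
S, I, R = sir_fractions(infection_time, recovery_duration, [0.0, 1.0, 2.5, 5.0])
print(S)  # [0.75, 0.5, 0.5, 0.25]
print(I)  # [0.25, 0.5, 0.0, 0.0]
print(R)  # [0.0, 0.0, 0.5, 0.75]
```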
- Experiment-driven workflow: Define multi-scenario experiments in simple JSON files, run them with one command, and automatically generate comparison plots
- Multiple network types: Simulate epidemics on different network structures (random, scale-free, regular)
- Flexible timing distributions: Model realistic recovery and infection times using Gamma or Exponential distributions
- High performance: Uses Numba JIT compilation for speed, with optional GPU acceleration
- Publication-quality plots: Generate professional visualizations of epidemic dynamics
- Multiple export formats: Save results as JSON, CSV, Excel, Markdown, or HTML
pip install spkmc
That's it. This installs SPKMC and all required dependencies.
SPKMC automatically detects if you have an NVIDIA GPU. If GPU hardware is found but GPU packages aren't installed, you'll see a suggestion to enable acceleration:
NVIDIA GPU detected but GPU acceleration packages are not installed.
For significantly faster simulations on large networks (10,000+ nodes),
install GPU support with:
pip install spkmc[gpu]
To install GPU support manually:
pip install spkmc[gpu]
This requires an NVIDIA GPU with CUDA drivers installed.
After installation, the spkmc command will be available in your terminal.
The most powerful way to use SPKMC is through experiments. An experiment is a JSON configuration file that defines multiple scenarios (simulations with different parameters) to run and compare automatically.
Why experiments?
- Reproducibility: Configuration is saved as JSON documenting exactly what you ran
- Automation: Run dozens of scenarios with one command
- Comparison: Automatic generation of comparison plots
- Organization: Results saved in data/<experiment_name>/ directories
Option 1: Interactive Wizard
spkmc experiments
Select "[+] Create New Experiment" from the menu. The wizard guides you through:
- Experiment name and description
- Base parameters (network type, distribution, nodes, etc.)
- Which parameter to vary across scenarios
- The values for that parameter
The wizard saves the configuration to experiments/<name>/data.json and can run it immediately.
Option 2: Manual Configuration
Create experiments/my_experiment/data.json:
{
"name": "My Experiment",
"description": "How does infection rate affect epidemic size?",
"parameters": {
"network": "er",
"distribution": "gamma",
"nodes": 1000,
"k_avg": 10,
"lambda": 0.5,
"samples": 100,
"num_runs": 3
},
"scenarios": [
{ "label": "Baseline" },
{ "label": "High Infection", "lambda": 0.8 },
{ "label": "Low Infection", "lambda": 0.3 }
]
}
The parameters field defines defaults; each scenario only specifies what differs.
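The inheritance rule can be pictured as a dictionary merge in which scenario values override the defaults. The sketch below is only an illustration of that rule: `resolve_scenarios` is a hypothetical helper, and SPKMC's own merge logic may differ in detail.

```python
def resolve_scenarios(config):
    """Merge the top-level defaults into every scenario (scenario values win)."""
    defaults = config.get("parameters", {})
    return [{**defaults, **scenario} for scenario in config["scenarios"]]


config = {
    "parameters": {"network": "er", "nodes": 1000, "lambda": 0.5},
    "scenarios": [
        {"label": "Baseline"},
        {"label": "High Infection", "lambda": 0.8},
    ],
}
resolved = resolve_scenarios(config)
for scenario in resolved:
    print(scenario["label"], scenario["lambda"])
```

Here "Baseline" keeps the default lambda of 0.5, while "High Infection" overrides it with 0.8 and inherits everything else.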
# Interactive menu to select and run
spkmc experiments
# Run all experiments in sequence
spkmc experiments --all
# Re-run from scratch (clears previous results)
spkmc experiments --override
Results are saved to data/<experiment_name>/ with one JSON file per scenario plus a comparison.png plot.
| Option | Description |
|---|---|
| -a, --all | Run all experiments (no menu) |
| --override | Clear results and re-run |
| --no-plot | Disable plot generation |
| -x, --export | Output format: json, csv, excel, md, html (default: json) |
| --debug | Show detailed debug info |
| --clear-cache | Clear Numba compilation cache |
Results are always saved to data/experiments/<experiment_name>/.
For quick tests or one-off simulations:
spkmc run -n er -d gamma --nodes 1000 --samples 50
This creates an Erdos-Renyi network with 1000 nodes, uses Gamma-distributed recovery times, runs 50 samples, and displays a plot. Use -o results.json to save results.
The network structure dramatically affects how epidemics spread. SPKMC supports four types:
Erdos-Renyi (er) - The simplest random network model. Each pair of nodes has an equal probability of being connected. Good for modeling well-mixed populations where everyone has roughly the same number of contacts.
Scale-Free (sf) - A network in which a few nodes (hubs) have many more connections than the rest, with node degrees following a power-law distribution. This better represents real social networks, where some people are much more connected than others.
Complete Graph (cg) - Every node is connected to every other node. Useful as a theoretical baseline but not realistic for most applications.
Random Regular (rrn) - Every node has exactly the same number of connections. Useful for studying how epidemics spread when everyone has equal contact rates.
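To make the Erdos-Renyi description concrete, the sketch below builds such a network using only the standard library. It assumes the edge probability is chosen as p = k_avg / (N - 1) so that the expected degree equals k_avg. How SPKMC itself derives the probability is not stated here, and `erdos_renyi` is a hypothetical helper.

```python
import random


def erdos_renyi(n, k_avg, rng):
    """Adjacency lists for G(n, p) with p chosen so the expected degree is k_avg."""
    p = k_avg / (n - 1)
    adj = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is connected independently
                adj[u].append(v)
                adj[v].append(u)
    return adj


adj = erdos_renyi(n=500, k_avg=10, rng=random.Random(0))
mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
print(f"mean degree: {mean_degree:.2f}")
```

The pairwise loop is quadratic in the number of nodes, which is fine for illustration but too slow for the network sizes SPKMC targets.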
The timing of infection and recovery events follows probability distributions:
Gamma Distribution - Recovery times follow a Gamma distribution controlled by shape and scale parameters. When shape > 1, there's a characteristic delay before most recoveries occur, which is realistic for many diseases. Infection times use the lambda parameter.
Exponential Distribution - Recovery times follow an Exponential distribution controlled by the mu parameter. Recovery events are "memoryless" - the probability of recovering doesn't depend on how long you've been infected. Infection times use the lambda parameter.
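The sketch below samples both recovery-time distributions with Python's standard library and computes the probability that an infected node transmits along an edge before it recovers. It assumes that infection times are exponential with rate lambda, which is an assumption about how the lambda parameter is used. Under that assumption the closed forms are 1 - (1 + lambda * scale)^(-shape) for Gamma recovery and lambda / (lambda + mu) for Exponential recovery. The function names are illustrative only.

```python
import random


def gamma_transmissibility(lam, shape, scale):
    """P(transmission before recovery) for Gamma(shape, scale) recovery times."""
    return 1.0 - (1.0 + lam * scale) ** (-shape)


def exponential_transmissibility(lam, mu):
    """P(transmission before recovery) for exponential recovery with rate mu."""
    return lam / (lam + mu)


rng = random.Random(1)
n = 100_000
# random.gammavariate takes (shape, scale); random.expovariate takes a rate.
gamma_mean = sum(rng.gammavariate(2.0, 1.0) for _ in range(n)) / n
exp_mean = sum(rng.expovariate(0.5) for _ in range(n)) / n
print(f"Gamma mean is about {gamma_mean:.2f} (shape x scale = 2.0)")
print(f"Exponential mean is about {exp_mean:.2f} (1 / mu = 2.0)")
print(gamma_transmissibility(lam=0.5, shape=2.0, scale=1.0))
print(exponential_transmissibility(lam=0.8, mu=0.5))
```

With shape = 1 the Gamma distribution reduces to an Exponential with mu = 1 / scale, and the two formulas agree.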
The following parameters apply to both spkmc run and batch experiment scenarios.
| Parameter | Description | Default | Applies to |
|---|---|---|---|
| --nodes, -N | Number of individuals in the network. Larger networks are more accurate but slower to simulate. | 1000 | All networks |
| --k-avg | Average number of connections per node. Higher values mean faster epidemic spread. | 10 | er, sf, rrn |
| --exponent | Power-law exponent γ for degree distribution. Lower values (e.g., 2.1) create more hub nodes; higher values (e.g., 3.0) are more uniform. Valid range: > 2.0. | 2.5 | sf only |
For Gamma distribution (-d gamma):
| Parameter | Description | Default |
|---|---|---|
| --shape | Shape parameter (k) of the Gamma distribution. Controls the "peakedness" of recovery times. | 2.0 |
| --scale | Scale parameter (θ) of the Gamma distribution. Mean recovery time = shape × scale. | 1.0 |
| --lambda | Infection rate (β). Higher values mean faster transmission along edges. | 1.0 |
For Exponential distribution (-d exponential):
| Parameter | Description | Default |
|---|---|---|
| --mu | Recovery rate (γ). Mean recovery time = 1/mu. | 1.0 |
| --lambda | Infection rate (β). Higher values mean faster transmission along edges. | 1.0 |
| Parameter | Description | Default |
|---|---|---|
| --samples, -s | Number of Monte Carlo samples per run. More samples = smoother curves and better statistics. | 50 |
| --num-runs, -r | Number of independent runs to average. Provides error estimates when > 1. | 2 |
| --initial-perc, -i | Fraction of population initially infected (0.0 to 1.0). | 0.01 |
| --t-max | Maximum simulation time in arbitrary units. | 10.0 |
| --steps | Number of time points to record in output. | 100 |
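To show why more than one run is needed for an error estimate, the sketch below averages several curves point by point and computes the standard error of the mean. It assumes the error estimate is the sample standard deviation across runs divided by the square root of the number of runs. The helper `mean_and_stderr` and the sample curves are hypothetical.

```python
import math
import statistics


def mean_and_stderr(runs):
    """Pointwise mean and standard error over a list of equally long curves."""
    n_runs = len(runs)
    means, errors = [], []
    for values in zip(*runs):
        means.append(statistics.fmean(values))
        if n_runs > 1:
            errors.append(statistics.stdev(values) / math.sqrt(n_runs))
    # A single run has no spread, so no error estimate is returned.
    return means, (errors if n_runs > 1 else None)


# Three made-up infected-fraction curves from independent runs.
runs = [
    [0.01, 0.10, 0.20],
    [0.01, 0.12, 0.26],
    [0.01, 0.14, 0.23],
]
means, errors = mean_and_stderr(runs)
print(means)
print(errors)
```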
The run command executes a single simulation with the parameters you specify.
Basic usage:
spkmc run -n <network_type> -d <distribution> [options]
Common examples:
# Simple simulation with default parameters
spkmc run -n er -d gamma
# Larger network with more samples for publication-quality results
spkmc run -n er -d gamma --nodes 10000 --samples 200 --num-runs 5
# Scale-free network with specific power-law exponent
spkmc run -n sf -d gamma --nodes 5000 --exponent 2.5 --k-avg 8
# Using exponential distribution for recovery times
spkmc run -n er -d exponential --mu 0.5 --lambda 0.8
# Save results to a JSON file (default format)
spkmc run -n er -d gamma -o my_results.json
# Save results as CSV instead of JSON
spkmc run -n er -d gamma -o my_results --export csv
# Run without displaying the plot (useful for batch processing or servers)
spkmc run -n er -d gamma -o results.json --no-plot
Note: Results are only saved when you specify -o, --output. Without it, the simulation runs and displays a plot but nothing is saved to disk.
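For scripted sweeps, the same commands can be assembled from Python. The sketch below only builds and prints the argument lists, using flags documented here (-n, -d, -o, --no-plot, --nodes, --k-avg). The helper `build_run_command` is hypothetical and not part of SPKMC.

```python
import shlex


def build_run_command(network, distribution, output, **options):
    """Assemble an `spkmc run` argument list from keyword options."""
    cmd = ["spkmc", "run", "-n", network, "-d", distribution, "-o", output, "--no-plot"]
    for name, value in options.items():
        # Python keyword names use underscores; the CLI flags use hyphens.
        cmd += [f"--{name.replace('_', '-')}", str(value)]
    return cmd


commands = [
    build_run_command("er", "gamma", f"k_avg_{k}.json", nodes=1000, k_avg=k)
    for k in (4, 10)
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the CLI is installed. Because `lambda` is a reserved word in Python, that option would have to be passed through a dictionary, for example `**{"lambda": 0.5}`.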
All options for spkmc run:
| Option | Default | Description |
|---|---|---|
| -n, --network-type | er | Network type: er, sf, cg, or rrn |
| -d, --dist-type | gamma | Distribution: gamma or exponential |
| -N, --nodes | 1000 | Number of nodes in the network |
| --k-avg | 10 | Average degree (connections per node) |
| --exponent | 2.5 | Power-law exponent for scale-free networks |
| --shape | 2.0 | Gamma distribution shape parameter |
| --scale | 1.0 | Gamma distribution scale parameter |
| --mu | 1.0 | Exponential distribution rate parameter |
| --lambda | 1.0 | Infection transmission rate |
| -s, --samples | 50 | Number of Monte Carlo samples |
| -r, --num-runs | 2 | Number of independent runs (for error bars) |
| -i, --initial-perc | 0.01 | Initial fraction infected (0.01 = 1%) |
| --t-max | 10.0 | Maximum simulation time |
| --steps | 100 | Number of time points to record |
| -o, --output | None | Path to save results (required to save anything) |
| -e, --export | json | Output format: json, csv, excel, md, html |
| --no-plot | False | Don't display the plot |
| --override | False | Overwrite existing output file |
The plot command creates visualizations from saved result files. It handles both single files and comparisons of multiple results.
Supported file formats: JSON (.json), CSV (.csv), Excel (.xlsx, .xls)
Plotting a single result:
spkmc plot results.json
spkmc plot results.csv # Also works with CSV exports
spkmc plot results.xlsx # Also works with Excel exports
Comparing multiple results:
Pass multiple files or directories to create a comparison plot:
# Compare specific files
spkmc plot result1.json result2.json
# Compare all scenarios from an experiment
spkmc plot data/average_degree_effect/
# Compare scenarios from multiple experiments
spkmc plot data/experiment1/ data/experiment2/
# Add custom labels
spkmc plot data/exp/k_avg_4.json data/exp/k_avg_10.json \
-l "Low connectivity" -l "High connectivity"
Customizing the plot:
# Show only the infected curve
spkmc plot results.json -s I
# Show infected and recovered curves with error bars
spkmc plot results.json -s I -s R --with-error
# Save as a high-resolution PDF for publication
spkmc plot results.json -o figure.pdf --dpi 600
# Plot each scenario separately instead of comparing
spkmc plot data/my_experiment/ --separate
All options for spkmc plot:
| Option | Description |
|---|---|
| -e, --with-error | Display error bars (requires multiple runs) |
| -o, --output | Save plot to file instead of displaying |
| -f, --format | Output format: png, pdf, svg, jpg |
| --dpi | Image resolution (default: 300) |
| -s, --states | Which states to plot (can use multiple times) |
| --separate | Create separate plots instead of comparison |
| -l, --labels | Custom labels for comparison (use multiple times) |
| -x, --export | Export the underlying data |
The info command helps you explore saved results without loading them into Python.
List all available result files:
spkmc info --list
Show details of a specific result:
spkmc info -f results.json
This displays the simulation parameters, network configuration, and summary statistics.
Export result information:
spkmc info -f results.json --export md -o summary.md
All options for spkmc info:
| Option | Description |
|---|---|
| -f, --result-file | Path to a specific result file to inspect |
| -l, --list | List all available result files |
| -e, --export | Export format: json, csv, excel, md, html |
| -o, --output | Path to save the exported file |
SPKMC can automatically generate academic-style analysis reports for your simulation results using OpenAI's language models. This feature helps interpret simulation results by providing structured scientific analysis including introduction, results interpretation, discussion, and conclusions.
- OpenAI API Key: Set it as an environment variable:
  export OPENAI_API_KEY="sk-your-api-key-here"
  You can also add this to your shell profile (~/.bashrc, ~/.zshrc, etc.) to make it permanent.
- OpenAI Package: The openai Python package must be installed:
  pip install openai
When you run analysis, SPKMC:
- Loads the simulation results and metadata
- Sends the data to OpenAI's gpt-4o-mini model along with experiment context
- Generates a structured analysis in Markdown format
- Saves the analysis as analysis.md in the results directory
Add the global --analyze flag to automatically generate analysis after running simulations:
# Run simulation and analyze results automatically
spkmc --analyze run -n er -d gamma -o result.json
# Run experiments and analyze results automatically
spkmc --analyze experiments
# Run all experiments with analysis
spkmc --analyze experiments --all
Analyze a single result file:
spkmc analyze result.json
Analyze multiple result files:
spkmc analyze result1.json result2.json result3.json
Analyze results in a directory:
spkmc analyze experiments/network_comparison/
Analyze all experiments at once:
spkmc analyze --all
Force regeneration (even if analysis already exists):
spkmc analyze experiments/my_experiment/ --force
Use a different model:
spkmc analyze data/results/ --model gpt-4o
Custom output path (single file/directory only):
spkmc analyze data/results/ -o my_analysis.md
The AI analysis is saved as analysis.md in the same directory as your results:
data/
├── my_experiment/
│ ├── scenario_1.json
│ ├── scenario_2.json
│ ├── comparison.png
│ └── analysis.md ← AI-generated analysis
The generated analysis.md file follows an academic paper structure:
- Introduction: Context about the simulation parameters and what was being tested
- Results: Quantitative summary of the simulation outcomes (peak infection rates, final epidemic sizes, timing)
- Discussion: Interpretation of the results and their implications
- Conclusion: Key takeaways and potential next steps
- Experiment description matters: The AI uses your experiment's description field to understand what hypothesis you're testing. More detailed descriptions produce better analysis.
- Skips existing analysis: If analysis.md already exists, SPKMC won't regenerate it. Use --force to regenerate.
- Requires results: Analysis only runs after successful simulation completion.
- API costs: Each analysis uses the OpenAI API. The gpt-4o-mini model is cost-effective but check OpenAI's pricing for current rates.
| Option | Description |
|---|---|
| PATHS... | One or more files or directories to analyze |
| -a, --all | Analyze all experiments |
| -m, --model | OpenAI model to use (default: gpt-4o-mini) |
| -f, --force | Regenerate analysis even if it exists |
| -o, --output | Custom output path (only for single path) |
Global --analyze flag:
| Command | Description |
|---|---|
| spkmc --analyze run ... | Run simulation and generate analysis |
| spkmc --analyze experiments | Run experiments and generate analysis |
The clean command removes result files and optionally clears the Numba compilation cache.
Clean results for a specific experiment:
spkmc clean network_type_comparison
Clean all results (with confirmation):
spkmc clean
You'll be asked to confirm before anything is deleted.
Clean without confirmation:
spkmc clean -y
Also clear Numba's compilation cache:
spkmc clean --numba-cache
This is useful if you're experiencing strange behavior after updating SPKMC.
All options for spkmc clean:
| Option | Description |
|---|---|
| -y, --yes | Skip confirmation prompt |
| --numba-cache | Also clear the Numba compilation cache |
Results are always stored in data/experiments/<experiment_name>/.
This section provides detailed documentation for the data.json configuration file format.
{
"name": "Network Type Comparison",
"description": "How does network structure affect epidemic dynamics?",
"parameters": {
"distribution": "gamma",
"nodes": 10000,
"k_avg": 10,
"shape": 2.0,
"scale": 0.5,
"lambda": 0.5,
"samples": 100,
"num_runs": 5,
"t_max": 20,
"steps": 200,
"initial_perc": 0.01
},
"plot": {
"title": "Epidemic Spread Across Network Types",
"xlabel": "Time",
"ylabel": "Proportion of Population",
"states_to_plot": ["I", "R"],
"figsize": [12, 8],
"dpi": 300,
"grid": true
},
"scenarios": [
{
"label": "Random Network",
"network": "er"
},
{
"label": "Scale-Free Network",
"network": "sf",
"exponent": 2.5
}
]
}
| Field | Required | Description |
|---|---|---|
| name | Yes | Human-readable name displayed in the experiment menu |
| description | No | Brief description of what the experiment tests |
| parameters | No | Default parameters inherited by all scenarios |
| plot | No | Plot configuration (see below) |
| scenarios | Yes | Array of scenario configurations |
| Field | Default | Description |
|---|---|---|
| title | Auto-generated | Title shown on the comparison plot |
| xlabel | "Time" | X-axis label |
| ylabel | "Proportion of Population" | Y-axis label |
| states_to_plot | ["S", "I", "R"] | Which curves to show |
| figsize | [10, 6] | Plot dimensions in inches [width, height] |
| dpi | 300 | Resolution for saved images |
| grid | true | Whether to show grid lines |
Parameters not specified in a scenario inherit from the top-level parameters field.
| Parameter | Default | Description |
|---|---|---|
| label | scenario_001 | Name shown in plot legend |
| network | er | er, sf, cg, or rrn |
| distribution | gamma | gamma or exponential |
| network_size | 1000 | Number of nodes |
| k_avg | 10 | Average connections per node |
| exponent | 2.5 | Power-law exponent (for sf only) |
| shape | 2.0 | Gamma distribution shape |
| scale | 1.0 | Gamma distribution scale |
| mu | 1.0 | Exponential distribution rate |
| lambda | 1.0 | Infection rate |
| samples | 50 | Monte Carlo samples per run |
| num_runs | 2 | Independent runs (for error bars) |
| t_max | 10.0 | Simulation duration |
| steps | 100 | Number of time points |
| initial_perc | 0.01 | Initial infected fraction |
Focused comparisons: Good experiments vary only one parameter at a time. This makes it clear what's causing differences in the results.
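A single-parameter sweep like this can also be generated programmatically. The sketch below builds a configuration with one scenario per value of the varied parameter and writes it as JSON. It uses only field names that appear in this document (name, description, parameters, scenarios, label). The helper `build_experiment` is hypothetical, and the file is written to a temporary directory rather than a real experiments/ folder.

```python
import json
import tempfile
from pathlib import Path


def build_experiment(name, description, base, vary, values):
    """Return a data.json-style dict with one scenario per value of `vary`."""
    return {
        "name": name,
        "description": description,
        "parameters": base,
        "scenarios": [{"label": f"{vary}={v}", vary: v} for v in values],
    }


experiment = build_experiment(
    name="Average Degree Effect",
    description="How does average degree affect epidemic size?",
    base={"network": "er", "distribution": "gamma", "nodes": 1000, "lambda": 0.5},
    vary="k_avg",
    values=[4, 10, 20],
)

# In a project this path would be experiments/<name>/data.json.
target = Path(tempfile.mkdtemp()) / "average_degree_effect" / "data.json"
target.parent.mkdir(parents=True)
target.write_text(json.dumps(experiment, indent=2))
print(target.read_text())
```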
Recommended sample sizes:
- Quick tests: samples: 50, num_runs: 2
- Standard analysis: samples: 100, num_runs: 5
- Publication quality: samples: 200+, num_runs: 10
While the CLI is convenient for quick analyses, you can also use SPKMC directly from Python for more control:
from spkmc import SPKMC, GammaDistribution, NetworkFactory
import numpy as np
# Step 1: Create a probability distribution for recovery times
# shape=2.0 means there's a characteristic delay before recovery
# scale=1.0 controls the time scale
# lmbd=0.5 is the infection rate
distribution = GammaDistribution(shape=2.0, scale=1.0, lmbd=0.5)
# Step 2: Create the simulator
simulator = SPKMC(distribution)
# Step 3: Generate a network
# N=1000 nodes, average of 10 connections per node
network = NetworkFactory.create_erdos_renyi(N=1000, k_avg=10)
# Step 4: Define the time points where we want measurements
time_points = np.linspace(0, 20.0, 200) # 200 points from t=0 to t=20
# Step 5: Run the simulation
# sources: which nodes are initially infected (node 0 in this case)
# samples: how many independent runs to average over
S, I, R = simulator.run_multiple_simulations(
network,
sources=np.array([0]),
time_steps=time_points,
samples=100
)
# S, I, R are numpy arrays with the proportion in each state at each time point
print(f"Peak infection: {I.max():.1%} at t={time_points[I.argmax()]:.1f}")
print(f"Final epidemic size: {R[-1]:.1%} of population")
from spkmc import (
# Core simulation
SPKMC, # Main simulator class
# Probability distributions
GammaDistribution, # Gamma-distributed recovery times
ExponentialDistribution, # Exponential recovery times
# Network generation
NetworkFactory, # Create different network types
# Visualization
Visualizer, # Create plots programmatically
# Data management
ResultManager, # Save and load results
)
When you run experiments, results are saved in the data/ directory:
data/
├── network_type_comparison/
│ ├── random_network.json
│ ├── scale_free_network.json
│ └── comparison.png
├── another_experiment/
│ └── ...
Each JSON file contains:
{
"S_val": [0.99, 0.95, 0.85, ...],
"I_val": [0.01, 0.04, 0.10, ...],
"R_val": [0.00, 0.01, 0.05, ...],
"S_err": [0.001, 0.002, ...],
"I_err": [0.001, 0.003, ...],
"R_err": [0.001, 0.002, ...],
"time": [0.0, 0.1, 0.2, ...],
"metadata": {
"network": "er",
"distribution": "gamma",
"N": 1000,
"k_avg": 10,
"shape": 2.0,
"scale": 1.0,
"lambda": 0.5,
"samples": 100,
"num_runs": 5
}
}
The _err fields contain standard errors and are only present when num_runs > 1.
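A result in this layout can be inspected with nothing but the standard library. The sketch below parses a shortened, made-up result and pulls out the peak infected fraction, the time at which it occurs, and the final recovered fraction. `summarize` is a hypothetical helper, and the numbers are illustrative rather than real simulation output.

```python
import json

# A shortened, made-up result in the layout shown above (no _err fields,
# as would be the case for a single run).
raw = """
{
  "S_val": [0.99, 0.80, 0.40, 0.30],
  "I_val": [0.01, 0.15, 0.35, 0.10],
  "R_val": [0.00, 0.05, 0.25, 0.60],
  "time": [0.0, 1.0, 2.0, 3.0],
  "metadata": {"network": "er", "num_runs": 1}
}
"""


def summarize(result):
    """Extract peak infection, its time, and the final recovered fraction."""
    infected = result["I_val"]
    peak_index = max(range(len(infected)), key=infected.__getitem__)
    return {
        "peak_infected": infected[peak_index],
        "peak_time": result["time"][peak_index],
        "final_recovered": result["R_val"][-1],
        "has_errors": "I_err" in result,
    }


summary = summarize(json.loads(raw))
print(summary)
```

For a file on disk, replace `json.loads(raw)` with `json.load(open(path))` or an equivalent `with` block.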
- Quick exploration: 20-50 samples
- Reliable results: 100-200 samples
- Publication quality: 500+ samples with 5-10 runs
More samples give smoother curves and better statistics, but take longer to compute.
- < 1,000 nodes: Very fast, good for testing
- 1,000-10,000 nodes: Typical research use
- > 10,000 nodes: Consider using GPU acceleration
The first time you run SPKMC, Numba compiles the performance-critical functions to machine code. This takes a few seconds. Subsequent runs will be much faster.
If you want to clear this cache (for example, after updating SPKMC):
spkmc clean --numba-cache
When running experiments with multiple scenarios, SPKMC automatically runs them in parallel using all available CPU cores. You don't need to configure this manually.
These options work with any command:
| Option | Description |
|---|---|
| --version | Display the SPKMC version |
| -v, --verbose | Show detailed progress and debug information |
| --analyze | Run AI-powered analysis of results (requires OPENAI_API_KEY environment variable) |
| --help | Show help for any command |
Examples:
# Check your installed version
spkmc --version
# Run with verbose output for debugging
spkmc -v run -n er -d gamma
# Get help for a specific command
spkmc run --help
Make sure you installed SPKMC with pip install spkmc and that your Python environment is activated.
If you're running over SSH or in an environment without a display, use --no-plot to skip visualization. You can later use spkmc plot results.json -o figure.png to save plots to files.
Try reducing the network size (--nodes) or number of samples (--samples).
This is normal - Numba is compiling optimized code. Subsequent runs will be faster.
MIT License - feel free to use SPKMC in your research and applications.
If you use SPKMC in your research, please cite the original algorithm paper:
@article{tolic2018simulating,
title = {Simulating SIR processes on networks using weighted shortest paths},
author = {Toli{\'c}, Dijana and Kleineberg, Kaj-Kolja and Antulov-Fantulin, Nino},
journal = {Scientific Reports},
volume = {8},
number = {1},
pages = {6562},
year = {2018},
publisher = {Nature Publishing Group},
doi = {10.1038/s41598-018-24648-w}
}
And optionally, this software implementation:
@software{spkmc,
title = {SPKMC: Shortest Path Kinetic Monte Carlo for Epidemic Simulation},
url = {https://github.com/mcaxtr/spkmc}
}
Contributions are welcome! Please feel free to submit issues and pull requests.