This repository provides Python scripts to recover and visualize causal graphs. The scripts are tailored to the VAR-LiNGAM method, used here with a known causal order as input, and may not adapt readily to other methods. The repository also contains scripts to generate a ground truth causal order from a summary matrix and to evaluate the accuracy of the recovered graphs.
The scripts were tested using the following version of Python.
Python 3.12.9
You can check your own Python version by running this command in your terminal:
python --version
To get started, clone the repository and install the necessary Python packages using the requirements.txt file. (It is recommended to install packages and run all scripts within a virtual environment to avoid dependency conflicts.)
git clone https://github.com/jultrishyyy/Recover-Causal-Graph-from-Causal-Order.git
cd Recover-Causal-Graph-from-Causal-Order
pip install -r requirements.txt
Before running the analysis, ensure your data is correctly formatted and placed in the appropriate directory.
- Causal Order: The causal order of variables should be stored in a causal_order.txt file as a Python list.
- Ground Truth Summary Matrix: The ground truth summary matrix must be in a summary_matrix.npy file, saved as a NumPy array.
- Dataset: Your dataset should be in a .csv file.
All three files for a given dataset must be located in the same subdirectory within the data/ folder.
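Before running the analysis, it can help to confirm that the three files in a dataset folder agree with each other. The following is a minimal sketch and not part of the repository: `check_dataset_dir` is a hypothetical helper, and it assumes the folder holds exactly one CSV whose first row is a header and whose columns are all variables (no index or timestamp column).

```python
import ast
import csv
from pathlib import Path

import numpy as np


def check_dataset_dir(dataset_dir):
    """Load the three input files from one dataset folder and check they agree."""
    dataset_dir = Path(dataset_dir)

    # causal_order.txt holds a Python list literal such as [0, 1, 2].
    causal_order = ast.literal_eval(
        (dataset_dir / "causal_order.txt").read_text().strip()
    )

    # summary_matrix.npy holds a square NumPy array.
    summary_matrix = np.load(dataset_dir / "summary_matrix.npy")
    if summary_matrix.ndim != 2 or summary_matrix.shape[0] != summary_matrix.shape[1]:
        raise ValueError("summary matrix must be a square 2-D array")
    n_vars = summary_matrix.shape[0]

    # Assumption: exactly one CSV per folder, with a header row.
    csv_files = sorted(dataset_dir.glob("*.csv"))
    if len(csv_files) != 1:
        raise ValueError(f"expected one .csv file, found {len(csv_files)}")
    with csv_files[0].open(newline="") as handle:
        n_columns = len(next(csv.reader(handle)))

    if sorted(causal_order) != list(range(n_vars)):
        raise ValueError("causal order must be a permutation of 0..n-1")
    if n_columns != n_vars:
        raise ValueError("CSV column count does not match the summary matrix")
    return causal_order, summary_matrix, csv_files[0]
```

Calling `check_dataset_dir("data/Web_Activity")` would either return the loaded order, matrix, and CSV path or raise a `ValueError` that names the mismatch.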
File Structure Example:
root/
└── data/
    ├── Dataset1/
    │   ├── causal_order.txt
    │   ├── summary_matrix.npy
    │   └── dataset1.csv
    └── Dataset2/
        ├── causal_order.txt
        ├── summary_matrix.npy
        └── dataset2.csv
......
causal_order.txt Example:
[0, 1, 2, 3, 5, 4, 6]
summary_matrix.npy Example:
[[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 0, 1, 0]]
To run the script, execute the following command from the root directory:
python run.py --data_path path/to/your/data --output_path path/to/your/results
For example, to process the Web_Activity dataset, use its relative path:
python run.py --data_path data/Web_Activity/ --output_path result/Web_Activity/
Alternatively, you can directly modify the paths inside the run.py script and run it without arguments for more flexibility:
python run.py
Upon completion, the recovered causal graph and evaluation metrics will be saved in the specified output directory within the result/ folder.
root/
└── result/
    ├── Dataset1/
    │   ├── causal_graph.png
    │   └── metrics.txt
    └── Dataset2/
        ├── causal_graph.png
        └── metrics.txt
......
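To process several datasets in one go, the run.py command can be built once per subfolder of data/. This is a minimal sketch under stated assumptions: `build_run_commands` is a hypothetical helper that is not in the repository, only the `--data_path` and `--output_path` flags come from the usage shown in this README, and the trailing slash is appended to mirror the Web_Activity example.

```python
import sys
from pathlib import Path


def build_run_commands(data_root="data", result_root="result"):
    """Return one run.py command (as an argv list) per dataset subfolder."""
    commands = []
    for dataset_dir in sorted(Path(data_root).iterdir()):
        if not dataset_dir.is_dir():
            continue  # skip stray files next to the dataset folders
        output_dir = Path(result_root) / dataset_dir.name
        commands.append([
            sys.executable, "run.py",
            "--data_path", dataset_dir.as_posix() + "/",
            "--output_path", output_dir.as_posix() + "/",
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)` from the repository root. Whether run.py creates a missing output directory is not stated here, so creating it beforehand may be necessary.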
Repository Structure:
root/
├── data/
├── generate_ground_truth/
├── helper/
├── result/
├── run.py
└── requirements.txt
The data/ directory contains the datasets. Each dataset has its own subfolder, which includes the raw data and the corresponding ground truth files. The repository includes:
- IT Monitoring Data: Antivirus_Activity, Middleware_oriented_message_Activity, Storm_Ingestion_Activity, and Web_Activity. (Source: Case_Studies_of_Causal_Discovery)
- CausalRiver Datasets: Flood. (Source: CausalRivers)
The generate_ground_truth/ folder contains scripts for generating synthetic datasets and ground truth summary matrices.
- generate_order_from_matrix.py: Generates a causal order from a given summary matrix.
- generate_IT_summary_matrix.py: Creates summary matrices for the IT monitoring datasets.
- generate_causalriver_summary_matrix.py: Creates summary matrices for the CausalRiver datasets.
- process_causalriver.py: Preprocesses CausalRiver datasets, including handling missing values and resampling.
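Deriving a causal order from a summary matrix can be illustrated with a topological sort. The sketch below is not the code in generate_order_from_matrix.py: the function name is hypothetical, and it assumes that a nonzero entry [i][j] means variable j causes variable i, which is consistent with the example files shown earlier in this README.

```python
from collections import deque


def causal_order_from_summary_matrix(summary_matrix):
    """Return one causal order compatible with an acyclic summary matrix.

    Assumption: summary_matrix[i][j] != 0 means variable j causes variable i.
    """
    n_vars = len(summary_matrix)
    # Ignore the diagonal: a self-loop does not constrain the order.
    parents = [
        {j for j in range(n_vars) if j != i and summary_matrix[i][j] != 0}
        for i in range(n_vars)
    ]
    remaining_parents = [len(p) for p in parents]
    # Variables without any parent can come first.
    ready = deque(i for i in range(n_vars) if remaining_parents[i] == 0)

    order = []
    while ready:
        cause = ready.popleft()
        order.append(cause)
        # Placing a cause releases every effect whose parents are all placed.
        for effect in range(n_vars):
            if cause in parents[effect]:
                remaining_parents[effect] -= 1
                if remaining_parents[effect] == 0:
                    ready.append(effect)

    if len(order) != n_vars:
        raise ValueError("summary matrix contains a cycle; no causal order exists")
    return order
```

Applied to the summary_matrix.npy example above, this sketch returns [0, 1, 2, 3, 5, 4, 6], which matches the causal_order.txt example. A causal order is generally not unique, so other valid orders may exist for the same matrix.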
The helper/ directory contains utility scripts for the causal discovery process.
- estimate_adjacency_matrix.py: Estimates the adjacency matrix based on the provided causal order.
- helper_methods.py: A collection of helper functions, including:
  - convert_Btaus_to_summary_matrix(): Converts a B_taus matrix to a summary matrix.
  - plot_summary_causal_graph(): Constructs and saves a causal graph from a matrix.
  - prune_summary_matrix_with_best_f1_threshold(): Prunes the estimated summary matrix using the threshold that yields the best F1 score.
  - save_results_and_metrics(): Saves the results and evaluation metrics to the specified path.
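The conversion and pruning steps can be pictured with a small, dependency-free sketch. It is not the code in helper_methods.py: the function names are hypothetical, and two assumptions are made that this README does not confirm. First, lagged coefficient matrices are collapsed by taking the largest absolute coefficient per entry. Second, candidate thresholds are the observed nonzero strengths.

```python
def summarize_lagged_effects(b_taus):
    """Collapse lagged coefficient matrices into one matrix of max |coefficient|."""
    n_vars = len(b_taus[0])
    return [
        [max(abs(b[i][j]) for b in b_taus) for j in range(n_vars)]
        for i in range(n_vars)
    ]


def f1_score(predicted, truth):
    """F1 over all entries of two binary matrices of equal shape."""
    pairs = [
        (p, t)
        for p_row, t_row in zip(predicted, truth)
        for p, t in zip(p_row, t_row)
    ]
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def prune_with_best_f1(strengths, truth):
    """Try each observed strength as a threshold; keep the best-scoring graph."""
    empty = [[0 for _ in row] for row in strengths]
    best = (f1_score(empty, truth), None, empty)  # fallback: no edges kept
    thresholds = sorted({v for row in strengths for v in row if v > 0})
    for threshold in thresholds:
        pruned = [[int(v >= threshold) for v in row] for row in strengths]
        score = f1_score(pruned, truth)
        if score > best[0]:
            best = (score, threshold, pruned)
    return best  # (best F1, chosen threshold, pruned binary matrix)
```

Because the threshold is selected against the ground truth, the resulting F1 score is an upper bound for this family of thresholds and not an estimate of performance on unseen data.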
The result/ folder stores the output of the analysis. For each dataset, a subfolder is created containing the generated causal graph (causal_graph.png) and performance metrics (metrics.txt).
- Ensure that the number of variables in your dataset matches the dimensions of the summary matrix.
- The CausalRiverBavaria and CausalRiverEastGermany datasets are too large for this repository. Please download them from the original CausalRivers GitHub repository.
- For large datasets (more than 15 variables), such as CausalRiverFlood, visualizing the full causal graph is not recommended as it can become cluttered and difficult to interpret.
- For any issues or questions, please open an issue on the repository's issue tracker.