A repository to generate datasets with the marginal efficiency of each actor in the evaluated network and to evaluate various influence maximisation methods.
- Authors: Piotr Bródka, Michał Czuba, Adam Piróg, Mateusz Stolarski
- Affiliation: WUST, Wrocław, Lower Silesia, Poland
First, initialise the environment:
conda env create -f env/conda.yaml
conda activate infmax-simulator-icm-mln
Then, pull the submodule with data loaders and install its code:
git submodule init && git submodule update
pip install -e _dataset/infmax_data_utils
The final step is to install wrappers for influence-maximisation methods into the conda environment. We recommend linking them in editable mode, so after you clone a particular method, simply install it with `pip install -e ../path/to/infmax/method`.
The dataset is stored in a separate repository bound to this project as a git submodule. Thus, to obtain it, you have to pull the data from the DVC remote. To get access, please send a request via e-mail (michal.czuba@pwr.edu.pl). Then, simply execute in a shell:
cd _data_set && dvc pull nsl_data_sources/raw/multi_layer_networks/*.dvc && cd ..
cd _data_set && dvc pull nsl_data_sources/spreading_potentials/multi_layer_networks/*.dvc && cd ..
The repository is structured as follows:
├── _configs -> example configuration files to trigger the pipeline
├── _data_set -> networks to compute actors' marginal efficiency for
├── _test_data -> example outputs of the dataset generator used in the E2E test
├── _output -> a directory where we recommend saving results
├── env -> a definition of the runtime environment
├── src
│ ├── evaluators -> scripts to evaluate performance of infmax methods
│ ├── generators -> scripts to generate SPs (spreading potentials) according to provided configs
│ └── icm -> implementations of the ICM adapted to multilayer networks
├── README.md
├── run_experiments.py -> main entrypoint to trigger the pipeline
└── test_reproducibility.py -> E2E test to prove that results can be repeated
To run experiments, execute `run_experiments.py` and provide the proper CLI arguments, i.e. a path to the configuration file. See the examples in `_configs` for inspiration. The pipeline has two modes, defined under the `run:experiment_type` field.
The first one ("generate") produces, for each evaluated case of the ICM, a CSV file with the following data on each actor of the network:
actor: int # actor's id
simulation_length: int # nb. of simulation steps
exposed: int # nb. of infected actors
not_exposed: int # nb. of not infected actors
peak_infected: int # maximal nb. of infected actors in a single sim. step
peak_iteration: int # a sim. step when the peak occurred
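For instance, such a per-actor file can be inspected with pandas; this is a minimal sketch in which the file name and its location under `_output` are illustrative, not fixed by the pipeline:

```python
import pandas as pd

# Load one of the generated per-actor files (the path is illustrative).
df = pd.read_csv("_output/example_case/actors.csv")

# Rank actors by the number of actors they exposed, i.e. their marginal efficiency.
top = df.sort_values("exposed", ascending=False)
print(top[["actor", "exposed", "peak_infected", "peak_iteration"]].head(10))
```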
The second one ("evaluate") serves as an evaluation pipeline for the various seed selection methods defined in the study. That is, for each evaluated case of the ICM, it produces the following CSV:
infmax_model: str # name of the model used in the evaluation
seed_set: str # IDs of seed-actors aggr. into str (sep. by ;)
gain: float # gain obtained using this seed set
simulation_length: int # nb. of simulation steps
exposed: int # nb. of active actors at the end of the simulation
not_exposed: int # nb. of actors that remained inactive
peak_infected: int # maximal nb. of infected actors in a single sim. step
peak_iteration: int # a sim. step when the peak occured
expositions_rec: str # record of new activations aggr. into str (sep. by ;)
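The `;`-aggregated columns can be unpacked back into Python lists, e.g. as in this sketch (the file path is illustrative, and the record is assumed to hold integer counts of new activations per step):

```python
import pandas as pd

# Load one of the evaluation result files (the path is illustrative).
df = pd.read_csv("_output/example_case/evaluation.csv")

# Unpack the ';'-aggregated string columns back into Python lists.
df["seed_set"] = df["seed_set"].str.split(";")
df["expositions_rec"] = df["expositions_rec"].str.split(";").apply(lambda xs: [int(x) for x in xs])

# Compare seed selection methods by the gain they obtained.
print(df.groupby("infmax_model")["gain"].mean().sort_values(ascending=False))
```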
Selecting a GPU (for the tensor runner) is possible only by setting an environment variable before executing the Python code, e.g. `export CUDA_VISIBLE_DEVICES=3`.
For instance:
conda activate infmax-simulator-icm-mln
export CUDA_VISIBLE_DEVICES=2
python generate_sp.py _configs/example_tensor.yaml
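To verify that the restriction took effect, one can check the visible devices from inside Python; this sketch assumes the tensor runner is backed by PyTorch, which this README does not state explicitly:

```python
import os

import torch  # assumption: the tensor runner is backed by PyTorch

# Only the GPUs listed in CUDA_VISIBLE_DEVICES should be visible to the process.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("visible GPUs:", torch.cuda.device_count())
```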
Results are supposed to be fully reproducible. There is a test for that: `test_reproducibility.py`.
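Conceptually, the E2E test regenerates the outputs and compares them with the reference files stored in `_test_data`. A minimal sketch of such a check (the file paths are illustrative, not the ones used by the actual test):

```python
import pandas as pd
import pandas.testing as pdt

# Compare a freshly generated CSV with a reference file shipped in _test_data
# (both paths are illustrative).
fresh = pd.read_csv("_output/example_case/actors.csv")
reference = pd.read_csv("_test_data/example_case/actors.csv")

# Raises an AssertionError if the runs diverge, i.e. results are not reproducible.
pdt.assert_frame_equal(
    fresh.sort_values("actor").reset_index(drop=True),
    reference.sort_values("actor").reset_index(drop=True),
)
```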