Inferring the Type of Phase Transitions Undergone in Epileptic Seizures Using Random Graph Hidden Markov Models for Percolation in Noisy Dynamic Networks
The data used for the analysis in the paper consist of both simulated network time series and functional network time series derived from ECoG data on epileptic seizures in a human subject. The real data are invasive brain voltage recordings for 3 seizures from a patient with epilepsy. Full details on network construction from brain voltage recordings are given in Part 3.
Derived data (network time series) are available from the corresponding author on request. Deidentified data may be available upon request to the corresponding author with appropriate IRB approval.
The derived network time series are stored in three MATLAB binary format files (.mat), one per seizure example. Each file has the fields
data$nets[ , ,1]$t - observation time points,
data$nets[ , ,1]$C[, ,i] - the i-th observed network (adjacency matrix)
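As a minimal sketch (the file name below is a placeholder), the data can be loaded in R with the R.matlab package listed under dependencies, following the field layout above:

    library(R.matlab)

    # Load one seizure file (placeholder name) and pull out the fields
    # described above: observation times and the i-th adjacency matrix.
    data  <- readMat("DataSeizure/networks_fdr/seizure1.mat")
    t_obs <- data$nets[ , , 1]$t          # observation time points
    C_1   <- data$nets[ , , 1]$C[ , , 1]  # the first observed network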
Most of the data processing and analysis for this paper were done in R, with some in MATLAB. This repo provides the code to conduct both the statistical simulation study and the analyses of the real data sets in the case studies.
All of the R scripts used in the paper are available in this repo. The MIT license applies to all code, and no permissions are required to access the code. The code is organized as follows:
Model/
: code for implementations of the model and associated inference
main_estimation*.R
: main files for parameter estimation with simulated data
main_test*.R
: main files for testing with both simulated and real data
run_*.sh
: bash scripts for submitting the above main files to run in parallel
Simulations/
: directory to store output result files from the .sh runs
ApplicationSeizure/
: code for network construction from ECoG data and the automatic segment finder
DataSeizure/
: directory to store ECoG data on seizures and the constructed network time series
generate_*.R
: scripts to reproduce the figures in the paper
R version 3.5.1 was used for the analyses in this paper. The necessary R libraries for the code used for data processing and analysis are: igraph (v0.4.3), snow (v0.4.3), lme4 (1.1.21), lmerTest (3.0.1), car (3.0.2), sjPlot (2.6.1), R.matlab (3.6.2), ggplot2 (3.1.1), gridExtra (2.3).
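For a fresh R setup, the packages above can be installed in one step (this fetches current CRAN versions; pinning to the exact versions listed above is left to the reader):

    # Install the R dependencies listed above (current CRAN versions).
    install.packages(c("igraph", "snow", "lme4", "lmerTest", "car",
                       "sjPlot", "R.matlab", "ggplot2", "gridExtra"))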
Dependencies for network construction and dynamic community tracking used in the real data analysis are the MATLAB code from https://github.com/Mark-Kramer/dppm and https://github.com/Eden-Kramer-Lab/dppm/tree/simplified (see the real data section below).
The workflow to reproduce all Figures and Tables in the paper is provided as follows.
Table 3 reports the mean and standard deviation of parameter estimates based on 100 replicates. Results for a single trial can be obtained by running
main_estimation_example.R
Setting the two arguments process='PR', process_est='PR' yields the estimation results for the PR case. For the settings in the example script, the running time is expected to be ~30 mins for ER and ~100 mins for PR on a standard desktop machine (MacBook Pro, 2.6 GHz Intel Core i5 processor, 8 GB memory).
We recommend running the 100 replicates in parallel on a Linux computing cluster with the main script main_estimation.R
(which adds an argument parser on top of the example script above) and the bash script run_estimation_exp1.sh
. We recommend using 4 cores for each trial (already set up in the bash script). The way batch jobs are submitted may differ between cluster systems; we used
qsub -t 1:100 run_estimation_exp1.sh
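As a hedged sketch of what "adds an argument parser" means here (the variable names are assumptions, not the repo's actual code), main_estimation.R can read the replicate id that the bash script passes in:

    # Sketch only: read the replicate id the bash script passes in
    # (e.g. the SGE task id from qsub -t 1:100), so each array task
    # runs one independent trial.
    args   <- commandArgs(trailingOnly = TRUE)
    rep_id <- as.integer(args[1])   # 1..100 across the array tasks
    set.seed(rep_id)                # a distinct seed per replicate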
Once the submitted jobs are completed, the output result files will be saved in Simulations/Estimation/EXP1_N20M50
. Table 3 can then be generated by running the python script post_analysis_exp1.py
in that directory.
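For reference, the post-analysis step amounts to averaging over replicates; an equivalent R sketch (the file pattern and column layout are assumptions about the replicate outputs) would be:

    # Sketch: collect each replicate's estimates and report the mean
    # and sd per parameter, as post_analysis_exp1.py does.
    files <- list.files("Simulations/Estimation/EXP1_N20M50",
                        pattern = "\\.csv$", full.names = TRUE)
    est   <- do.call(rbind, lapply(files, read.csv))
    rbind(mean = colMeans(est), sd = apply(est, 2, sd))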
Figure 5 shows the mean and standard deviation of parameter estimates based on 20 replicates with varying settings of B. The results can be obtained using the bash script run_estimation_exp2.sh
, which contains 4 sets of job names and input arguments. Uncomment one set, submit the job with qsub -t 1:20 run_estimation_exp2.sh
, and repeat this process for the other job names. Note that each trial requires 4 cores, and the total running time may differ significantly depending on the number of cores available simultaneously. In any event, the running time for one trial under the 4 different values of B is approximately 0.5 hrs, 2.5 hrs, 5 hrs, and 12 hrs, respectively.
Once the submitted jobs are completed, the output result files will be saved in Simulations/Estimation/EXP2_vary_B
. A summary table data_exp2.csv can be generated by running the python script post_analysis_exp2.py
in that directory; the table is then used by generate_paper_fig5-6.R
to generate Figure 5.
Figure 6 shows the mean and standard deviation of parameter estimates based on 20 replicates with different settings of (N, M). The results can be obtained using the bash script run_estimation_exp3.sh
, which contains 12 sets of job names and input arguments. Uncomment one set, submit the job with qsub -t 1:20 run_estimation_exp3.sh
, and repeat this process for the other job names. The approximate running time for one trial under each setting of (N, M) with 4 cores ranges from 0.7 hrs (N=30, M=50) to 12 hrs (N=70, M=200). Again, each trial requires 4 cores, and the total running time may differ significantly depending on the number of cores available simultaneously.
Once the submitted jobs are completed, the output result files will be saved in Simulations/Estimation/EXP3_vary_NM
. A summary table data_exp3.csv can be generated by running the python script post_analysis_exp3.py
in that directory; the table is then used by generate_paper_fig5-6.R
to generate Figure 6.
Figure 7 shows the size of the giant connected component (GCC) over normalized time for the ER and PR percolation curves; it can be generated by running generate_paper_fig7.R
. The running time for this script is less than 1 min.
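To illustrate what the figure plots, here is a toy ER percolation curve using igraph (modern igraph API; this is an illustration, not the paper's code): edges are added between uniformly random node pairs and the GCC fraction is tracked as edges accrue.

    library(igraph)

    # Toy ER percolation: add the possible edges in uniformly random
    # order and record the giant-connected-component (GCC) fraction.
    N <- 50
    g <- make_empty_graph(N, directed = FALSE)
    edge_order <- combn(N, 2)[, sample(choose(N, 2))]
    gcc <- numeric(ncol(edge_order))
    for (i in seq_len(ncol(edge_order))) {
      g <- add_edges(g, edge_order[, i])
      gcc[i] <- max(components(g)$csize) / N
    }
    plot(seq_along(gcc) / N, gcc, type = "l",
         xlab = "normalized time", ylab = "GCC fraction")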
Figure 8 (left) shows the rate of detection for ER and PR based on 100 replicates with different settings of B, t_norm, and N. Figure 8 (right), Figure 9, and Table 4 are results from fitting a GLMM to the testing results in Figure 8 (left). Results for a single trial can be obtained by running
main_test_example.R
For the settings in the example script above, the running time is expected to be ~40 mins on a standard desktop machine (MacBook Pro, 2.6 GHz Intel Core i5 processor, 8 GB memory). We recommend running the 100 replicates in parallel on a Linux computing cluster with the main script main_test.R
(which adds an argument parser on top of the example script above) and the bash scripts
run_N10.sh
run_N20.sh
run_N30.sh
Each bash script contains 24 sets of job names and input arguments. Uncomment one set, submit the job with, e.g., qsub -t 1:100 run_N10.sh
, and repeat this process for the other job names. The approximate running time for one trial under each setting of (B, t_norm, N) with 4 cores ranges from 10 mins (N=10, t_norm=0.8, B=50,000) to 22 hrs (N=30, t_norm=2.4, B=500,000). Each trial requires 4 cores, and the total running time depends on the number of cores available simultaneously.
Once the submitted jobs are completed, the output result files will be saved in Simulations/Testing/N10, Simulations/Testing/N20, and Simulations/Testing/N30.
Then aggregate the testing results from the separate output files by running the python scripts post_analysis_N10.py, post_analysis_N20.py, and post_analysis_N30.py
in the respective directories above. This step produces data10.csv, data10_all.csv, data20.csv, data20_all.csv, data30.csv, and data30_all.csv. Then run generate_paper_fig8-9.R
to generate Figures 8-9 and Table 4.
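The GLMM step uses lme4/lmerTest from the dependency list above. A hedged sketch of such a fit (the column names and model formula are assumptions about the aggregated csv files, not the repo's actual specification):

    library(lme4)

    # Sketch: logistic GLMM for detection (0/1) against the simulation
    # settings, with a random intercept per replicate. Column names are
    # assumptions about the aggregated csv files produced above.
    dat <- do.call(rbind, lapply(c("data10_all.csv", "data20_all.csv",
                                   "data30_all.csv"), read.csv))
    fit <- glmer(detected ~ factor(B) + t_norm + factor(N) + (1 | replicate),
                 data = dat, family = binomial)
    summary(fit)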
Simulation results in the Supplement analogous to Figures 8-9 and Table 4 can be reproduced using run_N_10_20_30_vary_r.sh
, post_analysis_all_data_lowR
and generate_paper_fig8-9.R
. The workflow is the same as above.
The data used in this section are the network time series constructed from brain voltage recordings of epileptic seizures. Testing on the real seizure data involves the following steps:
- Construct the time series of functional networks: the code we used for network construction is from https://github.com/Mark-Kramer/dppm. ApplicationSeizure/ConstFunNets contains the specific portion of that code used in this project. The functional network time series can be constructed from the ECoG data by running ApplicationSeizure/ConstFunNets/main_const_fun_nets_hemisphere_bipolar.m. The preprocessed voltage traces will be saved in DataSeizure/prepdata_fdr, and the constructed network time series will be saved in DataSeizure/networks_fdr.
- Track dynamic communities: the code we used for tracking dynamic communities is from https://github.com/Eden-Kramer-Lab/dppm/tree/simplified. ROIs are identified by running ApplicationSeizure/ConstFunNets/main_track_communities.m.
- Identify ramp-up(s) in the ROIs selected in the previous step for testing, using the automatic segment finder: run ApplicationSeizure/auto_detect_curve_left_bipolar.R (a toy sketch of the ramp-up idea follows this list).
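A toy sketch of the ramp-up idea (not the repo's detector; the non-decreasing criterion and the function name are assumptions for illustration): flag contiguous stretches where the GCC curve keeps rising for long enough.

    # Toy ramp-up finder: return (start, end) indices of runs where the
    # GCC curve is non-decreasing for at least min_len steps. This is an
    # illustration only, not the repo's auto_detect_curve logic.
    find_rampups <- function(gcc, min_len = 10) {
      up   <- c(diff(gcc) >= 0, FALSE)  # TRUE where the curve steps up
      r    <- rle(up)                   # runs of rising/falling steps
      end  <- cumsum(r$lengths)
      beg  <- end - r$lengths + 1
      keep <- r$values & r$lengths >= min_len
      cbind(start = beg, end = end)[keep, , drop = FALSE]
    }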
Figure 10 shows the voltage traces, dynamic community membership, density, and GCC size over time for one real seizure, with the segments detected for testing shown in bold. The figure can be reproduced by running
ApplicationSeizure/main_track_communities_color_chosen_curves_sorted_plot.m
Table 5 gives the testing results for the chosen segments, which can be obtained by running main_test_seizure.R
. The reported results are the mean and standard deviation based on 10 runs. Each run takes approximately 12-24 hours depending on the length of the network time series. We recommend running the 10 replicates in parallel on a cluster using the bash script run_test_seizure_bipolar_fdr.sh
. Results for each run will be stored in ApplicationSeizure/Results
, and the mean and standard deviation can be obtained by running post_analysis_seizure_results_bipolar_fdr.py
in that directory.