
KolaczykResearch/RG-HMM


Inferring the Type of Phase Transitions Undergone in Epileptic Seizures Using Random Graph Hidden Markov Models for Percolation in Noisy Dynamic Networks

Author Contributions Checklist Form

Part 1: Data

Abstract

The data used for the analysis in the paper consist of both simulated network time series and functional network time series derived from ECoG data on epileptic seizures in a human subject. The real data are invasive brain voltage recordings of 3 seizures from a patient with epilepsy. Full details on network construction from the brain voltage recordings are given in Part 3.

Availability

Derived data (network time series) are available from the corresponding author on request. Deidentified data may be available upon request to the corresponding author with appropriate IRB approval.

Description

The derived network time series are stored in three MATLAB binary format files (.mat), one per seizure example. Each file contains

data$nets[ , ,1]$t - observation time points,

data$nets[ , ,1]$C[ , ,i] - the i-th observed network (adjacency matrix)
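
For example, the structure of one such file can be inspected from the command line using the R.matlab package listed in Part 2 (the file name here is hypothetical):

    Rscript -e 'str(R.matlab::readMat("DataSeizure/networks_fdr/seizure1.mat"))'   # lists the t and C fields described above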

Part 2: Code

Abstract

Most of the data processing and analysis for this paper was done in R, with some in MATLAB. This repo provides the code to conduct both the statistical simulation study and the analyses of the real data sets in the case studies.

Description

All of the R scripts used in the paper are available in this repo. The MIT license applies to all code, and no permissions are required to access the code. The code is organized as follows:

  • Model/: code implementing the model and the associated inference.
  • main_estimation*.R: main scripts for parameter estimation with simulated data
  • main_test*.R: main scripts for testing with both simulated and real data
  • run_*.sh: bash scripts for submitting the main scripts above to run in parallel
  • Simulations/: directory storing the output files produced by the .sh scripts
  • ApplicationSeizure/: code for network construction from ECoG data and the automatic segment finder
  • DataSeizure/: directory storing the ECoG seizure data and the constructed network time series
  • generate_*.R: scripts to reproduce the figures in the paper

Optional Information

R version 3.5.1 was used for the analyses in this paper. The R libraries required by the data processing and analysis code are: igraph (v0.4.3), snow (v0.4.3), lme4 (1.1.21), lmerTest (3.0.1), car (3.0.2), sjPlot (2.6.1), R.matlab (3.6.2), ggplot2 (3.1.1), gridExtra (2.3).
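
These can be installed in one step (a minimal sketch; install.packages pulls the current CRAN versions, so matching the pinned versions above may require remotes::install_version):

    Rscript -e 'install.packages(c("igraph", "snow", "lme4", "lmerTest", "car", "sjPlot", "R.matlab", "ggplot2", "gridExtra"))'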

Dependencies for network construction and dynamic community tracking used in the real data analysis are the dppm toolboxes, https://github.com/Mark-Kramer/dppm and https://github.com/Eden-Kramer-Lab/dppm/tree/simplified (see Part 3).

Part 3: Reproducibility workflow

The workflow to reproduce all figures and tables in the paper is as follows.

To reproduce simulation results (Section 5 of the paper)

Table 3

Table 3 reports the mean and standard deviation of parameter estimates based on 100 replicates. Results for a single trial can be obtained by running

main_estimation_example.R

By setting the two arguments process='PR' and process_est='PR', estimation results for the PR case can be obtained. For the settings in the example script, the expected running time is ~30 minutes for ER and ~100 minutes for PR on a standard desktop machine (MacBook Pro, 2.6 GHz Intel Core i5, 8 GB memory).
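
For example, a single ER trial can be run from the repository root as follows (the invocation style is our assumption; the two switches above are set inside the script):

    # ER case with the script defaults; for PR, first set process='PR' and process_est='PR' in the script
    Rscript main_estimation_example.R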

We recommend running the 100 replicates in parallel on a Linux computing cluster with the main script main_estimation.R (the example script above plus an argument parser) and the bash script run_estimation_exp1.sh. We recommend using 4 cores for each trial (already set up in the bash script). The way batch jobs are submitted may differ across cluster systems; the one we used is

qsub -t 1:100 run_estimation_exp1.sh
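
For reference, a minimal sketch of what such an SGE array script contains (the parallel environment name omp and the way the script consumes the task ID are assumptions; the repo's script already requests 4 cores):

    #!/bin/bash
    #$ -pe omp 4                               # request 4 cores per trial, as recommended above
    Rscript main_estimation.R ${SGE_TASK_ID}   # SGE_TASK_ID runs 1..100 when submitted with qsub -t 1:100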

Once the submitted jobs have completed, the output files will be saved in Simulations/Estimation/EXP1_N20M50.

Table 3 can then be generated by running the Python script post_analysis_exp1.py in the directory above.
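
That is, once the jobs have finished:

    cd Simulations/Estimation/EXP1_N20M50
    python post_analysis_exp1.py    # summarizes the 100 replicates into Table 3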

Figure 5

Figure 5 shows the mean and standard deviation of parameter estimates based on 20 replicates with varying settings of B. The results can be obtained using the bash script run_estimation_exp2.sh, which contains 4 sets of job names and input arguments. Uncomment one set, submit the job with qsub -t 1:20 run_estimation_exp2.sh, and repeat this process for the other job names; the cycle is sketched below. Note that each trial requires 4 cores, and the total running time may differ significantly depending on the number of cores available simultaneously. In any event, the running time for one trial under the 4 different values of B is approximately 0.5, 2.5, 5, and 12 hours, respectively.
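
The uncomment-and-submit cycle looks like this, repeated once per job-name block:

    # 1. In run_estimation_exp2.sh, uncomment exactly one set of job name and input arguments
    # 2. Submit 20 array tasks for that setting:
    qsub -t 1:20 run_estimation_exp2.sh
    # 3. Re-comment that set, uncomment the next one, and resubmit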

Once the submitted jobs have completed, the output files will be saved in Simulations/Estimation/EXP2_vary_B.

A summary table, data_exp2.csv, can be generated by running the Python script post_analysis_exp2.py in the directory above; it is then used by generate_paper_fig5-6.R to generate Figure 5.
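
Concretely (running from the repository root is our assumption):

    cd Simulations/Estimation/EXP2_vary_B
    python post_analysis_exp2.py     # writes data_exp2.csv
    cd ../../..
    Rscript generate_paper_fig5-6.R  # reads data_exp2.csv and produces Figure 5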

Figure 6

Figure 6 shows the mean and standard deviation of parameter estimates based on 20 replicates with different settings of (N, M). The results can be obtained using the bash script run_estimation_exp3.sh, which contains 12 sets of job names and input arguments. Uncomment one set, submit the job with qsub -t 1:20 run_estimation_exp3.sh, and repeat this process for the other job names (the same cycle sketched for Figure 5 above). The approximate running time for one trial under each setting of (N, M) with 4 cores ranges from 0.7 hours (N=30, M=50) to 12 hours (N=70, M=200). Again, each trial requires 4 cores, and the total running time may differ significantly depending on the number of cores available simultaneously.

Once the submitted jobs have completed, the output files will be saved in Simulations/Estimation/EXP3_vary_NM.

A summary table, data_exp3.csv, can be generated by running the Python script post_analysis_exp3.py in the directory above; it is then used by generate_paper_fig5-6.R to generate Figure 6.

Figure 7

Figure 7 shows the size of the giant connected component (GCC) over normalized time for the ER and PR percolation curves; it can be generated by running generate_paper_fig7.R. The running time for this script is under 1 minute.
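
That is, from the repository root:

    Rscript generate_paper_fig7.R    # finishes in under a minute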

Figure 8, Figure 9, Table 4

Figure 8 (left) shows the rate of detection for ER and PR based on 100 replicates with different settings of B, t_norm, and N. Figure 8 (right), Figure 9, and Table 4 are results from fitting a GLMM to the testing results in Figure 8 (left). Results for a single trial can be obtained by running

main_test_example.R

For the settings in the example script above, the expected running time is ~40 minutes on a standard desktop machine (MacBook Pro, 2.6 GHz Intel Core i5, 8 GB memory). We recommend running the 100 replicates in parallel on a Linux computing cluster with the main script main_test.R (the example script above plus an argument parser) and the bash scripts

run_N10.sh run_N20.sh run_N30.sh

Each bash script contains 24 sets of job names and input arguments. Uncomment one set, submit the job with, e.g., qsub -t 1:100 run_N10.sh, and repeat this process for the other job names (the same cycle sketched for Figure 5 above). The approximate running time for one trial under each setting of (B, t_norm, N) with 4 cores ranges from 10 minutes (N=10, t_norm=0.8, B=50,000) to 22 hours (N=30, t_norm=2.4, B=500,000). Each trial requires 4 cores, and the total running time depends on the number of cores available simultaneously.

Once the submitted jobs have completed, the output files will be saved in Simulations/Testing/N10, Simulations/Testing/N20, and Simulations/Testing/N30.

Then aggregate the testing results from the separate output files by running the Python scripts post_analysis_N10.py, post_analysis_N20.py, and post_analysis_N30.py in the respective directories above; this produces data10.csv, data10_all.csv, data20.csv, data20_all.csv, data30.csv, and data30_all.csv (a loop is sketched below). Then run generate_paper_fig8-9.R to generate Figures 8-9 and Table 4.
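
The aggregation step can be written as a short loop (a sketch, assuming the repository root as working directory):

    for n in 10 20 30; do
      ( cd Simulations/Testing/N${n} && python post_analysis_N${n}.py )   # writes data${n}.csv and data${n}_all.csv
    done
    Rscript generate_paper_fig8-9.R   # Figures 8-9 and Table 4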

Figures in the Supplement

Simulation results in the Supplement analogous to Figures 8-9 and Table 4 can be reproduced using run_N_10_20_30_vary_r.sh, post_analysis_all_data_lowR, and generate_paper_fig8-9.R. The workflow is the same as above.

To reproduce application results (Section 6 of the paper)

Data used in this section are network time series constructed from the brain voltage recordings of epileptic seizures. Testing on real seizure data involves the following steps (an end-to-end sketch follows the list):

  • Construct the time series of functional networks: the code we used for network construction is from https://github.com/Mark-Kramer/dppm; ApplicationSeizure/ConstFunNets contains the specific portion of that code used in this project. The functional network time series can be constructed from the ECoG data by running

    ApplicationSeizure/ConstFunNets/main_const_fun_nets_hemisphere_bipolar.m

    The preprocessed voltage traces will be saved in DataSeizure/prepdata_fdr, and the constructed network time series will be saved in DataSeizure/networks_fdr.

  • Track dynamic communities: the code we used for tracking dynamic communities is from https://github.com/Eden-Kramer-Lab/dppm/tree/simplified. ROIs are identified by running

    ApplicationSeizure/ConstFunNets/main_track_communities.m

  • Identify ramp-up(s), for testing, in the ROIs selected in the previous step using the automatic segment finder: run ApplicationSeizure/auto_detect_curve_left_bipolar.R.
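
The three steps above, end to end; the matlab -batch invocation style (available in MATLAB R2019a and later) is our assumption:

    matlab -batch "run('ApplicationSeizure/ConstFunNets/main_const_fun_nets_hemisphere_bipolar.m')"   # 1. build functional networks
    matlab -batch "run('ApplicationSeizure/ConstFunNets/main_track_communities.m')"                   # 2. track communities / identify ROIs
    Rscript ApplicationSeizure/auto_detect_curve_left_bipolar.R                                       # 3. detect ramp-up segments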

Figure 10

Figure 10 shows the voltage traces, dynamic community membership, density, and GCC size over time for one real seizure, with the segments detected for testing shown in bold. The figure can be reproduced by running

ApplicationSeizure/main_track_communities_color_chosen_curves_sorted_plot.m
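
Using the same command-line invocation style assumed above:

    matlab -batch "run('ApplicationSeizure/main_track_communities_color_chosen_curves_sorted_plot.m')"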

Table 5

Table 5 reports the testing results for the chosen segment, which can be obtained by running main_test_seizure.R. The results reported are the mean and standard deviation based on 10 runs. Each run takes approximately 12-24 hours, depending on the length of the network time series. We recommend running the 10 replicates in parallel on a cluster using the bash script run_test_seizure_bipolar_fdr.sh. Results for each run will be stored in ApplicationSeizure/Results, and the mean and standard deviation can be obtained by running post_analysis_seizure_results_bipolar_fdr.py in that directory.
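
Following the submission pattern used for the simulations (the -t range is our assumption):

    qsub -t 1:10 run_test_seizure_bipolar_fdr.sh            # 10 replicates in parallel
    # after the jobs complete:
    cd ApplicationSeizure/Results
    python post_analysis_seizure_results_bipolar_fdr.py     # mean and standard deviation over the 10 runs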
