SCODE : an efficient regulatory network inference algorithm from single-cell RNA-Seq during differentiation.
https://academic.oup.com/bioinformatics/article/33/15/2314/3100331
SCODE is written in R and uses the MASS library to calculate the pseudo-inverse matrix.
git clone https://github.com/hmatsu1226/SCODE
cd SCODE
Alternatively, download the repository with the "Download ZIP" button and unzip it.
Optimize the linear ODE and infer the regulatory network from time-course data.
Rscript SCODE.R <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I>
- Input_file1 : G x C matrix of expression data
- Input_file2 : Time point data (e.g. pseudo-time data)
- Output_dir : Directory to which the result files are written
- G : The number of transcription factors
- D : The number of z (the dimension of z)
- C : The number of cells
- I : The number of iterations of optimization
Rscript SCODE.R data/exp_train.txt data/time_train.txt out 100 4 356 100
Input_file1 is the G x C matrix of expression data (TAB-separated). Each row corresponds to a gene, and each column corresponds to a cell.
1.24 1.21 1.28 ...
0.0 0.19 0.0 ...
.
.
.
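As a minimal sketch of preparing such a file in R, assuming the expression values are already held in a numeric matrix with genes in rows and cells in columns (the toy matrix `expr` and the file name `exp_example.txt` are hypothetical and not part of SCODE):

```r
# Hypothetical toy data: G = 3 genes (rows) x C = 4 cells (columns).
G <- 3
C <- 4
expr <- matrix(c(1.24, 0.00, 2.10,   # cell 1
                 1.21, 0.19, 1.95,   # cell 2
                 1.28, 0.00, 2.30,   # cell 3
                 1.30, 0.05, 2.05),  # cell 4
               nrow = G, ncol = C)

# Write a TAB-separated file without row names, column names, or quotes.
exp_file <- file.path(tempdir(), "exp_example.txt")
write.table(expr, exp_file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Read the file back and confirm the dimensions passed as <G> and <C>.
check <- as.matrix(read.table(exp_file, sep = "\t"))
stopifnot(nrow(check) == G, ncol(check) == C)
```

Checking the dimensions before running SCODE helps catch a transposed matrix, since G and C are given separately on the command line.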
The Input_file2 contains the time point data (pseudo-time) of each cell.
- Col1 : Information of a cell (e.g. index of a cell, experimental time point)
- Col2 : Time parameter (e.g. pseudo-time) (normalized from 0.0 to 1.0)
0 0.065
0 0.037
0 0.007
.
.
.
72 0.873
72 0.964
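The two-column layout above could be produced along these lines. This is only a sketch: the vectors `time_point` and `raw_pseudotime` are hypothetical, min-max scaling is one possible way to normalize to the 0.0 to 1.0 range, and it is assumed here that the file is TAB-separated and that its rows follow the same cell order as the columns of Input_file1.

```r
# Hypothetical inputs: experimental time point and raw pseudo-time per cell.
time_point <- c(0, 0, 0, 72, 72)
raw_pseudotime <- c(3.1, 2.4, 1.7, 20.5, 22.8)

# Min-max normalization so that the time parameter spans 0.0 to 1.0.
rng <- range(raw_pseudotime)
pseudotime <- (raw_pseudotime - rng[1]) / (rng[2] - rng[1])

# Col1: information of the cell, Col2: normalized time parameter.
time_data <- data.frame(info = time_point, time = round(pseudotime, 3))
time_file <- file.path(tempdir(), "time_example.txt")
write.table(time_data, time_file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)
```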
SCODE writes the following files, which are named after the corresponding variables in the paper.
- A.txt : G x G matrix, which corresponds to the inferred regulatory network. Aij represents the regulatory relationship from TF j to TF i.
- B.txt : D x D diagonal matrix, which corresponds to the optimized parameters of the ODE of z.
- W.txt : G x D matrix, which corresponds to W of the linear regression.
- RSS.txt : The residual sum of squares of the linear regression.
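Because Aij describes the relationship from TF j to TF i, the matrix can be unrolled into a list of regulator-to-target edges. The sketch below uses a hypothetical 3 x 3 matrix in place of a real result; ranking by absolute value and dropping the diagonal are illustrative choices, not something SCODE does for you.

```r
# Hypothetical 3 x 3 matrix standing in for A (rows: target i, columns: regulator j).
# With a real result one could use: A <- as.matrix(read.table("out/A.txt"))
A <- matrix(c( 0.0,  0.8, -0.1,
              -1.5,  0.0,  0.3,
               0.2, -0.6,  0.0),
            nrow = 3, byrow = TRUE)

# One row per ordered pair (regulator j -> target i).
edges <- data.frame(
  regulator = as.vector(col(A)),
  target    = as.vector(row(A)),
  weight    = as.vector(A)
)

# Drop self-regulation entries (illustrative choice).
edges <- edges[edges$regulator != edges$target, ]

# Rank by absolute weight; the sign is kept for interpretation.
edges <- edges[order(-abs(edges$weight)), ]
top_edges <- head(edges, 3)
print(top_edges)
```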
We recommend running SCODE several times and averaging the results (A) to obtain reliable relationships.
ruby run_R.rb <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I> <R>
- R : The number of trials
- Output_dir : Result files of each trial are written to this directory
The averaged A (meanA.txt) is written to Output_dir.
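The averaging step itself is an element-wise mean over the A matrices of the trials. A minimal sketch in R, assuming each trial has written its own A.txt into a separate subdirectory (the `trial_1`, `trial_2`, ... layout below is hypothetical and may differ from what run_R.rb creates):

```r
# Hypothetical setup: three trial directories, each containing a 2 x 2 A.txt.
base_dir <- file.path(tempdir(), "scode_trials")
G <- 2
for (r in 1:3) {
  trial_dir <- file.path(base_dir, paste0("trial_", r))
  dir.create(trial_dir, recursive = TRUE, showWarnings = FALSE)
  # Toy matrix filled with the trial index, so the expected mean is 2.
  write.table(matrix(r, G, G), file.path(trial_dir, "A.txt"),
              sep = "\t", row.names = FALSE, col.names = FALSE)
}

# Collect every A.txt below base_dir and average element-wise.
a_files <- list.files(base_dir, pattern = "^A\\.txt$",
                      recursive = TRUE, full.names = TRUE)
a_list <- lapply(a_files, function(f) as.matrix(read.table(f)))
mean_A <- Reduce(`+`, a_list) / length(a_list)

# Write the averaged matrix in the same TAB-separated layout.
write.table(mean_A, file.path(base_dir, "meanA.txt"),
            sep = "\t", row.names = FALSE, col.names = FALSE)
```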
SCODE.jl is written in Julia (version 0.5.0) and uses the DataFrames package. SCODE.jl runs faster than SCODE.R.
julia SCODE.jl <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I>
ruby run_julia.rb <Input_file1> <Input_file2> <Output_dir> <G> <D> <C> <I> <R>
# Downstream analysis
To choose an appropriate size of z (D), we recommend calculating the RSS on independent test data.
Rscript RSS.R <Input_file1> <Input_file2> <Input_dir> <Output_file> <G> <D> <C>
- Input_file1 : G x C matrix of expression data
- Input_file2 : Time point data (e.g. pseudo-time data)
- Input_dir : The directory in which W.txt and B.txt are saved (the Output_dir of SCODE)
- Output_file : RSS for this data
- G : The number of transcription factors
- D : The number of z (the dimension of z)
- C : The number of cells
Rscript RSS.R data/exp_test.txt data/time_test.txt out out/RSS_test.txt 100 4 100
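One way to organize the comparison is to train and evaluate once per candidate D and keep the value with the smallest test RSS. The sketch below only builds the command lines and then picks the minimum. The candidate values, the `out_D<D>` directory names, and the RSS numbers are hypothetical placeholders, not results.

```r
# Candidate sizes of z to compare (hypothetical choice).
candidate_D <- c(2L, 4L, 6L, 8L)

# Command lines for training and for evaluating on the test data,
# following the usage of SCODE.R and RSS.R with one output directory per D.
train_cmd <- sprintf(
  "Rscript SCODE.R data/exp_train.txt data/time_train.txt out_D%d 100 %d 356 100",
  candidate_D, candidate_D)
test_cmd <- sprintf(
  "Rscript RSS.R data/exp_test.txt data/time_test.txt out_D%d out_D%d/RSS_test.txt 100 %d 100",
  candidate_D, candidate_D, candidate_D)
cat(train_cmd, test_cmd, sep = "\n")

# Placeholder test RSS values, one per candidate D (made up for illustration).
rss_test <- c(812.4, 655.1, 671.9, 702.3)

# Keep the D with the smallest RSS on the independent test data.
best_D <- candidate_D[which.min(rss_test)]
print(best_D)
```

Since SCODE is also recommended to be run several times, the test RSS for each D could be averaged over trials before comparing.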
Calculate the dynamics from the optimized linear ODE.
Rscript Reconstruct_dynamics.R <Input_file1> <Input_file2> <Output_file> <G>
- Input_file1 : Initial value of x
- Input_file2 : A.txt
- Output_file : (G+1) x 101 matrix of reconstructed expression data
- G : The number of transcription factors
Rscript Reconstruct_dynamics.R data/init.txt out/A.txt out/dynamics.txt 100
Input_file1 contains the initial values of x (TAB-separated). Each row corresponds to a gene.
- Col1 : Index of a gene
- Col2 : Initial value
0 1.253
1 1.266
2 1.548
.
.
.
Output_file is the (G+1) x 101 matrix of reconstructed expression dynamics (TAB-separated). The first row contains the time parameter (from 0.0 to 1.0 in steps of 0.01). Each subsequent row corresponds to a gene, and each column corresponds to a time point.
0 0.01 0.02 ...
1.253 1.241 1.233 ...
1.266 1.053 0.937 ...
.
.
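For intuition, a linear ODE dx = Ax dt with initial value x(0) has the closed-form solution x(t) = exp(tA) x(0). The sketch below evaluates it through an eigendecomposition in base R. It is not necessarily how Reconstruct_dynamics.R computes the dynamics, it assumes A is diagonalizable, and the function name `reconstruct_dynamics` and the toy inputs are hypothetical.

```r
# Evaluate x(t) = exp(tA) x0 on a time grid via A = V diag(lambda) V^-1.
reconstruct_dynamics <- function(A, x0, times = seq(0, 1, by = 0.01)) {
  e <- eigen(A)
  # Work in complex arithmetic, since a real A may have complex eigenvalues.
  V <- matrix(as.complex(e$vectors), nrow = nrow(A))
  coef <- solve(V, as.complex(x0))
  X <- sapply(times, function(t) {
    as.vector(Re(V %*% (exp(e$values * t) * coef)))
  })
  X <- matrix(X, nrow = nrow(A))   # keep a G x length(times) matrix even if G = 1
  unname(rbind(times, X))          # time parameter followed by one row per gene
}

# Toy example: two genes that decay independently at rates 1 and 2.
A_toy <- diag(c(-1, -2))
x0_toy <- c(1, 2)
dyn <- reconstruct_dynamics(A_toy, x0_toy)
dim(dyn)   # (G+1) x 101
```

With real data, `A` would come from A.txt and `x0` from the second column of the initial value file.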
We validated SCODE with three time-course scRNA-Seq datasets. We extracted the top 100 variable TFs.
scRNA-Seq data derived from PrE cells differentiated from mES cells (in preparation).
scRNA-Seq data obtained to examine direct reprogramming from MEF cells to myocytes. Treutlein, Barbara, et al. "Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq." Nature (2016).
scRNA-Seq data derived from DE cells differentiated from hES cells. Chu, Li-Fang, et al. "Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm." Genome biology 17.1 (2016): 173.
Reference TF-TF networks were extracted from http://www.regulatorynetworks.org . The first column corresponds to the target TF. The second column corresponds to the regulator TF.
Copyright (c) 2016 Hirotaka Matsumoto Released under the MIT license