-
Notifications
You must be signed in to change notification settings - Fork 0
Transfer entropy calculation from expression data and pseudotime
NOLAN respects the original TENET's data format, so you can use whatever data you've been using for TENET in NOLAN too. The TENET's expression matrix is in the format same as the wishbone package.
This is the example expression data that we are going to use.
$ cat expr.csv
,GENE_A,GENE_B,GENE_C
cell_a,0.0,0.0,0.0
cell_b,0.1,0.0,0.0
cell_c,0.2,0.0,0.0
cell_d,0.3,0.0,0.0
cell_e,0.4,0.2,0.0
cell_f,0.3,0.4,0.0
cell_g,0.2,0.6,0.1
cell_h,0.1,0.8,0.2
cell_i,0.0,0.6,0.3
cell_j,0.0,0.4,0.4
cell_k,0.0,0.2,0.3
cell_l,0.0,0.0,0.2
cell_m,0.0,0.0,0.1
cell_n,0.0,0.0,0.0
The trajectory data is just a newline-delimited text file with timepoints on each line. The N time points must be in the same order as the N cells of the expression file. The trajectory can be a real timestamp, or a pseudotime from tools like wishbone or monocle3.
This is the sample trajectory data that we are going to use.
$ cat traj.tsv
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
TENET also needs a cell select file, which contains N boolean values deciding whether to use the cell or not for the analysis. However, in this tutorial, we don't need one as we'll use every single cells in the data.
from NOLAN import *
We'll use these parse*
functions which are designed to read TENET formatted data into a pandas dataframe.
expr, cells, genes = parseGenes('expr.csv')
traj = parseTrajectory('traj.tsv')
mask = [1] * len(cells) # we use all cells, so we don't have a mask file.
# alternatively, use the code below if you have one.
# mask = parseCellSelect('select.tsv')
print(expr)
Check the results.
GENE_A GENE_B GENE_C
Unnamed: 0
cell_a 0.0 0.0 0.0
cell_b 0.1 0.0 0.0
cell_c 0.2 0.0 0.0
cell_d 0.3 0.0 0.0
cell_e 0.4 0.2 0.0
cell_f 0.3 0.4 0.0
cell_g 0.2 0.6 0.1
cell_h 0.1 0.8 0.2
cell_i 0.0 0.6 0.3
cell_j 0.0 0.4 0.4
cell_k 0.0 0.2 0.3
cell_l 0.0 0.0 0.2
cell_m 0.0 0.0 0.1
cell_n 0.0 0.0 0.0
Now, we'll use the function inferNetwork()
to calculate the Transfer Entropy matrix from the data.
Consult the original TENET paper for the meaning of histLen
.
network = inferNetwork(expr, genes, traj, mask, histLen=1, workers=1)
print(network)
The result is a matrix, with transfer entropy values from rows to columns. For example, the transfer entropy value from GENE_A
to GENE_B
is 0.761549
.
GENE_A GENE_B GENE_C
GENE_A 0 0.761549 0.453857
GENE_B 0.549634 0 0.761549
GENE_C 0.607703 0.511925 0
If your data is large, you can use some parallism with the workers
parameter.
Now, we'll prepare the TE matrix for downstream analysis with reconstructNet()
.
The alpha
parameter controls the false positive correction step. Consult the statsmodels
package for the meaning of this.
grn = reconstructNet(network, alpha=0.05)
print(grn)
Check the results.
source relationship target tstat pval adj_pval
1 GENE_A 0.761549 GENE_B 1.190132 0.116997 0.350992
5 GENE_B 0.761549 GENE_C 1.190132 0.116997 0.350992
2 GENE_A 0.453857 GENE_C -1.190132 0.883003 0.883003
3 GENE_B 0.549634 GENE_A -0.449208 0.673359 0.883003
6 GENE_C 0.607703 GENE_A 0.0 0.500000 0.883003
7 GENE_C 0.511925 GENE_B -0.740924 0.770630 0.883003
One of the TENET's key features is that it removes indirect edges along multiple genes.
We can do this by using the function trimIndirect()
.
The threshold
parameter determines the sensitivity of indirect edge detection.
Positive values will cause more edges to be considered indirect.
Negative values will cause less edges to be considered indirect.
trimmed = trimIndirect(grn, threshold=0)
print(trimmed)
source relationship target tstat pval adj_pval
6 GENE_C 0.607703 GENE_A 0.0 0.500000 0.883003
1 GENE_A 0.761549 GENE_B 1.190132 0.116997 0.350992
5 GENE_B 0.761549 GENE_C 1.190132 0.116997 0.350992
$ ./TENET expr.csv 1 traj.tsv select.tsv 1
2023-02-06 04:26:56.501955
2023-02-06 04:26:56.502497
2023-02-06 04:26:56.502958
real 0m1.389s
user 0m1.110s
sys 0m0.206s
$ cat TE_result_matrix.txt
TE GENE_A GENE_B GENE_C
GENE_A 0 0.7615488099915483 0.4538565022992404
GENE_B 0.5496343867482045 0 0.7615488099915483
GENE_C 0.6077026561453944 0.5119247716964302 0