Skip to content

Transfer entropy calculation from expression data and pseudotime

Stfort52 edited this page Feb 13, 2023 · 6 revisions

Transfer entropy calculation

prepare the example data

NOLAN respects the original TENET's data format, so you can use whatever data you've been using for TENET in NOLAN too. The TENET's expression matrix is in the format same as the wishbone package.

This is the example expression data that we are going to use.

$ cat expr.csv
,GENE_A,GENE_B,GENE_C
cell_a,0.0,0.0,0.0
cell_b,0.1,0.0,0.0
cell_c,0.2,0.0,0.0
cell_d,0.3,0.0,0.0
cell_e,0.4,0.2,0.0
cell_f,0.3,0.4,0.0
cell_g,0.2,0.6,0.1
cell_h,0.1,0.8,0.2
cell_i,0.0,0.6,0.3
cell_j,0.0,0.4,0.4
cell_k,0.0,0.2,0.3
cell_l,0.0,0.0,0.2
cell_m,0.0,0.0,0.1
cell_n,0.0,0.0,0.0

The trajectory data is just a newline-delimited text file with timepoints on each line. The N time points must be in the same order as the N cells of the expression file. The trajectory can be a real timestamp, or a pseudotime from tools like wishbone or monocle3.

This is the sample trajectory data that we are going to use.

$ cat traj.tsv
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3

TENET also needs a cell select file, which contains N boolean values deciding whether to use the cell or not for the analysis. However, in this tutorial, we don't need one as we'll use every single cells in the data.

Load the libraries

from NOLAN import *

Read the expression and pseudotime data

We'll use these parse* functions which are designed to read TENET formatted data into a pandas dataframe.

expr, cells, genes = parseGenes('expr.csv')
traj = parseTrajectory('traj.tsv')
mask = [1] * len(cells) # we use all cells, so we don't have a mask file.
# alternatively, use the code below if you have one.
# mask = parseCellSelect('select.tsv')
print(expr)

Check the results.

            GENE_A  GENE_B  GENE_C
Unnamed: 0                        
cell_a         0.0     0.0     0.0
cell_b         0.1     0.0     0.0
cell_c         0.2     0.0     0.0
cell_d         0.3     0.0     0.0
cell_e         0.4     0.2     0.0
cell_f         0.3     0.4     0.0
cell_g         0.2     0.6     0.1
cell_h         0.1     0.8     0.2
cell_i         0.0     0.6     0.3
cell_j         0.0     0.4     0.4
cell_k         0.0     0.2     0.3
cell_l         0.0     0.0     0.2
cell_m         0.0     0.0     0.1
cell_n         0.0     0.0     0.0

Calculate Transfer Entropy matrix from the expression data and trajectory

Now, we'll use the function inferNetwork() to calculate the Transfer Entropy matrix from the data. Consult the original TENET paper for the meaning of histLen.

network = inferNetwork(expr, genes, traj, mask, histLen=1, workers=1)
print(network)

The result is a matrix, with transfer entropy values from rows to columns. For example, the transfer entropy value from GENE_A to GENE_B is 0.761549.

          GENE_A    GENE_B    GENE_C
GENE_A         0  0.761549  0.453857
GENE_B  0.549634         0  0.761549
GENE_C  0.607703  0.511925         0

If your data is large, you can use some parallism with the workers parameter.

Serialize and filter the TE matrix into a GRN

Now, we'll prepare the TE matrix for downstream analysis with reconstructNet(). The alpha parameter controls the false positive correction step. Consult the statsmodels package for the meaning of this.

grn = reconstructNet(network, alpha=0.05)
print(grn)

Check the results.

   source relationship  target     tstat      pval  adj_pval
1  GENE_A     0.761549  GENE_B  1.190132  0.116997  0.350992
5  GENE_B     0.761549  GENE_C  1.190132  0.116997  0.350992
2  GENE_A     0.453857  GENE_C -1.190132  0.883003  0.883003
3  GENE_B     0.549634  GENE_A -0.449208  0.673359  0.883003
6  GENE_C     0.607703  GENE_A       0.0  0.500000  0.883003
7  GENE_C     0.511925  GENE_B -0.740924  0.770630  0.883003

Trim indirect edges from the grn

One of the TENET's key features is that it removes indirect edges along multiple genes. We can do this by using the function trimIndirect(). The threshold parameter determines the sensitivity of indirect edge detection. Positive values will cause more edges to be considered indirect. Negative values will cause less edges to be considered indirect.

trimmed = trimIndirect(grn, threshold=0)
print(trimmed)
   source relationship  target     tstat      pval  adj_pval
6  GENE_C     0.607703  GENE_A       0.0  0.500000  0.883003
1  GENE_A     0.761549  GENE_B  1.190132  0.116997  0.350992
5  GENE_B     0.761549  GENE_C  1.190132  0.116997  0.350992

Check Integrity

Compare the results with the original TENET

$ ./TENET expr.csv 1 traj.tsv select.tsv 1
2023-02-06 04:26:56.501955
2023-02-06 04:26:56.502497
2023-02-06 04:26:56.502958

real    0m1.389s
user    0m1.110s
sys     0m0.206s
$ cat TE_result_matrix.txt
TE      GENE_A  GENE_B  GENE_C
GENE_A  0       0.7615488099915483      0.4538565022992404
GENE_B  0.5496343867482045      0       0.7615488099915483
GENE_C  0.6077026561453944      0.5119247716964302      0