why using a 4d array in cp? #5

ccshao · 2020-09-07T13:15:18Z

Dear scTenifoldNet dev,

Thanks very much for the interesting method. That is the first time I learn something about tensor low approximation,
ane I would like to try it on my work.

As I am new to the idea of tensor, I was confused on the 4d array used in the rTensor::cp steps. As the N * N * T (N is the number of genes, T is the sampling times) is 3d array, why a 4d array is used in the implementations (example codes taken from CRAN 1.0.0)?

Her are codes used In the script of tensorDecomposition.R,

#- init a 4d array with the 3rd dim is 1
  tensorX <- array(data = 0, dim = c(nGenes,nGenes,1,nNet))
  if(!is.null(yList)){
    tensorY <- array(data = 0, dim = c(nGenes,nGenes,1,nNet))
  }
  

#- Here only the last dim is filled with correlation matrix.
  for(i in seq_len(nNet)){
    tempX <- matrix(0, nGenes, nGenes)
    rownames(tempX) <- colnames(tempX) <- sGenes
    temp <- as.matrix(xList[[i]])
    tGenes <- sGenes[sGenes %in% rownames(temp)]
    tempX[tGenes,tGenes] <- temp[tGenes,tGenes]
    tensorX[,,,i] <- tempX
    
    if(!is.null(yList)){
      tempY <- matrix(0, nGenes, nGenes)
      rownames(tempY) <- colnames(tempY) <- sGenes
      temp <- as.matrix(yList[[i]])
      tGenes <- sGenes[sGenes %in% rownames(temp)]
      tempY[tGenes,tGenes] <- temp[tGenes,tGenes]
      tensorY[,,,i] <- tempY
    }
    
  }

  tensorX <- rTensor::as.tensor(tensorX)
  set.seed(1)
  tensorX <- rTensor::cp(tnsr = tensorX, num_components = K, max_iter = maxIter, tol = maxError)
  tX <- tensorX$est@data[,,,1]
  for(i in seq_len(nNet)[-1]){
    tX <- tX +  tensorX$est@data[,,,i]
  }
  tX <- tX/nNet
  tX <- tX/max(abs(tX))
  tX <- round(tX,1)
  tX <- as(tX, 'dgCMatrix')
  rownames(tX) <- colnames(tX) <- sGenes

Actually I have tried tensorDecomposition with 3d array, i.e., simly remove the 3rd dim; and replace the "1" in the 3rd to other values. In the former senario I got different results compared to your codes, and the latter case there was an error:

Error in as(tX, "dgCMatrix") :
no method or default for coercing “array” to “dgCMatrix”

Thanks in advance.

The text was updated successfully, but these errors were encountered:

dosorio · 2020-09-07T14:28:57Z

Hi @ccshao,

Thanks for using scTenifoldNet. In the beginning, we tried to decompose the tensor with the two sets of networks at once, and established the structure of the code as a 4 order tensor. Then, our collaborators in statistics suggested that decomposing the tensor indepently for each set of networks was a most appropriate procedure. We keep the former structure and let the other dimension empty. I have modified the function to compute the 3d order tensor for you and I will check it out to see how it does affect the results of scTenifoldNet.

tensor3Decomposition <- function(xList, yList = NULL, K = 5, maxError = 1e-5, maxIter = 1e3){
  xNets <- length(xList)
  if(!is.null(yList)){
    yNets <- length(yList)
    if(xNets != yNets){
      stop('Same number of networks are required in both cases')
    }
    nNet <- unique(c(xNets, yNets))
    xGenes <- unique(unlist(lapply(xList, rownames)))
    yGenes <- unique(unlist(lapply(yList, rownames)))
    sGenes <- intersect(xGenes, yGenes)
  } else {
    nNet <- xNets
    xGenes <- unique(unlist(lapply(xList, rownames)))
    sGenes <- xGenes
  }
  nGenes <- length(sGenes) 
  tensorX <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  if(!is.null(yList)){
    tensorY <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  }
  
  for(i in seq_len(nNet)){
    tempX <- matrix(0, nGenes, nGenes)
    rownames(tempX) <- colnames(tempX) <- sGenes
    temp <- as.matrix(xList[[i]])
    tGenes <- sGenes[sGenes %in% rownames(temp)]
    tempX[tGenes,tGenes] <- temp[tGenes,tGenes]
    tensorX[,,i] <- tempX
    
    if(!is.null(yList)){
      tempY <- matrix(0, nGenes, nGenes)
      rownames(tempY) <- colnames(tempY) <- sGenes
      temp <- as.matrix(yList[[i]])
      tGenes <- sGenes[sGenes %in% rownames(temp)]
      tempY[tGenes,tGenes] <- temp[tGenes,tGenes]
      tensorY[,,i] <- tempY
    }
    
  }
  set.seed(1)
  tensorX <- scTenifoldNet:::as.tensor(tensorX)
  tensorX <- scTenifoldNet:::cpDecomposition(tnsr = tensorX, num_components = K, max_iter = maxIter, tol = maxError)
  tX <- tensorX$est$data[,,1]
  for(i in seq_len(nNet)[-1]){
    tX <- tX +  tensorX$est$data[,,i]
  }
  tX <- tX/nNet
  tX <- tX/max(abs(tX))
  tX <- round(tX,1)
  tX <- as(tX, 'dgCMatrix')
  rownames(tX) <- colnames(tX) <- sGenes
  
  if(!is.null(yList)){
    set.seed(1)
    tensorY <- as.tensor(tensorY)
    tensorY <- cpDecomposition(tnsr = tensorY, num_components = K, max_iter = 1e3)
    tY <- tensorY$est$data[,,1]
    for(i in seq_len(nNet)[-1]){
      tY <- tY +  tensorY$est$data[,,i]
    }
    tY <- tY/nNet
    tY <- tY/max(abs(tY))
    tY <- round(tY,1)
    tY <- as(tY, 'dgCMatrix')  
    rownames(tY) <- colnames(tY) <- sGenes
  }
  
  tensorOutput <- list()
  tensorOutput$X <- tX
  if(!is.null(yList)){
    tensorOutput$Y <- tY
  }
  return(tensorOutput)
}

Please let me know if there is something else I can help you with,

Best,

Daniel

dosorio · 2020-09-07T17:37:28Z

Hi @ccshao

Thanks for pointing this out. I have tested the effect on the results of scTenifoldNet. I found that the identified genes as significant are the same, and the Pearson correlation coefficient between the distances is 0.941. Please see the code below:

library(devtools)
load_all('.')

nCells = 2000
nGenes = 100
set.seed(1)
X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98)
X <- round(X)
X <- matrix(X, ncol = nCells)
rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10))

Y <- X
Y[10,] <- Y[50,]
Y[2,] <- Y[11,]
Y[3,] <- Y[5,]

outputHA_td3 <- scTenifold3Net(X = X, Y = Y,
                          nc_nNet = 10, nc_nCells = 500,
                          td_K = 3, qc_minLibSize = 30)

outputHA_td4 <- scTenifoldNet(X = X, Y = Y,
                            nc_nNet = 10, nc_nCells = 500,
                            td_K = 3, qc_minLibSize = 30)

D1 <- outputHA_td3$diffRegulation$distance
names(D1) <- outputHA_td3$diffRegulation$gene

D2 <- outputHA_td4$diffRegulation$distance
names(D2) <- outputHA_td4$diffRegulation$gene

D2 <- D2[names(D1)]

cor(D1,D2)

png('tensorDimension.png', width = 1500, height = 1500, res = 300)
par(mar=c(3,3,1,1), mgp=c(1.5,0.5,0))
gCol <- ifelse(outputHA_td3$diffRegulation$p.adj < 0.1 | outputHA_td4$diffRegulation$p.adj < 0.1, 'red', 'black')
plot(D2,D1, xlab = 'Tensor 4D', ylab = 'Tensor 3D', pch = 16, col = gCol)
abline(0,1, col = 'red', lty = 2)
dev.off()

ccshao · 2020-09-08T16:40:47Z

@dosorio Thanks very much for the quick updates, indeed it seems the choice of 4d/3d array makes little difference in comparing two networks.

However, when I compared two networks from a single X datasets from the 4d/3d approaches, cp gives quite different structures indicated with low correlation value. In this scenario, which network is preferred?

Note I used the cran version 1.0.0, but the tensorDecomposition2 is nearly identical to your codes in tensor3Decomposition.

The codes used to produce a single network with either 3d or 4d.

library(scTenifoldNet) #- cran version 1.0.0
source("./tensorDecomposition2.R") #- see codes at the end.

nCells = 2000
nGenes = 100
set.seed(1)
X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98)
X <- round(X)
X <- matrix(X, ncol = nCells)
rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10))

mnOutput <- makeNetworks(X = X,
                         nNet = 10,
                         nCells = 500,
                         nComp = 3,
                         scaleScores = TRUE,
                         symmetric = FALSE,
                         q = 0.95
                         )
#- networks from cp on 4d array
mat1 <- as.matrix(tensorDecomposition(mnOutput, K = 3, maxError = 1e5, maxIter = 1e3)$X)
#- networks from cp on 3d array
mat2 <- as.matrix(tensorDecomposition2(mnOutput, K = 3, maxError = 1e5, maxIter = 1e3)$X)

cor(c(mat1), c(mat2))
0.2541093
hist(sapply(seq_len(ncol(mat1)), function(x) cor(mat1[, x], mat2[, x])), breaks = 50, xlab = "cor", main = "")

#- CP with 3d array

tensorDecomposition2 <- function(xList, yList = NULL, K = 5, maxError = 1e-5, maxIter = 1e3){
  xNets <- length(xList)
  if(!is.null(yList)){
    yNets <- length(yList)
    if(xNets != yNets){
      stop('Same number of networks are required in both cases')
    }
    nNet <- unique(c(xNets, yNets))
    xGenes <- unique(unlist(lapply(xList, rownames)))
    yGenes <- unique(unlist(lapply(yList, rownames)))
    sGenes <- intersect(xGenes, yGenes)
  } else {
    nNet <- xNets
    xGenes <- unique(unlist(lapply(xList, rownames)))
    sGenes <- xGenes
  }
  nGenes <- length(sGenes)

  tensorX <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  if(!is.null(yList)){
    tensorY <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  }

  for(i in seq_len(nNet)){
    tempX <- matrix(0, nGenes, nGenes)
    rownames(tempX) <- colnames(tempX) <- sGenes
    temp <- as.matrix(xList[[i]])
    tGenes <- sGenes[sGenes %in% rownames(temp)]
    tempX[tGenes,tGenes] <- temp[tGenes,tGenes]
    tensorX[,,i] <- tempX

    if(!is.null(yList)){
      tempY <- matrix(0, nGenes, nGenes)
      rownames(tempY) <- colnames(tempY) <- sGenes
      temp <- as.matrix(yList[[i]])
      tGenes <- sGenes[sGenes %in% rownames(temp)]
      tempY[tGenes,tGenes] <- temp[tGenes,tGenes]
      tensorY[,,i] <- tempY
    }

  }

  tensorX <- rTensor::as.tensor(tensorX)
  set.seed(1)
  tensorX <- rTensor::cp(tnsr = tensorX, num_components = K, max_iter = maxIter, tol = maxError)
  tX <- tensorX$est@data[,,1]
  for(i in seq_len(nNet)[-1]){
    tX <- tX +  tensorX$est@data[,,i]
  }
  tX <- tX/nNet
  tX <- tX/max(abs(tX))
  tX <- round(tX,1)
  tX <- as(tX, 'dgCMatrix')
  rownames(tX) <- colnames(tX) <- sGenes

  if(!is.null(yList)){
    tensorY <- rTensor::as.tensor(tensorY)
    set.seed(1)
    tensorY <- rTensor::cp(tnsr = tensorY, num_components = K, max_iter = 1e3)
    tY <- tensorY$est@data[,,1]
    for(i in seq_len(nNet)[-1]){
      tY <- tY +  tensorY$est@data[,,,i]
    }
    tY <- tY/nNet
    tY <- tY/max(abs(tY))
    tY <- round(tY,1)
    tY <- as(tY, 'dgCMatrix')
    rownames(tY) <- colnames(tY) <- sGenes
  }

  tensorOutput <- list()
  tensorOutput$X <- tX
  if(!is.null(yList)){
    tensorOutput$Y <- tY
  }
  return(tensorOutput)
}

dosorio · 2020-09-09T13:06:04Z

Hi @ccshao,

We have tested and benchmarked the 4d configuration against 11 other packages using Beeline and it ranks in first tied with PIDC for the network construction. Please see the attached figure:

ccshao · 2020-09-09T14:35:23Z

@dosorio would you mind to share the benchmark codes used in Beeline for the 12 methods? It would be interesting to see how the cp 3d array works.

dosorio · 2020-09-10T14:18:12Z

Hey @ccshao, Sure. All the required code is available at: https://github.com/cailab-tamu/scTenifoldNet/tree/master/inst/benchmarking/Beeline

Best,

Daniel

ccshao · 2020-09-12T07:24:40Z

Thanks! I will try to play with different settings of scTenifoldNet.
BTW, rTensor packages has been replaces with in-house codes, What is the key differences comparing to functions related to rTensor?

dosorio · 2020-09-14T15:16:43Z

Hi @ccshao, after the first release of scTenifoldNet, we received a notification from CRAN that the rTensor package was no longer maintained, and they planned to shut it down. We added the function as native in our package and added the citation to the original package. Nothing else changed.

ccshao · 2020-09-14T16:55:09Z

Thank, it is clear.

…

On 14. Sep 2020, at 17:17, Daniel Osorio ***@***.***> wrote: Hi @ccshao <https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fccshao&data=02%7C01%7C%7C69504a737acf445a8dc308d858c13a9b%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637356934213193372&sdata=N8igszo42brJ0roeqTE%2F4fF7iTNb0v4OXOiVYLdtYU0%3D&reserved=0>, after the first release of scTenifoldNet, we received a notification from CRAN that the rTensor package was no longer maintained, and they planned to shut it down. We added the function as native in our package and added the citation to the original package. Nothing else changed. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcailab-tamu%2FscTenifoldNet%2Fissues%2F5%23issuecomment-692126226&data=02%7C01%7C%7C69504a737acf445a8dc308d858c13a9b%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637356934213203364&sdata=mY6AFa8ZrSIJpFBBhzF%2B9BFk0qLhcakvlspbkD%2FnZ6U%3D&reserved=0>, or unsubscribe <https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABF77FZE6LA7ZGD426HP76LSFYXWZANCNFSM4Q6FW4ZA&data=02%7C01%7C%7C69504a737acf445a8dc308d858c13a9b%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637356934213213365&sdata=hRCVsxEZVcPPAX9csbYu2etPs4e7wczFqrZGE6uKPJE%3D&reserved=0>.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

why using a 4d array in cp? #5

why using a 4d array in cp? #5

ccshao commented Sep 7, 2020

dosorio commented Sep 7, 2020 •

edited

Loading

dosorio commented Sep 7, 2020

ccshao commented Sep 8, 2020

dosorio commented Sep 9, 2020

ccshao commented Sep 9, 2020

dosorio commented Sep 10, 2020

ccshao commented Sep 12, 2020

dosorio commented Sep 14, 2020

ccshao commented Sep 14, 2020 via email

why using a 4d array in cp? #5

why using a 4d array in cp? #5

Comments

ccshao commented Sep 7, 2020

dosorio commented Sep 7, 2020 • edited Loading

dosorio commented Sep 7, 2020

ccshao commented Sep 8, 2020

dosorio commented Sep 9, 2020

ccshao commented Sep 9, 2020

dosorio commented Sep 10, 2020

ccshao commented Sep 12, 2020

dosorio commented Sep 14, 2020

ccshao commented Sep 14, 2020 via email

dosorio commented Sep 7, 2020 •

edited

Loading