Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why using a 4d array in cp? #5

Open
ccshao opened this issue Sep 7, 2020 · 9 comments
Open

why using a 4d array in cp? #5

ccshao opened this issue Sep 7, 2020 · 9 comments

Comments

@ccshao
Copy link

ccshao commented Sep 7, 2020

Dear scTenifoldNet dev,

Thanks very much for the interesting method. That is the first time I learn something about tensor low approximation,
ane I would like to try it on my work.

As I am new to the idea of tensor, I was confused on the 4d array used in the rTensor::cp steps. As the N * N * T (N is the number of genes, T is the sampling times) is 3d array, why a 4d array is used in the implementations (example codes taken from CRAN 1.0.0)?

Her are codes used In the script of tensorDecomposition.R,

#- init a 4d array with the 3rd dim is 1
  tensorX <- array(data = 0, dim = c(nGenes,nGenes,1,nNet))
  if(!is.null(yList)){
    tensorY <- array(data = 0, dim = c(nGenes,nGenes,1,nNet))
  }
  

#- Here only the last dim is filled with correlation matrix.
  for(i in seq_len(nNet)){
    tempX <- matrix(0, nGenes, nGenes)
    rownames(tempX) <- colnames(tempX) <- sGenes
    temp <- as.matrix(xList[[i]])
    tGenes <- sGenes[sGenes %in% rownames(temp)]
    tempX[tGenes,tGenes] <- temp[tGenes,tGenes]
    tensorX[,,,i] <- tempX
    
    if(!is.null(yList)){
      tempY <- matrix(0, nGenes, nGenes)
      rownames(tempY) <- colnames(tempY) <- sGenes
      temp <- as.matrix(yList[[i]])
      tGenes <- sGenes[sGenes %in% rownames(temp)]
      tempY[tGenes,tGenes] <- temp[tGenes,tGenes]
      tensorY[,,,i] <- tempY
    }
    
  }

  tensorX <- rTensor::as.tensor(tensorX)
  set.seed(1)
  tensorX <- rTensor::cp(tnsr = tensorX, num_components = K, max_iter = maxIter, tol = maxError)
  tX <- tensorX$est@data[,,,1]
  for(i in seq_len(nNet)[-1]){
    tX <- tX +  tensorX$est@data[,,,i]
  }
  tX <- tX/nNet
  tX <- tX/max(abs(tX))
  tX <- round(tX,1)
  tX <- as(tX, 'dgCMatrix')
  rownames(tX) <- colnames(tX) <- sGenes

Actually I have tried tensorDecomposition with 3d array, i.e., simly remove the 3rd dim; and replace the "1" in the 3rd to other values. In the former senario I got different results compared to your codes, and the latter case there was an error:

Error in as(tX, "dgCMatrix") :
no method or default for coercing “array” to “dgCMatrix”

Thanks in advance.

@dosorio
Copy link
Member

dosorio commented Sep 7, 2020

Hi @ccshao,

Thanks for using scTenifoldNet. In the beginning, we tried to decompose the tensor with the two sets of networks at once, and established the structure of the code as a 4 order tensor. Then, our collaborators in statistics suggested that decomposing the tensor indepently for each set of networks was a most appropriate procedure. We keep the former structure and let the other dimension empty. I have modified the function to compute the 3d order tensor for you and I will check it out to see how it does affect the results of scTenifoldNet.

tensor3Decomposition <- function(xList, yList = NULL, K = 5, maxError = 1e-5, maxIter = 1e3){
  xNets <- length(xList)
  if(!is.null(yList)){
    yNets <- length(yList)
    if(xNets != yNets){
      stop('Same number of networks are required in both cases')
    }
    nNet <- unique(c(xNets, yNets))
    xGenes <- unique(unlist(lapply(xList, rownames)))
    yGenes <- unique(unlist(lapply(yList, rownames)))
    sGenes <- intersect(xGenes, yGenes)
  } else {
    nNet <- xNets
    xGenes <- unique(unlist(lapply(xList, rownames)))
    sGenes <- xGenes
  }
  nGenes <- length(sGenes) 
  tensorX <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  if(!is.null(yList)){
    tensorY <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  }
  
  for(i in seq_len(nNet)){
    tempX <- matrix(0, nGenes, nGenes)
    rownames(tempX) <- colnames(tempX) <- sGenes
    temp <- as.matrix(xList[[i]])
    tGenes <- sGenes[sGenes %in% rownames(temp)]
    tempX[tGenes,tGenes] <- temp[tGenes,tGenes]
    tensorX[,,i] <- tempX
    
    if(!is.null(yList)){
      tempY <- matrix(0, nGenes, nGenes)
      rownames(tempY) <- colnames(tempY) <- sGenes
      temp <- as.matrix(yList[[i]])
      tGenes <- sGenes[sGenes %in% rownames(temp)]
      tempY[tGenes,tGenes] <- temp[tGenes,tGenes]
      tensorY[,,i] <- tempY
    }
    
  }
  set.seed(1)
  tensorX <- scTenifoldNet:::as.tensor(tensorX)
  tensorX <- scTenifoldNet:::cpDecomposition(tnsr = tensorX, num_components = K, max_iter = maxIter, tol = maxError)
  tX <- tensorX$est$data[,,1]
  for(i in seq_len(nNet)[-1]){
    tX <- tX +  tensorX$est$data[,,i]
  }
  tX <- tX/nNet
  tX <- tX/max(abs(tX))
  tX <- round(tX,1)
  tX <- as(tX, 'dgCMatrix')
  rownames(tX) <- colnames(tX) <- sGenes
  
  if(!is.null(yList)){
    set.seed(1)
    tensorY <- as.tensor(tensorY)
    tensorY <- cpDecomposition(tnsr = tensorY, num_components = K, max_iter = 1e3)
    tY <- tensorY$est$data[,,1]
    for(i in seq_len(nNet)[-1]){
      tY <- tY +  tensorY$est$data[,,i]
    }
    tY <- tY/nNet
    tY <- tY/max(abs(tY))
    tY <- round(tY,1)
    tY <- as(tY, 'dgCMatrix')  
    rownames(tY) <- colnames(tY) <- sGenes
  }
  
  tensorOutput <- list()
  tensorOutput$X <- tX
  if(!is.null(yList)){
    tensorOutput$Y <- tY
  }
  return(tensorOutput)
}


Please let me know if there is something else I can help you with,

Best,

Daniel

@dosorio
Copy link
Member

dosorio commented Sep 7, 2020

Hi @ccshao

Thanks for pointing this out. I have tested the effect on the results of scTenifoldNet. I found that the identified genes as significant are the same, and the Pearson correlation coefficient between the distances is 0.941. Please see the code below:

library(devtools)
load_all('.')

nCells = 2000
nGenes = 100
set.seed(1)
X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98)
X <- round(X)
X <- matrix(X, ncol = nCells)
rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10))

Y <- X
Y[10,] <- Y[50,]
Y[2,] <- Y[11,]
Y[3,] <- Y[5,]

outputHA_td3 <- scTenifold3Net(X = X, Y = Y,
                          nc_nNet = 10, nc_nCells = 500,
                          td_K = 3, qc_minLibSize = 30)

outputHA_td4 <- scTenifoldNet(X = X, Y = Y,
                            nc_nNet = 10, nc_nCells = 500,
                            td_K = 3, qc_minLibSize = 30)

D1 <- outputHA_td3$diffRegulation$distance
names(D1) <- outputHA_td3$diffRegulation$gene

D2 <- outputHA_td4$diffRegulation$distance
names(D2) <- outputHA_td4$diffRegulation$gene

D2 <- D2[names(D1)]

cor(D1,D2)

png('tensorDimension.png', width = 1500, height = 1500, res = 300)
par(mar=c(3,3,1,1), mgp=c(1.5,0.5,0))
gCol <- ifelse(outputHA_td3$diffRegulation$p.adj < 0.1 | outputHA_td4$diffRegulation$p.adj < 0.1, 'red', 'black')
plot(D2,D1, xlab = 'Tensor 4D', ylab = 'Tensor 3D', pch = 16, col = gCol)
abline(0,1, col = 'red', lty = 2)
dev.off()

Example

@ccshao
Copy link
Author

ccshao commented Sep 8, 2020

@dosorio Thanks very much for the quick updates, indeed it seems the choice of 4d/3d array makes little difference in comparing two networks.

However, when I compared two networks from a single X datasets from the 4d/3d approaches, cp gives quite different structures indicated with low correlation value. In this scenario, which network is preferred?

Note I used the cran version 1.0.0, but the tensorDecomposition2 is nearly identical to your codes in tensor3Decomposition.

The codes used to produce a single network with either 3d or 4d.

library(scTenifoldNet) #- cran version 1.0.0
source("./tensorDecomposition2.R") #- see codes at the end.

nCells = 2000
nGenes = 100
set.seed(1)
X <- rnbinom(n = nGenes * nCells, size = 20, prob = 0.98)
X <- round(X)
X <- matrix(X, ncol = nCells)
rownames(X) <- c(paste0('ng', 1:90), paste0('mt-', 1:10))

mnOutput <- makeNetworks(X = X,
                         nNet = 10,
                         nCells = 500,
                         nComp = 3,
                         scaleScores = TRUE,
                         symmetric = FALSE,
                         q = 0.95
                         )
#- networks from cp on 4d array
mat1 <- as.matrix(tensorDecomposition(mnOutput, K = 3, maxError = 1e5, maxIter = 1e3)$X)
#- networks from cp on 3d array
mat2 <- as.matrix(tensorDecomposition2(mnOutput, K = 3, maxError = 1e5, maxIter = 1e3)$X)

cor(c(mat1), c(mat2))
0.2541093
hist(sapply(seq_len(ncol(mat1)), function(x) cor(mat1[, x], mat2[, x])), breaks = 50, xlab = "cor", main = "")

Screen Shot 2020-09-08 at 18 03 49

#- CP with 3d array

tensorDecomposition2 <- function(xList, yList = NULL, K = 5, maxError = 1e-5, maxIter = 1e3){
  xNets <- length(xList)
  if(!is.null(yList)){
    yNets <- length(yList)
    if(xNets != yNets){
      stop('Same number of networks are required in both cases')
    }
    nNet <- unique(c(xNets, yNets))
    xGenes <- unique(unlist(lapply(xList, rownames)))
    yGenes <- unique(unlist(lapply(yList, rownames)))
    sGenes <- intersect(xGenes, yGenes)
  } else {
    nNet <- xNets
    xGenes <- unique(unlist(lapply(xList, rownames)))
    sGenes <- xGenes
  }
  nGenes <- length(sGenes)

  tensorX <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  if(!is.null(yList)){
    tensorY <- array(data = 0, dim = c(nGenes,nGenes,nNet))
  }

  for(i in seq_len(nNet)){
    tempX <- matrix(0, nGenes, nGenes)
    rownames(tempX) <- colnames(tempX) <- sGenes
    temp <- as.matrix(xList[[i]])
    tGenes <- sGenes[sGenes %in% rownames(temp)]
    tempX[tGenes,tGenes] <- temp[tGenes,tGenes]
    tensorX[,,i] <- tempX

    if(!is.null(yList)){
      tempY <- matrix(0, nGenes, nGenes)
      rownames(tempY) <- colnames(tempY) <- sGenes
      temp <- as.matrix(yList[[i]])
      tGenes <- sGenes[sGenes %in% rownames(temp)]
      tempY[tGenes,tGenes] <- temp[tGenes,tGenes]
      tensorY[,,i] <- tempY
    }

  }

  tensorX <- rTensor::as.tensor(tensorX)
  set.seed(1)
  tensorX <- rTensor::cp(tnsr = tensorX, num_components = K, max_iter = maxIter, tol = maxError)
  tX <- tensorX$est@data[,,1]
  for(i in seq_len(nNet)[-1]){
    tX <- tX +  tensorX$est@data[,,i]
  }
  tX <- tX/nNet
  tX <- tX/max(abs(tX))
  tX <- round(tX,1)
  tX <- as(tX, 'dgCMatrix')
  rownames(tX) <- colnames(tX) <- sGenes

  if(!is.null(yList)){
    tensorY <- rTensor::as.tensor(tensorY)
    set.seed(1)
    tensorY <- rTensor::cp(tnsr = tensorY, num_components = K, max_iter = 1e3)
    tY <- tensorY$est@data[,,1]
    for(i in seq_len(nNet)[-1]){
      tY <- tY +  tensorY$est@data[,,,i]
    }
    tY <- tY/nNet
    tY <- tY/max(abs(tY))
    tY <- round(tY,1)
    tY <- as(tY, 'dgCMatrix')
    rownames(tY) <- colnames(tY) <- sGenes
  }

  tensorOutput <- list()
  tensorOutput$X <- tX
  if(!is.null(yList)){
    tensorOutput$Y <- tY
  }
  return(tensorOutput)
}

@dosorio
Copy link
Member

dosorio commented Sep 9, 2020

Hi @ccshao,

We have tested and benchmarked the 4d configuration against 11 other packages using Beeline and it ranks in first tied with PIDC for the network construction. Please see the attached figure: Example

@ccshao
Copy link
Author

ccshao commented Sep 9, 2020

@dosorio would you mind to share the benchmark codes used in Beeline for the 12 methods? It would be interesting to see how the cp 3d array works.

@dosorio
Copy link
Member

dosorio commented Sep 10, 2020

Hey @ccshao, Sure. All the required code is available at: https://github.com/cailab-tamu/scTenifoldNet/tree/master/inst/benchmarking/Beeline

Best,

Daniel

@ccshao
Copy link
Author

ccshao commented Sep 12, 2020

Thanks! I will try to play with different settings of scTenifoldNet.
BTW, rTensor packages has been replaces with in-house codes, What is the key differences comparing to functions related to rTensor?

@dosorio
Copy link
Member

dosorio commented Sep 14, 2020

Hi @ccshao, after the first release of scTenifoldNet, we received a notification from CRAN that the rTensor package was no longer maintained, and they planned to shut it down. We added the function as native in our package and added the citation to the original package. Nothing else changed.

@ccshao
Copy link
Author

ccshao commented Sep 14, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants