-
Notifications
You must be signed in to change notification settings - Fork 17
Open
Description
Hi Constantin,
when the number of design matrix columns increases (and thus the number of samples for a fixed number of replicates), glmGamPoi scales cubically. Consider the following code where 3 genes are fitted with 3 replicates per condition and a varying number of conditions:
library("glmGamPoi")
library("DESeq2")
library("tidyverse")
# adapted from DESeq2
make_example_dds <- function(n_genes, n_replicates, n_conditions){
dispMeanRel <- function(x) 4/x + 0.1
beta <- matrix(rnorm(n_genes*n_conditions, mean = 4, sd = 2), ncol = n_conditions)
dispersion <- dispMeanRel(2^(beta[, 1]))
colData <- DataFrame(condition = factor(rep(paste0("cond", 1:n_conditions), n_replicates)))
x <- model.matrix.default(~colData$condition)
mu <- t(2^(x %*% t(beta)))
countData <- matrix(rnbinom(mu, mu = mu, size = 1/dispersion), ncol = ncol(mu))
mode(countData) <- "integer"
design <- as.formula("~ condition", env = .GlobalEnv)
object <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = design)
object
}
calc_runtime <- function(n_conditions){
set.seed(1)
n_replicates <- 3
dds <- make_example_dds(n_genes = 3, n_replicates = n_replicates, n_conditions = n_conditions)
time <- system.time(
fit <- glm_gp(assay(dds), design = design(dds), col_data = colData(dds), size_factors = rep(1, n_replicates*n_conditions))
)
as.double(time["elapsed"])
}
n_conditions <- c(10,25,seq(50,600,50))
res_glm_gp <- tibble(
n_conditions = n_conditions,
runtime = map_dbl(n_conditions, calc_runtime)
)
ggplot(res_glm_gp, aes(n_conditions, runtime^(1/3))) +
geom_point() +
geom_smooth(method="lm", se=F)Could this be related to the QR decomposition of the design matrix and ultimately not be improved?
Metadata
Metadata
Assignees
Labels
No labels
