findEncodingDim issues #39

lemdcock · 2022-06-14T15:48:47Z

Dear Gagneur Lab,

I'm using OUTRIDER for aberrant gene expression analysis, but I'm running into issues with the findEncodingDim function. I use the following code:

### Pre-processing ###
ods <- filterExpression(ods, minCounts=TRUE, filterGenes=TRUE, savefpkm=FALSE)

# Controlling for Confounders
ods <- estimateSizeFactors(ods)

# calculate optimal q
q <- findEncodingDim(ods, params = encDimParams)

This code results in the following error message:

Tue Jun 14 17:27:00 2022: SizeFactor estimation ...
Tue Jun 14 17:27:13 2022: Controlling for confounders ...
Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: There are genes without any read. Please filter first the data with: ods <- filterExpression(ods)

This is odd, since I do run filterExpression before calling findEncodingDim. I've also checked this manually with the following code:

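# counts(ods) returns the raw count matrix (same as ods@assays@data$counts)
Test <- as.data.frame(counts(ods))
# keep rows (features) with at least one read; the comma selects rows, not columns
Test2 <- Test[rowSums(Test) > 0, ]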

This gives the following result:

> dim(Test)
[1] 267812     38
> dim(Test2)
[1] 267812     38

So there don't appear to be any rows with all-zero counts left. Do you know why the error is raised anyway, and how I can solve it?
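When filtering a data.frame by rows in R, the comma in `Test[cond, ]` matters: with a single index, `[` treats the data.frame like a list and selects columns instead. A minimal base-R illustration on a hypothetical toy count table (not the data from this issue):

```r
# Hypothetical toy count table: 3 features x 2 samples; feature g2 has no reads
Test <- data.frame(s1 = c(5, 0, 2), s2 = c(0, 0, 3),
                   row.names = c("g1", "g2", "g3"))

keep <- rowSums(Test) > 0           # one logical value per row (feature)

# Comma form: subset rows, keep all columns
Test2 <- Test[keep, , drop = FALSE]
print(dim(Test2))                   # 2 2: the all-zero feature g2 is dropped
print(rownames(Test2))              # "g1" "g3"

# Single-index form: the logical vector indexes columns (list-style)
print(dim(Test[c(TRUE, FALSE)]))    # 3 1: only column s1, all rows kept
```

Since filterExpression with `minCounts=TRUE` already drops all-zero rows, a correct row filter is expected to leave the dimensions unchanged, as observed above.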

Kind regards,
Laurenz De Cock


c-mertes · 2022-06-15T07:47:17Z

Dear @lemdcock,

Thanks for reporting this. You might have hit a bug we just discovered in the outlier injection that findEncodingDim uses to find the optimal encoding dimension. With only 38 samples, some genes may have reads in just one sample. That is fine for the filtering step, but during injection we may inject an outlier of 0 into exactly that sample, so that afterwards all samples have 0 reads for that gene.
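The failure mode described above can be reproduced on a toy matrix. This is a hypothetical base-R sketch of the mechanism only, not OUTRIDER's internal injection code: a gene with reads in a single sample passes the "at least one read" filter, and injecting a 0 at that position leaves it with no reads at all.

```r
# Hypothetical toy matrix: 3 genes x 4 samples (not OUTRIDER's internal code)
counts <- matrix(c(10, 12,  9, 11,   # gene A: reads in all samples
                    0,  0,  7,  0,   # gene B: reads in sample s3 only
                    3,  0,  0,  4),  # gene C: reads in samples s1 and s4
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), paste0("s", 1:4)))

# The filter only requires at least one read per gene, so all three genes pass
stopifnot(all(rowSums(counts) > 0))

# Simulate injecting a low outlier of 0 at (gene B, sample s3)
injected <- counts
injected["B", "s3"] <- 0

# Gene B now has no reads in any sample, which triggers the
# "There are genes without any read" check
print(rownames(injected)[rowSums(injected) == 0])   # "B"
```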

To check whether this could be happening, can you please post the output of:

table(rowSums(Test > 0) == 1)
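For reference, `rowSums(Test > 0)` counts, per gene, the number of samples with at least one read, so the `TRUE` entry of the table is the number of genes seen in exactly one sample. A small base-R sketch on a hypothetical toy matrix:

```r
# Hypothetical toy count matrix: 4 genes x 3 samples
Test <- matrix(c(5, 0, 0,
                 2, 3, 1,
                 0, 0, 8,
                 1, 0, 4), nrow = 4, byrow = TRUE)

nonzero_samples <- rowSums(Test > 0)   # per gene: samples with >= 1 read
print(nonzero_samples)                 # 1 3 1 2
print(table(nonzero_samples == 1))     # FALSE: 2, TRUE: 2
```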

We will fix the bug in the meantime.

Looking at your dimensions (~268k features), are you working with gene-level expression data? For bulk RNA-seq samples we usually have 20-30k expressed genes in our datasets.

c-mertes · 2022-06-15T07:49:32Z

#37 should fix the issue.

lemdcock · 2022-06-21T16:14:49Z

Dear @c-mertes,

I've run the code you posted; here is the output:

> table(rowSums(Test > 0) == 1)

 FALSE   TRUE 
253925  13887

As you expected, quite a lot of genes (13,887) have reads in only one sample. I will try the fix from #37.
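Until a fixed version is installed, one possible stopgap (a hypothetical suggestion, not something the maintainers proposed here) is to drop features with reads in fewer than two samples before calling findEncodingDim. This lowers the chance that a single injected 0 empties a row, but does not strictly rule it out if one feature receives several injections. A base-R sketch with a hypothetical helper `keep_multi_sample`:

```r
# Hypothetical helper (not part of OUTRIDER):
# TRUE for rows with reads in at least `min_samples` samples
keep_multi_sample <- function(counts, min_samples = 2) {
  rowSums(counts > 0) >= min_samples
}

counts <- matrix(c(10, 12, 9, 11,
                    0,  0, 7,  0,
                    3,  0, 0,  4), nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "B", "C"), paste0("s", 1:4)))
keep <- keep_multi_sample(counts)
print(names(keep)[keep])   # "A" "C"

# On an OutriderDataSet this could be applied as (untested sketch):
# ods <- ods[keep_multi_sample(counts(ods)), ]
```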

Thank you for your help!

Indeed, my analysis does not look at genes only: we have expanded the list of known genes with some additional regions of interest.

Kind regards,
Laurenz De Cock

c-mertes · 2022-06-22T09:17:38Z

This is great to hear. Let me know if #37 fixed your problem.

vyepez88 · 2023-02-28T09:36:06Z

Hi @lemdcock , were you able to check this?
