findEncodingDim issues #39

Open
lemdcock opened this issue Jun 14, 2022 · 5 comments

@lemdcock

Dear Gagneur Lab,

I'm using OUTRIDER for aberrant gene expression analysis, but I'm running into a problem with the findEncodingDim function. I use the following code:

### Pre-processing ###
ods <- filterExpression(ods, minCounts=TRUE, filterGenes=TRUE, savefpkm=FALSE)

# Controlling for Confounders
ods <- estimateSizeFactors(ods)

# calculate optimal q
q <- findEncodingDim(ods, params = encDimParams)

This code results in the following error message:

Tue Jun 14 17:27:00 2022: SizeFactor estimation ...
Tue Jun 14 17:27:13 2022: Controlling for confounders ...
Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: There are genes without any read. Please filter first the data with: ods <- filterExpression(ods)

This is odd, since I do call filterExpression before findEncodingDim. I've also checked this manually with the following code:

Test <- ods@assays@data$counts
Test <- as.data.frame(Test)
Test2 <- Test[rowSums(Test) > 0, ]  # keep rows (genes) with at least one read

This gives the following result:

> dim(Test)
[1] 267812     38
> dim(Test2)
[1] 267812     38

So there don't appear to be any rows left with all-zero counts. Do you know why the error is still raised, and how I can solve it?

Kind regards,
Laurenz De Cock

@c-mertes
Contributor

Dear @lemdcock,

Thanks for reporting this. You might have hit a bug we just discovered in the injection of outliers. With only 38 samples, a gene may have reads in just one sample. That passes the filtering step, but if the outlier injection then replaces that single non-zero count with 0, the gene ends up with 0 reads in all samples.
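
The scenario described above can be reproduced with a toy count matrix in base R. This is only an illustrative sketch: the matrix, the gene and sample names, and the manual zero "injection" are made up here and do not use OUTRIDER's actual injection code.

```r
# Toy count matrix: 3 genes x 4 samples (hypothetical values).
counts <- matrix(
    c(10L, 12L,  9L, 11L,   # geneA: expressed in all samples
       0L,  0L,  7L,  0L,   # geneB: reads in exactly one sample
       3L,  0L,  5L,  2L),  # geneC: expressed in three samples
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Every gene has at least one read, so an "any reads" filter keeps all of them.
passes_before <- rowSums(counts) > 0

# Mimic an injected outlier that lands on geneB's only non-zero count and
# sets it to 0 (a stand-in, not the package's real injection routine).
injected <- counts
injected["geneB", "s3"] <- 0L

# geneB now has no reads at all, which is the condition the error reports.
passes_after <- rowSums(injected) > 0
print(passes_before)
print(passes_after)
```

The filter passes on the original matrix but fails on the modified one, which would explain why the error can appear even though filterExpression was run first.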

To check whether your data could trigger this, could you please post the output of:

table(rowSums(Test > 0) == 1)

We will fix the bug in the meantime.

Looking also at your dimensions (260k genes), are you dealing with gene expression data? We usually have 20-30k expressed genes in our datasets for bulk RNA-seq samples.

@c-mertes
Contributor

#37 should fix the issue.

@lemdcock
Author

Dear @c-mertes,

I've run the code you posted. It gives the following outcome:

> table(rowSums(Test > 0) == 1)

 FALSE   TRUE 
253925  13887

As you expected, I have quite a lot of genes with reads in only one sample. I will try the solution mentioned under #37.
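
For readers who cannot yet install a version containing the fix, one possible interim workaround is to drop features with reads in fewer than two samples before calling findEncodingDim. This is an assumption on my part and was not suggested by the maintainers in this thread. It should reduce, though not necessarily eliminate, the chance of an all-zero row after injection. Below is a minimal base R sketch on a toy matrix; `min_samples` and the matrix values are hypothetical.

```r
# Toy count matrix standing in for the real counts (hypothetical values).
counts <- matrix(
    c(10L, 12L, 9L, 11L,
       0L,  0L, 7L,  0L,
       3L,  0L, 5L,  2L),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Hypothetical threshold: require reads in at least this many samples.
min_samples <- 2

# Number of samples with a non-zero count, per feature.
n_expressed <- rowSums(counts > 0)
keep <- n_expressed >= min_samples

# Same tabulation as requested in the thread:
# how many features have reads in exactly one sample?
print(table(n_expressed == 1))

# Keep only the features that meet the threshold.
filtered <- counts[keep, , drop = FALSE]
print(dim(filtered))

# With a real dataset, the same logical vector could be used to subset the
# rows of the object (e.g. ods <- ods[keep, ]) before findEncodingDim.
```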

Thank you for your help!

Indeed, my analysis is not limited to genes. We have expanded the list of known genes with some additional regions of interest.

Kind regards,
Laurenz De Cock

@c-mertes
Contributor

This is great to hear. Let me know if #37 fixed your problem.

@vyepez88
Contributor

Hi @lemdcock, were you able to check this?
