Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running velocyto on multiple samples with 10X #54

Open
dingying85 opened this issue Nov 15, 2018 · 16 comments
Open

Running velocyto on multiple samples with 10X #54

dingying85 opened this issue Nov 15, 2018 · 16 comments

Comments

@dingying85
Copy link

Dear Velocyto team,

I have a question about running velocyto on eight 10x samples. Since 10X pipeline gives bam file for each sample, I can run velocyto on each bam file one by one and generate eight loom files. I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples? I would like to have some advise on how to run velocyto and achieve that map efficiently?

Yours,
Ying DIng

@Puriney
Copy link
Contributor

Puriney commented Nov 24, 2018

I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples?

Yes, it has. Merge the eight matrices (genes by cells) to one giant matrix, then feed it to velocyto.R.

Take the pure R's pipeline for example, the required input RDS file is generated by dropEst (see here) and it is nothing but a R's 'list' of which the content is a triplets: exon, intron and spanning matrices. Here is the pseudo R codes:

merged_exon_mat <- matrix()
merged_intron_mat <- matrix()
merged_spanning_mat <- matrix()
for rds_x in c(rds1, rds2, ..., rds8):
    mat <- read.rds(rds_x)
    mat_exon <- mat$exon
    mat_intron <- mat$intron
    mat_spanning <- mat$spanning
    ## avoid collision of same cell barcode
    colnames(mat_exon) <- paste(rds_x, colnames(mat_exon))  
    colnames(mat_intron) <- paste(rds_x, colnames(mat_intron))
    colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning))
    merged_exon_mat <- cbind(merged_exon_mat, mat_exon)
    merged_intron_mat <- cbind(merged_intron_mat, mat_intron)
    merged_spanning_mat <- cbind(merged_spanning_mat, mat_spanning)

merged_mats <- list(
    exon=merged_exon_mat,
    intron=merged_intron_mat,
    spanning=merged_spanning_mat)
## use merged_mats to run velocyto.R
## ...

You could probably do the same operation on loom file, though I never give a try.

A final suggestion would be considering the batch effect. Probably try the Seurat's intergration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species'). (Update: the integrated values are no longer UMIs so don't do this.)

@zqzneptune
Copy link

Hi, @Puriney ,
I would like to follow up this issue.
Does Velocity.R take logCounts instead of counts matrix for down stream estimation? Will this alternative would affect the computing of Velocity then?

The reason is that after the batch effect correction has been applied, the resultant output came with logcounts only.

Thanks!

@Puriney
Copy link
Contributor

Puriney commented May 9, 2019

I don't think it is good to use logCounts for the input. Because velocity.R internally relies on raw counts. Here are some examples:

  • cell size calculation
    # size estimates
    if(is.null(emat.size)) { emat.size <- Matrix::colSums(emat); }
    if(is.null(nmat.size)) { nmat.size <- Matrix::colSums(nmat); }
    emat.cs <- emat.size[colnames(emat)]/mult;
    nmat.cs <- nmat.size[colnames(nmat)]/mult;
  • Use the log normalized count matrix to define the k nearest neighbors of cells for 'smoothing'
    emat.log.norm <- log(as.matrix(t(t(emat)/emat.cs))+pcount);
    if(!is.null(old.fit)) { cellKNN <- old.fit[['cellKNN']]}
    knn.maxl <- 1e2
    if(kCells>1) {
    if(is.null(cellKNN)) {
    cat("calculating cell knn ... ")
    if(is.null(cell.dist)) {
    cellKNN <- balancedKNN(emat.log.norm,kCells,kCells*knn.maxl,n.threads=n.cores);
    } else {
    cellKNN <- balancedKNN(emat.log.norm,kCells,kCells*knn.maxl,n.threads=n.cores,dist=cell.dist);
  • Use the log normalized count matrix for fitting.
    cat("fitting gamma coefficients ... ")
    if(is.null(old.fit)) {
    ko <- data.frame(do.call(rbind,parallel::mclapply(sn(rownames(conv.emat.norm)),function(gn) {
    if(!is.null(smat)) { # use smat-based offsets

Maybe you can manually convert the logcounts back to raw counts. Read batchelor code to find if they use 10, 2, e and also what is the pseudo-count.

@saeedfc
Copy link

saeedfc commented Oct 22, 2019

@Puriney

Hi,

Regarding your suggestion of Seurat Integration. How do you bring seurat integration here, just by using the UMAP/tSNE from seurat integrated analysis to show velocity in velocity.R?

Kindly advice. It would be great for me and I suppose many to have clarity over this as Seurat Integration is also popular for batch correction.

Thanks in advance!

@chlee-tabin
Copy link

Just if I could chime in, I think https://github.com/satijalab/seurat-wrappers has what you need, and which I am using (I am not starting from .loom file, but essentially it requires only small modification).

The vignette is http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/velocity.html

@saeedfc
Copy link

saeedfc commented Oct 29, 2019

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

@kaizen89
Copy link

kaizen89 commented Feb 7, 2020

I am really questioning the use of batch correction on multiple samples to run velocyto on the corrected matrices. Does anyone have any insight on this?

@yaolutian
Copy link

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

This is reasonable

@millersan
Copy link

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

Could you help me to deal with this?

@saeedfc
Copy link

saeedfc commented Jun 28, 2020

I am really questioning the use of batch correction on multiple samples to run velocyto on the corrected matrices. Does anyone have any insight on this?

@kaizen89 , As far as I understand, In the end at least in Seurat Integration, you are only using 'correction' to cluster and for dimensionality reduction. Your 'expression values' are not per se batch corrected.

Velocyto is using the original 10x data to make loom output file. You are only super imposing the velocyto analysis on the 'corrected' umap?
Please correct me If I am thinking wrong.

@saeedfc
Copy link

saeedfc commented Jun 28, 2020

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

Could you help me to deal with this?

Sorry, what error message do you get?
What is the size of your data?

@XYZuo
Copy link

XYZuo commented Jul 19, 2020

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous
Could you help me to deal with this?

Sorry, what error message do you get?
What is the size of your data?

Hi,
I tried but got "Error: Cannot add a different number of cells than already present"
And It could run without reasonable results.
Also curious about the reason.Is it because the Integrated_seurat_object already filtered out some cells?
Thanks a lot!

@saeedfc
Copy link

saeedfc commented Jul 19, 2020

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous
Could you help me to deal with this?

Sorry, what error message do you get?
What is the size of your data?

Hi,
I tried but got "Error: Cannot add a different number of cells than already present"
And It could run without reasonable results.
Also curious about the reason.Is it because the Integrated_seurat_object already filtered out some cells?
Thanks a lot!

Hi @ZxyChopcat

Your SO should have the same number of cells. Before making the assay objects, filter your seura object to contain only cells that are also present int eh integrated object


bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)
#Import the original seurat object where I integrated 4 different 10x runs using seurat v3
SO <- readRDS("Integrated_seurat_object.rds")
spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

It is also a good practice to check the number of cells frequently in you r seurat objects using dim(Seurat.Object)

Hope it works!
Saeed

@JerryZhang-1222
Copy link

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

Thank you for providing a very clear idea, but I think we should notice that cell ids from loom file are not exactily the same format as 10x cellranger output files under most conditions. Thus, before running GetAssayObject, we should run RenameCells() to make sure that cell names from VelocytoObj and 10xSeuratObj are exactly the same.
Thanks again.
Here are my codes:

load(file = "./tmp/Vlobj_orig.Rdata") #this is the SeuratObj from loom files post-merged

#define a function to transfer loom id to a new id according to the original SeuratObj
#which need to be modified according to your own cell ids
TransferNames = function(loom_names){
  loom_names.tmp <- gsub(":","_",loom_names)
  loom_names.new <- paste0(substring(loom_names.tmp,1,(str_length(loom_names.tmp)-1)),"-1")
  return(loom_names.new)
}
Vlobj <- RenameCells(Vlobj,
                     new.names = TransferNames(colnames(Vlobj)))

#create data objects
spliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "unspliced"))
ambiguous <- CreateAssayObject(GetAssayData(Vlobj, assay = "ambiguous"))

#add into original SeuratObj
snRNA_har[["spliced"]] <- spliced
snRNA_har[["unspliced"]] <- unspliced
snRNA_har[["ambiguous"]] <- ambiguous

@curryfly5
Copy link

I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples?

Yes, it has. Merge the eight matrices (genes by cells) to one giant matrix, then feed it to velocyto.R.

Take the pure R's pipeline for example, the required input RDS file is generated by dropEst (see here) and it is nothing but a R's 'list' of which the content is a triplets: exon, intron and spanning matrices. Here is the pseudo R codes:

merged_exon_mat <- matrix()
merged_intron_mat <- matrix()
merged_spanning_mat <- matrix()
for rds_x in c(rds1, rds2, ..., rds8):
    mat <- read.rds(rds_x)
    mat_exon <- mat$exon
    mat_intron <- mat$intron
    mat_spanning <- mat$spanning
    ## avoid collision of same cell barcode
    colnames(mat_exon) <- paste(rds_x, colnames(mat_exon))  
    colnames(mat_intron) <- paste(rds_x, colnames(mat_intron))
    colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning))
    merged_exon_mat <- cbind(merged_exon_mat, mat_exon)
    merged_intron_mat <- cbind(merged_intron_mat, mat_intron)
    merged_spanning_mat <- cbind(merged_spanning_mat, mat_spanning)

merged_mats <- list(
    exon=merged_exon_mat,
    intron=merged_intron_mat,
    spanning=merged_spanning_mat)
## use merged_mats to run velocyto.R
## ...

You could probably do the same operation on loom file, though I never give a try.

A final suggestion would be considering the batch effect. Probably try the Seurat's intergration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species'). (Update: the integrated values are no longer UMIs so don't do this.)

Thanks for the code!But I found another problem while running------ Matrices must have same number of rows in cbind2(.Call(dense_to_Csparse, x), y).

@yanpinlu
Copy link

SO <- readRDS("7.16second.fix/merge_only3/merge.3.seurat.7.16.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
Error: Cannot add a different number of cells than already present

Using this method, if I have filtered the cells in advance, how can I solve it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests