Running velocyto on multiple samples with 10X #54

dingying85 · 2018-11-15T19:48:39Z

Dear Velocyto team,

I have a question about running velocyto on eight 10x samples. Since 10X pipeline gives bam file for each sample, I can run velocyto on each bam file one by one and generate eight loom files. I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples? I would like to have some advise on how to run velocyto and achieve that map efficiently?

Yours,
Ying DIng

Puriney · 2018-11-24T05:22:28Z

I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples?

Yes, it has. Merge the eight matrices (genes by cells) to one giant matrix, then feed it to velocyto.R.

Take the pure R's pipeline for example, the required input RDS file is generated by dropEst (see here) and it is nothing but a R's 'list' of which the content is a triplets: exon, intron and spanning matrices. Here is the pseudo R codes:

merged_exon_mat <- matrix()
merged_intron_mat <- matrix()
merged_spanning_mat <- matrix()
for rds_x in c(rds1, rds2, ..., rds8):
    mat <- read.rds(rds_x)
    mat_exon <- mat$exon
    mat_intron <- mat$intron
    mat_spanning <- mat$spanning
    ## avoid collision of same cell barcode
    colnames(mat_exon) <- paste(rds_x, colnames(mat_exon))  
    colnames(mat_intron) <- paste(rds_x, colnames(mat_intron))
    colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning))
    merged_exon_mat <- cbind(merged_exon_mat, mat_exon)
    merged_intron_mat <- cbind(merged_intron_mat, mat_intron)
    merged_spanning_mat <- cbind(merged_spanning_mat, mat_spanning)

merged_mats <- list(
    exon=merged_exon_mat,
    intron=merged_intron_mat,
    spanning=merged_spanning_mat)
## use merged_mats to run velocyto.R
## ...

You could probably do the same operation on loom file, though I never give a try.

A final suggestion would be considering the batch effect. ~~Probably try the Seurat's intergration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species').~~ (Update: the integrated values are no longer UMIs so don't do this.)

zqzneptune · 2019-05-09T18:34:39Z

Hi, @Puriney ,
I would like to follow up this issue.
Does Velocity.R take logCounts instead of counts matrix for down stream estimation? Will this alternative would affect the computing of Velocity then?

The reason is that after the batch effect correction has been applied, the resultant output came with logcounts only.

Thanks!

Puriney · 2019-05-09T18:56:40Z

I don't think it is good to use logCounts for the input. Because velocity.R internally relies on raw counts. Here are some examples:

cell size calculation

velocyto.R/R/momentum_routines.R

Lines 109 to 113 in d779034

    
           # size estimates 
        
           if(is.null(emat.size)) { emat.size <- Matrix::colSums(emat); } 
        
           if(is.null(nmat.size)) { nmat.size <- Matrix::colSums(nmat); } 
        
           emat.cs <- emat.size[colnames(emat)]/mult; 
        
           nmat.cs <- nmat.size[colnames(nmat)]/mult;

Use the log normalized count matrix to define the k nearest neighbors of cells for 'smoothing'

velocyto.R/R/momentum_routines.R

Lines 116 to 125 in d779034

    
           emat.log.norm <- log(as.matrix(t(t(emat)/emat.cs))+pcount); 
        
           if(!is.null(old.fit)) { cellKNN <- old.fit[['cellKNN']]} 
        
           knn.maxl <- 1e2 
        
           if(kCells>1) { 
        
             if(is.null(cellKNN)) { 
        
               cat("calculating cell knn ... ") 
        
               if(is.null(cell.dist)) { 
        
                 cellKNN <- balancedKNN(emat.log.norm,kCells,kCells*knn.maxl,n.threads=n.cores); 
        
               } else { 
        
                 cellKNN <- balancedKNN(emat.log.norm,kCells,kCells*knn.maxl,n.threads=n.cores,dist=cell.dist);

Use the log normalized count matrix for fitting.

velocyto.R/R/momentum_routines.R

Lines 303 to 306 in d779034

    
           cat("fitting gamma coefficients ... ") 
        
           if(is.null(old.fit)) { 
        
             ko <- data.frame(do.call(rbind,parallel::mclapply(sn(rownames(conv.emat.norm)),function(gn) { 
        
               if(!is.null(smat)) { # use smat-based offsets

Maybe you can manually convert the logcounts back to raw counts. Read batchelor code to find if they use 10, 2, e and also what is the pseudo-count.

saeedfc · 2019-10-22T09:35:47Z

@Puriney

Hi,

Regarding your suggestion of Seurat Integration. How do you bring seurat integration here, just by using the UMAP/tSNE from seurat integrated analysis to show velocity in velocity.R?

Kindly advice. It would be great for me and I suppose many to have clarity over this as Seurat Integration is also popular for batch correction.

Thanks in advance!

chlee-tabin · 2019-10-22T14:57:08Z

Just if I could chime in, I think https://github.com/satijalab/seurat-wrappers has what you need, and which I am using (I am not starting from .loom file, but essentially it requires only small modification).

The vignette is http://htmlpreview.github.io/?https://github.com/satijalab/seurat-wrappers/blob/master/docs/velocity.html

saeedfc · 2019-10-29T17:14:11Z

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

kaizen89 · 2020-02-07T17:42:04Z

I am really questioning the use of batch correction on multiple samples to run velocyto on the corrected matrices. Does anyone have any insight on this?

yaolutian · 2020-02-19T05:52:56Z

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

This is reasonable

millersan · 2020-06-28T06:25:09Z

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

Could you help me to deal with this?

saeedfc · 2020-06-28T23:57:53Z

I am really questioning the use of batch correction on multiple samples to run velocyto on the corrected matrices. Does anyone have any insight on this?

@kaizen89 , As far as I understand, In the end at least in Seurat Integration, you are only using 'correction' to cluster and for dimensionality reduction. Your 'expression values' are not per se batch corrected.

Velocyto is using the original 10x data to make loom output file. You are only super imposing the velocyto analysis on the 'corrected' umap?
Please correct me If I am thinking wrong.

saeedfc · 2020-06-28T23:58:52Z

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

Could you help me to deal with this?

Sorry, what error message do you get?
What is the size of your data?

XYZuo · 2020-07-19T12:28:43Z

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous
Could you help me to deal with this?

Sorry, what error message do you get?
What is the size of your data?

Hi,
I tried but got "Error: Cannot add a different number of cells than already present"
And It could run without reasonable results.
Also curious about the reason.Is it because the Integrated_seurat_object already filtered out some cells?
Thanks a lot!

saeedfc · 2020-07-19T17:57:39Z

SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

I tried this, it takes a lot of times but can not get the result.
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous
Could you help me to deal with this?

Sorry, what error message do you get?
What is the size of your data?

Hi,
I tried but got "Error: Cannot add a different number of cells than already present"
And It could run without reasonable results.
Also curious about the reason.Is it because the Integrated_seurat_object already filtered out some cells?
Thanks a lot!

Hi @ZxyChopcat

Your SO should have the same number of cells. Before making the assay objects, filter your seura object to contain only cells that are also present int eh integrated object


bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)
#Import the original seurat object where I integrated 4 different 10x runs using seurat v3
SO <- readRDS("Integrated_seurat_object.rds")
spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

It is also a good practice to check the number of cells frequently in you r seurat objects using dim(Seurat.Object)

Hope it works!
Saeed

JerryZhang-1222 · 2022-03-24T05:50:08Z

@chlee-tabin @Puriney

This is what I did, It would be great if a contributor can comment to confirm.

ldat_obj.1 <- ReadVelocity(file = "obj.1.loom")
obj.1 <- as.Seurat(ldat_obj.1)
.
.
.
ldat_obj.4 <- ReadVelocity(file = "obj.4.loom")
obj.4 <- as.Seurat(ldat_obj.4)

bm <- merge(x = obj.1, y = c(obj.2,obj.3, obj.4), merge.data = TRUE)

spliced <- CreateAssayObject(GetAssayData(bm, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(bm, assay = "unspliced"))
unambiguous <- CreateAssayObject(GetAssayData(bm, assay = "unambiguous"))

#Import the original seurat object where I integrated 4 different 10x runs using seurat v3

SO <- readRDS("Integrated_seurat_object.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
SO[["ambiguous"]] <- ambiguous

SO <- RunVelocity(SO,deltaT = 1, kCells = 25, fit.quantile = 0.02)

It works, but I am not very sure this pipeline stays true with the things that go in the background with velocity.

Thank you for providing a very clear idea, but I think we should notice that cell ids from loom file are not exactily the same format as 10x cellranger output files under most conditions. Thus, before running GetAssayObject, we should run RenameCells() to make sure that cell names from VelocytoObj and 10xSeuratObj are exactly the same.
Thanks again.
Here are my codes:

load(file = "./tmp/Vlobj_orig.Rdata") #this is the SeuratObj from loom files post-merged

#define a function to transfer loom id to a new id according to the original SeuratObj
#which need to be modified according to your own cell ids
TransferNames = function(loom_names){
  loom_names.tmp <- gsub(":","_",loom_names)
  loom_names.new <- paste0(substring(loom_names.tmp,1,(str_length(loom_names.tmp)-1)),"-1")
  return(loom_names.new)
}
Vlobj <- RenameCells(Vlobj,
                     new.names = TransferNames(colnames(Vlobj)))

#create data objects
spliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "spliced"))
unspliced <- CreateAssayObject(GetAssayData(Vlobj, assay = "unspliced"))
ambiguous <- CreateAssayObject(GetAssayData(Vlobj, assay = "ambiguous"))

#add into original SeuratObj
snRNA_har[["spliced"]] <- spliced
snRNA_har[["unspliced"]] <- unspliced
snRNA_har[["ambiguous"]] <- ambiguous

curryfly5 · 2022-05-12T16:20:30Z

I am wondering is there a way to visualize the vector fields on a single TSNE plot of aggregate cells of all eight samples?

Yes, it has. Merge the eight matrices (genes by cells) to one giant matrix, then feed it to velocyto.R.

Take the pure R's pipeline for example, the required input RDS file is generated by dropEst (see here) and it is nothing but a R's 'list' of which the content is a triplets: exon, intron and spanning matrices. Here is the pseudo R codes:
merged_exon_mat <- matrix()
merged_intron_mat <- matrix()
merged_spanning_mat <- matrix()
for rds_x in c(rds1, rds2, ..., rds8):
    mat <- read.rds(rds_x)
    mat_exon <- mat$exon
    mat_intron <- mat$intron
    mat_spanning <- mat$spanning
    ## avoid collision of same cell barcode
    colnames(mat_exon) <- paste(rds_x, colnames(mat_exon))  
    colnames(mat_intron) <- paste(rds_x, colnames(mat_intron))
    colnames(mat_spanning) <- paste(rds_x, colnames(mat_spanning))
    merged_exon_mat <- cbind(merged_exon_mat, mat_exon)
    merged_intron_mat <- cbind(merged_intron_mat, mat_intron)
    merged_spanning_mat <- cbind(merged_spanning_mat, mat_spanning)

merged_mats <- list(
    exon=merged_exon_mat,
    intron=merged_intron_mat,
    spanning=merged_spanning_mat)
## use merged_mats to run velocyto.R
## ...
You could probably do the same operation on loom file, though I never give a try.

A final suggestion would be considering the batch effect. ~~Probably try the Seurat's intergration function if needed ('Integrating single-cell transcriptomic data across different conditions, technologies, and species').~~ (Update: the integrated values are no longer UMIs so don't do this.)

Thanks for the code！But I found another problem while running------ Matrices must have same number of rows in cbind2(.Call(dense_to_Csparse, x), y).

yanpinlu · 2022-07-19T01:58:23Z

SO <- readRDS("7.16second.fix/merge_only3/merge.3.seurat.7.16.rds")
SO[["spliced"]] <- spliced
SO[["unspliced"]] <- unspliced
Error: Cannot add a different number of cells than already present

Using this method, if I have filtered the cells in advance, how can I solve it?

zgyaru mentioned this issue Sep 30, 2020

TypeError: create() got an unexpected keyword argument 'matrix' velocyto-team/velocyto.py#110

Open

dmsalsgh97 mentioned this issue Apr 21, 2021

velocyto on 6 samples with integrated seurat obj. satijalab/seurat-wrappers#89

Open

victorwang123 mentioned this issue Sep 1, 2021

How to combine different samples of LOOM files #148

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running velocyto on multiple samples with 10X #54

Running velocyto on multiple samples with 10X #54

dingying85 commented Nov 15, 2018

Puriney commented Nov 24, 2018 •

edited

Loading

zqzneptune commented May 9, 2019

Puriney commented May 9, 2019

saeedfc commented Oct 22, 2019

chlee-tabin commented Oct 22, 2019

saeedfc commented Oct 29, 2019

kaizen89 commented Feb 7, 2020

yaolutian commented Feb 19, 2020

millersan commented Jun 28, 2020

saeedfc commented Jun 28, 2020

saeedfc commented Jun 28, 2020

XYZuo commented Jul 19, 2020

saeedfc commented Jul 19, 2020

JerryZhang-1222 commented Mar 24, 2022

curryfly5 commented May 12, 2022

yanpinlu commented Jul 19, 2022

Running velocyto on multiple samples with 10X #54

Running velocyto on multiple samples with 10X #54

Comments

dingying85 commented Nov 15, 2018

Puriney commented Nov 24, 2018 • edited Loading

zqzneptune commented May 9, 2019

Puriney commented May 9, 2019

saeedfc commented Oct 22, 2019

chlee-tabin commented Oct 22, 2019

saeedfc commented Oct 29, 2019

kaizen89 commented Feb 7, 2020

yaolutian commented Feb 19, 2020

millersan commented Jun 28, 2020

saeedfc commented Jun 28, 2020

saeedfc commented Jun 28, 2020

XYZuo commented Jul 19, 2020

saeedfc commented Jul 19, 2020

JerryZhang-1222 commented Mar 24, 2022

curryfly5 commented May 12, 2022

yanpinlu commented Jul 19, 2022

Puriney commented Nov 24, 2018 •

edited

Loading