Skip to content

Potential bug in TSSEnrichment #783

@ajwilk

Description

@ajwilk

Hi there, thanks for building and maintaining such a great package. Running into strange behavior of TSSEnrichment(): have a Seurat object built from 4 10X multiome samples (points to 4 different fragment files). When I call TSSEnrichment() on the object, the TSS enrichment score for all cells of the first 3 samples are zero, and there's a reasonable distribution of scores for the 4th and last sample.

I don't think this is a problem with annotation because it appears the score is being calculated for at least some of the cells and the same annotation is used for all samples.

Code used to generate the object:

### find input files, build object ###
matrix_path <- "my_path/"
count_files <- list.files(path = matrix_path, pattern = "filtered_feature_bc_matrix.h5$")
frag_files <- list.files(path = matrix_path, pattern = "atac_fragments.tsv.gz$")
sample_names <- gsub("_filtered_feature_bc_matrix.h5","",count_files)

counts <- lapply(paste0(matrix_path,count_files), Read10X_h5)

seu_list <- pblapply(seq_along(1:length(counts)), FUN = function(x) {
  pbmc <- CreateSeuratObject(
    counts = counts[[x]]$`Gene Expression`,
    assay = "RNA"
  )
  
  pbmc[["ATAC"]] <- CreateChromatinAssay(
    counts = counts[[x]]$Peaks,
    sep = c(":", "-"),
    fragments = paste0(matrix_path,frag_files[x]),
    annotation = annotation
  )
  return(pbmc)
})

### call peaks on fragment files (a topic for another post perhaps) ###
peaks <- CallPeaks(paste0(matrix_path,frag_files), macs2.path = "macs2_path")
peaks <- keepStandardChromosomes(peaks, pruning.mode = "coarse")
peaks <- subsetByOverlaps(x = peaks, ranges = blacklist_hg38_unified, invert = TRUE)

### re-quantify with new peak set and merge ###
seu_list <- pblapply(seq_along(1:length(seu_list)), FUN = function(x) {
  soi <- seu_list[[x]]
  #create fragment object
  frag_object <- CreateFragmentObject(path = paste0(matrix_path,frag_files[x]),
                                      cells = colnames(soi))
  
  #quantify counts in each peak
  macs2_counts <- FeatureMatrix(frag_object,
                                features = peaks,
                                cells = colnames(soi))
  
  soi[["ATAC"]] <- CreateChromatinAssay(macs2_counts, 
                                      fragments = frag_object, 
                                      annotation = annotation)
  
  return(soi)
})

seu_combined <- merge(x = new_seu_list[[1]], y = new_seu_list[2:length(new_seu_list)],
                      add.cell.ids = sample_names)

The output of the TSSEnrichment call:

DefaultAssay(seu_combined) <- "ATAC"
seu_combined <- TSSEnrichment(seu_combined)

Extracting TSS positions
Extracting fragments at TSSs
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03m 51snger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement lengthlonger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement length
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03m 35snger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement lengthlonger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement length
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02m 59snger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement lengthlonger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement length
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03m 21snger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement lengthlonger object length is not a multiple of shorter object lengthnumber of items to replace is not a multiple of replacement length

Computing TSS enrichment score

I'm guessing the warning above points to the source of the problem. And as described above the only non-zero TSS enrichment scores appear in the last sample.

aggregate(seu_combined$TSS.enrichment, by = list(seu_combined$dataset), mean)
     Group.1       x
1 VWT064_15A 0.00000
2 VWT064_15C 0.00000
3 VWT064_15F 0.00000
4  VWT064_16 5.02145

Would love to hear your thoughts! Hopefully I haven't made some mistake in the generation of the object that's causing this. Let me know if any other info would be helpful and I'm willing to share the object in case that's helpful.

Update
Calling TSSEnrichment() on each individual object before merging leads to the expected non-zero values for all samples

seu_list <- pblapply(seq_along(1:length(seu_list)), FUN = function(x) {
  soi <- seu_list[[x]]
  #create fragment object
  frag_object <- CreateFragmentObject(path = paste0(matrix_path,frag_files[x]),
                                      cells = colnames(soi))
  
  #quantify counts in each peak
  macs2_counts <- FeatureMatrix(frag_object,
                                features = peaks,
                                cells = colnames(soi))
  
  soi[["ATAC"]] <- CreateChromatinAssay(macs2_counts, 
                                      fragments = frag_object, 
                                      annotation = annotation)
  
  DefaultAssay(soi) <- "ATAC"
  soi <- NucleosomeSignal(soi)
  soi <- TSSEnrichment(soi)
  return(soi)
})

seu_combined <- merge(x = seu_list[[1]], y = seu_list[2:length(seu_list)],
                      add.cell.ids = sample_names)

aggregate(seu_combined$TSS.enrichment, by = list(seu_combined$dataset), mean)
     Group.1        x
1 VWT064_15A 5.110414
2 VWT064_15C 4.969871
3 VWT064_15F 5.121808
4  VWT064_16 5.021383

Sessioninfo

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] SeuratDisk_0.0.0.9019             pbapply_1.4-3                    
 [3] biovizBase_1.38.0                 BSgenome.Hsapiens.UCSC.hg38_1.4.3
 [5] BSgenome_1.58.0                   rtracklayer_1.50.0               
 [7] Biostrings_2.58.0                 XVector_0.30.0                   
 [9] EnsDb.Hsapiens.v86_2.99.0         ensembldb_2.14.1                 
[11] AnnotationFilter_1.14.0           GenomicFeatures_1.42.3           
[13] AnnotationDbi_1.52.0              Biobase_2.50.0                   
[15] GenomicRanges_1.42.0              GenomeInfoDb_1.26.2              
[17] IRanges_2.24.1                    S4Vectors_0.28.1                 
[19] BiocGenerics_0.36.0               Signac_1.2.1.9002                
[21] SeuratObject_4.0.0                Seurat_4.0.2                     
[23] forcats_0.5.0                     stringr_1.4.0                    
[25] purrr_0.3.4                       readr_2.0.0                      
[27] tidyr_1.1.2                       tibble_3.0.5                     
[29] ggplot2_3.3.3                     tidyverse_1.3.0                  
[31] dplyr_1.0.5                       plyr_1.8.6                       

loaded via a namespace (and not attached):
  [1] reticulate_1.18             tidyselect_1.1.0            RSQLite_2.2.2              
  [4] htmlwidgets_1.5.3           grid_4.0.3                  docopt_0.7.1               
  [7] BiocParallel_1.24.1         Rtsne_0.15                  munsell_0.5.0              
 [10] codetools_0.2-18            ica_1.0-2                   future_1.21.0              
 [13] miniUI_0.1.1.1              withr_2.4.0                 colorspace_2.0-0           
 [16] knitr_1.30                  rstudioapi_0.13             ROCR_1.0-11                
 [19] tensor_1.5                  listenv_0.8.0               labeling_0.4.2             
 [22] MatrixGenerics_1.2.0        slam_0.1-48                 GenomeInfoDbData_1.2.4     
 [25] polyclip_1.10-0             bit64_4.0.5                 farver_2.0.3               
 [28] parallelly_1.23.0           vctrs_0.3.6                 generics_0.1.0             
 [31] xfun_0.20                   BiocFileCache_1.14.0        lsa_0.73.2                 
 [34] ggseqlogo_0.1               R6_2.5.0                    hdf5r_1.3.3                
 [37] DelayedArray_0.16.0         bitops_1.0-6                spatstat.utils_2.1-0       
 [40] assertthat_0.2.1            promises_1.1.1              scales_1.1.1               
 [43] nnet_7.3-14                 gtable_0.3.0                globals_0.14.0             
 [46] goftest_1.2-2               rlang_0.4.10                RcppRoll_0.3.0             
 [49] splines_4.0.3               lazyeval_0.2.2              dichromat_2.0-0            
 [52] checkmate_2.0.0             spatstat.geom_2.1-0         broom_0.7.3                
 [55] yaml_2.2.1                  reshape2_1.4.4              abind_1.4-5                
 [58] modelr_0.1.8                backports_1.2.1             httpuv_1.5.5               
 [61] Hmisc_4.4-2                 tools_4.0.3                 ellipsis_0.3.1             
 [64] spatstat.core_2.1-2         RColorBrewer_1.1-2          ggridges_0.5.3             
 [67] Rcpp_1.0.6                  base64enc_0.1-3             progress_1.2.2             
 [70] zlibbioc_1.36.0             RCurl_1.98-1.2              prettyunits_1.1.1          
 [73] rpart_4.1-15                openssl_1.4.3               deldir_0.2-9               
 [76] cowplot_1.1.1               zoo_1.8-8                   SummarizedExperiment_1.20.0
 [79] haven_2.3.1                 ggrepel_0.9.1               cluster_2.1.0              
 [82] fs_1.5.0                    magrittr_2.0.1              RSpectra_0.16-0            
 [85] data.table_1.13.6           scattermore_0.7             lmtest_0.9-38              
 [88] reprex_0.3.0                RANN_2.6.1                  SnowballC_0.7.0            
 [91] ProtGenerics_1.22.0         fitdistrplus_1.1-3          matrixStats_0.57.0         
 [94] evaluate_0.14               hms_1.0.0                   patchwork_1.1.1            
 [97] mime_0.9                    xtable_1.8-4                XML_3.99-0.5               
[100] jpeg_0.1-8.1                sparsesvd_0.2               readxl_1.3.1               
[103] gridExtra_2.3               compiler_4.0.3              biomaRt_2.46.3             
[106] KernSmooth_2.23-18          crayon_1.3.4                htmltools_0.5.1            
[109] mgcv_1.8-33                 later_1.1.0.1               tzdb_0.1.2                 
[112] Formula_1.2-4               lubridate_1.7.9.2           DBI_1.1.1                  
[115] tweenr_1.0.1                dbplyr_2.1.0                MASS_7.3-53                
[118] rappdirs_0.3.1              Matrix_1.3-2                cli_2.2.0                  
[121] igraph_1.2.6                pkgconfig_2.0.3             GenomicAlignments_1.26.0   
[124] foreign_0.8-81              plotly_4.9.3                spatstat.sparse_2.0-0      
[127] xml2_1.3.2                  rvest_0.3.6                 VariantAnnotation_1.36.0   
[130] digest_0.6.27               sctransform_0.3.2           RcppAnnoy_0.0.18           
[133] spatstat.data_2.1-0         rmarkdown_2.6               cellranger_1.1.0           
[136] leiden_0.3.6                fastmatch_1.1-0             htmlTable_2.1.0            
[139] uwot_0.1.10                 curl_4.3                    shiny_1.5.0                
[142] Rsamtools_2.6.0             lifecycle_1.0.0             nlme_3.1-151               
[145] jsonlite_1.7.2              viridisLite_0.3.0           askpass_1.1                
[148] fansi_0.4.2                 pillar_1.4.7                lattice_0.20-41            
[151] fastmap_1.0.1               httr_1.4.2                  survival_3.2-7             
[154] glue_1.4.2                  qlcMatrix_0.9.7             png_0.1-7                  
[157] bit_4.0.4                   ggforce_0.3.2               stringi_1.5.3              
[160] blob_1.2.1                  latticeExtra_0.6-29         memoise_1.1.0              
[163] irlba_2.3.3                 future.apply_1.7.0   

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions