Error in addGeneIntegrationMatrix / abnormally high number of communities


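When I run the code below, `addGeneIntegrationMatrix` fails with the error shown further down. The Seurat object (`mRCCv5`) and the ArchRProject (`proj`) are printed first for reference: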


```
> mRCCv5
An object of class Seurat 
22921 features across 102723 samples within 1 assay 
Active assay: RNA (22921 features, 2000 variable features)
 2 layers present: counts, data
> proj

           ___      .______        ______  __    __  .______      
          /   \     |   _  \      /      ||  |  |  | |   _  \     
         /  ^  \    |  |_)  |    |  ,----'|  |__|  | |  |_)  |    
        /  /_\  \   |      /     |  |     |   __   | |      /     
       /  _____  \  |  |\  \\___ |  `----.|  |  |  | |  |\  \\___.
      /__/     \__\ | _| `._____| \______||__|  |__| | _| `._____|
    
class: ArchRProject 
outputDirectory: atac_RCC100 
samples(1): RCC100_ver2
sampleColData names(1): ArrowFiles
cellColData names(16): Sample TSSEnrichment ... BlacklistRatio
  Clusters
numberOfCells(1): 44844
medianTSS(1): 6.249
medianFrags(1): 3221

> proj <- addGeneIntegrationMatrix(
+   ArchRProj = proj, 
+   useMatrix = "GeneScoreMatrix",
+   matrixName = "GeneIntegrationMatrix",
+   reducedDims = "IterativeLSI",
+   seRNA = mRCCv5,
+   sampleCellsRNA = 5000,
+   addToArrow = FALSE,
+   groupRNA = "BioClassification",
+   nameCell = "predictedCell_Un",
+   nameGroup = "predictedGroup_Un",
+   nameScore = "predictedScore_Un",
+   force = TRUE
+ )
```

My scATAC-seq dataset contains multiple samples. The call above was run on an ArchRProject containing a single sample, but the same error occurs when I run it on an ArchRProject that combines all samples. I have also tried switching the Seurat RNA assay between the v3 and v5 assay versions, and the error persists in both cases.



```
ArchR logging to : ArchRLogs/ArchR-addGeneIntegrationMatrix-104ce73bf6ad1-Date-2023-07-28_Time-01-19-53.700155.log
If there is an issue, please report to github with logFile!
2023-07-28 01:19:54.010202 : Running Seurat's Integration Stuart* et al 2019, 0.005 mins elapsed.
2023-07-28 01:19:54.041649 : Checking ATAC Input, 0.006 mins elapsed.
2023-07-28 01:19:59.577644 : Checking RNA Input, 0.098 mins elapsed.
Warning: The following arguments are not used: layer
2023-07-28 01:20:16.147696 : Found 17989 overlapping gene names from gene scores and rna matrix!, 0.374 mins elapsed.
2023-07-28 01:20:16.14888 : Creating Integration Blocks, 0.374 mins elapsed.
2023-07-28 01:20:18.104443 : Prepping Interation Data, 0.407 mins elapsed.
2023-07-28 01:20:18.623075 : Computing Integration in 5 Integration Blocks!, 0 mins elapsed.
Error in .safelapply(seq_along(blockList), function(i) { : 
Error Found Iteration 1 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 2 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 3 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : \n  more elements supplied than there are to replace\n"
	<simpleError in slot(object = object, name = "features")[[layer]] <- features: more elements supplied than there are to replace>
Error Found Iteration 4 : 
	[1] "Error in slot(object = object, name = \"features\")[[layer]] <- features : 
In addition: Warning message:
In mclapply(..., mc.cores = threads, mc.preschedule = preschedule) :
  5 function calls resulted in an error
```


Below is the output of `sessionInfo()`:

```
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

Random number generation:
 RNG:     L'Ecuyer-CMRG 
 Normal:  Inversion 
 Sample:  Rejection 
 
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
 [1] hexbin_1.28.3                     nabor_0.5.0                      
 [3] uwot_0.1.16                       BSgenome.Hsapiens.UCSC.hg38_1.4.5
 [5] BSgenome_1.68.0                   rtracklayer_1.60.0               
 [7] Biostrings_2.68.1                 XVector_0.40.0                   
 [9] Seurat_4.9.9.9049                 SeuratObject_4.9.9.9085          
[11] sp_1.6-1                          trqwe_0.1                        
[13] rhdf5_2.44.0                      SummarizedExperiment_1.30.1      
[15] Biobase_2.60.0                    MatrixGenerics_1.12.0            
[17] Rcpp_1.0.11                       Matrix_1.6-0                     
[19] GenomicRanges_1.52.0              GenomeInfoDb_1.36.0              
[21] IRanges_2.34.0                    S4Vectors_0.38.1                 
[23] BiocGenerics_0.46.0               matrixStats_1.0.0                
[25] data.table_1.14.8                 stringr_1.5.0                    
[27] plyr_1.8.8                        magrittr_2.0.3                   
[29] ggplot2_3.4.2                     gtable_0.3.3                     
[31] gtools_3.9.4                      gridExtra_2.3                    
[33] ArchR_1.0.2                      

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3       rstudioapi_0.15.0        jsonlite_1.8.5          
  [4] spatstat.utils_3.0-3     farver_2.1.1             BiocIO_1.10.0           
  [7] zlibbioc_1.46.0          vctrs_0.6.3              ROCR_1.0-11             
 [10] Rsamtools_2.16.0         Cairo_1.6-0              spatstat.explore_3.2-1  
 [13] RCurl_1.98-1.12          htmltools_0.5.5          S4Arrays_1.0.4          
 [16] Rhdf5lib_1.22.0          sctransform_0.3.5        parallelly_1.36.0       
 [19] KernSmooth_2.23-22       htmlwidgets_1.6.2        ica_1.0-3               
 [22] plotly_4.10.2            zoo_1.8-12               GenomicAlignments_1.36.0
 [25] igraph_1.5.0             mime_0.12                lifecycle_1.0.3         
 [28] pkgconfig_2.0.3          R6_2.5.1                 fastmap_1.1.1           
 [31] GenomeInfoDbData_1.2.10  fitdistrplus_1.1-11      future_1.33.0           
 [34] shiny_1.7.4.1            digest_0.6.33            colorspace_2.1-0        
 [37] patchwork_1.1.2          tensor_1.5               RSpectra_0.16-1         
 [40] irlba_2.3.5.1            labeling_0.4.2           progressr_0.13.0        
 [43] fansi_1.0.4              spatstat.sparse_3.0-2    polyclip_1.10-4         
 [46] httr_1.4.6               abind_1.4-5              compiler_4.3.0          
 [49] withr_2.5.0              BiocParallel_1.34.1      fastDummies_1.6.3       
 [52] MASS_7.3-60              DelayedArray_0.26.2      rjson_0.2.21            
 [55] tools_4.3.0              lmtest_0.9-40            httpuv_1.6.11           
 [58] future.apply_1.11.0      goftest_1.2-3            glue_1.6.2              
 [61] restfulr_0.0.15          nlme_3.1-162             rhdf5filters_1.12.1     
 [64] promises_1.2.0.1         Rtsne_0.16               cluster_2.1.4           
 [67] reshape2_1.4.4           generics_0.1.3           spatstat.data_3.0-1     
 [70] tidyr_1.3.0              utf8_1.2.3               spatstat.geom_3.2-4     
 [73] RcppAnnoy_0.0.21         ggrepel_0.9.3            RANN_2.6.1              
 [76] pillar_1.9.0             spam_2.9-1               RcppHNSW_0.4.1          
 [79] later_1.3.1              splines_4.3.0            dplyr_1.1.2             
 [82] lattice_0.21-8           deldir_1.0-9             survival_3.5-5          
 [85] tidyselect_1.2.0         miniUI_0.1.1.1           pbapply_1.7-2           
 [88] scattermore_1.2          stringi_1.7.12           yaml_2.3.7              
 [91] lazyeval_0.2.2           codetools_0.2-19         tibble_3.2.1            
 [94] cli_3.6.1                xtable_1.8-4             reticulate_1.30         
 [97] munsell_0.5.0            spatstat.random_3.1-5    globals_0.16.2          
[100] png_0.1-8                XML_3.99-0.14            ellipsis_0.3.2          
[103] dotCall64_1.0-2          bitops_1.0-7             listenv_0.9.0           
[106] viridisLite_0.4.2        scales_1.2.1             ggridges_0.5.4          
[109] leiden_0.4.3             purrr_1.0.1              crayon_1.5.2            
[112] rlang_1.1.1              cowplot_1.1.1         
```



There is a second problem. While running `addDoubletScores`, the message below was displayed, and during clustering the number of communities was abnormally high. For this sample, 130 communities were identified, and 117 singletons were merged into a single cluster. In another dataset with 19 samples, 10,365 communities were reported, and `addClusters` was still running after 13 hours.
```
 Correlation of UMAP Projection is below 0.9 (normally this is ~0.99)
This means there is little heterogeneity in your sample, and thus doubletCalling is inaccurate. force = FALSE, thus returning -1 doubletScores and doubletEnrichments!
```

My main goal with this issue is to resolve the error in `addGeneIntegrationMatrix`. I would also appreciate suggestions for appropriate `addDoubletScores` options for this data. Thank you.


Error in addGeneIntegrationMatrix / abnormally high number of communities #1999
