Error reading Velocity matrix #774

@romanhaa

Hi Alex,

Following your question on Twitter, I started comparing the spliced/unspliced counts from velocyto, STARsolo, and kallisto. Unfortunately, when I try to read the "Velocyto" matrix generated by STARsolo in R, I get this error:

Matrix::readMM(".../STARsolo/Velocyto/raw/matrix.mtx")
# Error: readMM(): row     values 'i' are not in 1:nr

Any idea what the issue could be? The mtx file is ~1 GB in size.

Using scanpy's read_mtx function works fine.

import scanpy as sc
sc.read_mtx(".../STARsolo/Velocyto/raw/matrix.mtx")
# AnnData object with n_obs × n_vars = 11720 × 60609
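
Since the two readers disagree, one way to narrow things down is to look at the raw file directly. The following is only a minimal diagnostic sketch, not a confirmed explanation of the error: `inspect_mtx` is a hypothetical helper (not part of scanpy, scipy, or STAR) that reads the Matrix Market banner and size line, then reports how many whitespace-separated fields the first data lines have and how many of them carry a row index outside `1..n_rows`. It assumes a plain-text coordinate-format file; a standard coordinate entry has three fields (row, column, value).

```python
import itertools


def inspect_mtx(path, n_lines=1000):
    """Summarise the header and the first data lines of a Matrix Market file.

    Hypothetical helper for illustration; assumes a text coordinate-format file.
    """
    with open(path) as fh:
        banner = fh.readline().strip()

        # Skip comment lines (starting with '%') and blank lines up to the size line.
        line = fh.readline()
        while line and (line.startswith("%") or not line.strip()):
            line = fh.readline()
        if not line:
            raise ValueError("no size line found in " + path)
        n_rows, n_cols, n_entries = (int(v) for v in line.split()[:3])

        field_counts = set()   # distinct numbers of fields seen per data line
        bad_rows = 0           # data lines whose first field is not in 1..n_rows
        for entry in itertools.islice(fh, n_lines):
            fields = entry.split()
            if not fields:
                continue
            field_counts.add(len(fields))
            if not 1 <= int(fields[0]) <= n_rows:
                bad_rows += 1

    return {
        "banner": banner,
        "shape": (n_rows, n_cols),
        "n_entries": n_entries,
        "field_counts": sorted(field_counts),
        "bad_rows": bad_rows,
    }
```

Calling `inspect_mtx(".../STARsolo/Velocyto/raw/matrix.mtx")` and comparing `field_counts` with the three fields a coordinate entry normally has, and `shape` with the dimensions each reader reports, should show whether the file layout differs from what `Matrix::readMM()` expects.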

For those running into the same problem, you can load the .mtx file generated by STARsolo and then save it again using scanpy/scipy functions as shown below. Afterwards, it will be readable by Matrix::readMM() in R.

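# Python: read the STARsolo matrix and write it back out with scipy
import scanpy as sc
import scipy.io  # import the submodule explicitly; a bare `import scipy` may not expose scipy.io
t = sc.read_mtx('.../STARsolo/Velocyto/raw/matrix.mtx')
scipy.io.mmwrite('.../STARsolo/Velocyto/raw/matrix_new_format.mtx', t.X)

# R: the re-saved file can now be read
Matrix::readMM(".../STARsolo/Velocyto/raw/matrix_new_format.mtx") %>% str()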
# Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
#   ..@ i       : int [1:48570060] 1 1 1 1 1 1 1 1 1 1 ...
#   ..@ j       : int [1:48570060] 6962 54995 69331 89585 121706 145759 228188 337667 383554 383896 ...
#   ..@ Dim     : int [1:2] 60609 6794880
#   ..@ Dimnames:List of 2
#   .. ..$ : NULL
#   .. ..$ : NULL
#   ..@ x       : num [1:48570060] 0 0 0 0 0 0 0 0 0 0 ...
#   ..@ factors : list()

Best,
Roman

# R version 3.6.1 (2019-07-05)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Debian GNU/Linux 9 (stretch)
# 
# Matrix products: default
# BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#  [1] Matrix_1.2-17   forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3    
#  [5] purrr_0.3.3     readr_1.3.1     tidyr_1.0.0     tibble_2.1.3   
#  [9] ggplot2_3.2.1   tidyverse_1.2.1
# 
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.2       cellranger_1.1.0 pillar_1.4.2     compiler_3.6.1  
#  [5] tools_3.6.1      zeallot_0.1.0    jsonlite_1.6     lubridate_1.7.4 
#  [9] lifecycle_0.1.0  gtable_0.3.0     nlme_3.1-141     lattice_0.20-38 
# [13] pkgconfig_2.0.3  rlang_0.4.1      cli_1.1.0        rstudioapi_0.10 
# [17] haven_2.1.1      withr_2.1.2      xml2_1.2.2       httr_1.4.1      
# [21] generics_0.0.2   vctrs_0.2.0      hms_0.5.2        grid_3.6.1      
# [25] tidyselect_0.2.5 glue_1.3.1       R6_2.4.0         readxl_1.3.1    
# [29] modelr_0.1.5     magrittr_1.5     backports_1.1.5  scales_1.0.0    
# [33] rvest_0.3.4      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3   
# [37] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2      crayon_1.3.4
