Hi,
First of all, thanks for making STARSolo.
Is there any way to get HDF5 as output from the STARSolo pipeline after mapping/counting 10x droplet data?
Or is there any other way to get information on each individual transcript molecule?
HDF5 is one of the standard outputs of 10x cellranger; here is the description of the format:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices
An HDF5 file is needed for certain downstream tools, such as swappedDrops in DropletUtils. Here is the description:
MarioniLab/DropletUtils#59
For example, swappedDrops removes counts that could be artificially generated by sample barcode swapping when single-cell libraries are sequenced on Illumina patterned flow cells. This is a known problem in which reads from one sample in a multiplexed sequencing run can appear as reads from another sample: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039488/
swappedDrops needs molecule-level information: the UMI, assigned gene, and assigned cell for each individual transcript molecule. cellranger provides this in its HDF5 file.
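To make the request concrete, here is a minimal Python sketch of the kind of per-molecule records I mean, and of why they matter for swapping detection. The sample names, barcodes, UMIs, genes, and read counts are all made up, and `find_shared_molecules` is a hypothetical helper, not part of DropletUtils, STARSolo, or cellranger. It only illustrates the idea of looking for the same (cell, UMI, gene) combination in more than one sample; it is not how swappedDrops is actually implemented.

```python
from collections import defaultdict

# Hypothetical molecule-level records, one per transcript molecule:
# (sample, cell_barcode, umi, gene, reads)
molecules = [
    ("sampleA", "AAACCTGA", "GGTTAACC", "GeneX", 40),
    ("sampleB", "AAACCTGA", "GGTTAACC", "GeneX", 2),   # same cell/UMI/gene as the line above
    ("sampleA", "AAACCTGA", "TTGGCCAA", "GeneY", 12),
    ("sampleB", "CCGTTAGC", "ACACGTGT", "GeneX", 9),
]


def find_shared_molecules(records):
    """Return {(cell, umi, gene): {sample: reads}} for keys seen in more than one sample."""
    by_key = defaultdict(dict)
    for sample, cell, umi, gene, reads in records:
        per_sample = by_key[(cell, umi, gene)]
        per_sample[sample] = per_sample.get(sample, 0) + reads
    # A molecule observed in several samples of one run is a swapping candidate.
    return {key: per_sample for key, per_sample in by_key.items() if len(per_sample) > 1}


shared = find_shared_molecules(molecules)
for key, per_sample in shared.items():
    print(key, per_sample)
```

Any STARSolo output from which a table like `molecules` could be built (cell barcode, UMI, gene, and read count per molecule) would be enough for this purpose.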
That said, the developers of swappedDrops have indicated that HDF5 is not a strict requirement; other file types providing molecule info will do.