
HDF5/each individual transcript molecule info in STARSolo output #1148

@mbatiuk


Hi,

First of all, thanks for making STARSolo.

Is there any way to get HDF5 output from the STARsolo pipeline after mapping/counting 10x droplet data?
Or is there any other way to get information on each individual transcript molecule?

HDF5 is one of the standard outputs in 10x cellranger, here is the description of the format:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices

The HDF5 file is needed by certain downstream tools, such as swappedDrops in DropletUtils. Here is the discussion:
#MarioniLab/DropletUtils#59

For example, swappedDrops removes counts that may have been generated artificially by sample barcode swapping when single-cell libraries are sequenced on Illumina patterned flow cells. This is a known problem: in a multiplexed sequencing run, reads from one sample can appear as reads from another sample: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6039488/

swappedDrops needs molecule-level information: the UMI, the assigned gene and the assigned cell for each individual transcript molecule. cellranger provides this in its HDF5 file.
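To illustrate why the per-molecule records matter, here is a minimal Python sketch of the general idea: a molecule with the same cell barcode, UMI and gene seen in several samples of one run is a swapping candidate, and it is kept only in the sample that holds most of its reads. This is a simplified illustration, not the DropletUtils implementation; the function name `remove_swapped`, the tuple layout and the 0.8 threshold are assumptions made for the example.

```python
from collections import defaultdict

def remove_swapped(samples, min_frac=0.8):
    """Hypothetical helper, not part of DropletUtils or STARsolo.

    samples: dict mapping sample name -> list of (cell, umi, gene, reads).
    Returns a dict mapping sample name -> list of kept (cell, umi, gene) keys.
    """
    # Sum reads per (cell, umi, gene) combination within each sample.
    reads_by_key = defaultdict(dict)
    for sample, molecules in samples.items():
        for cell, umi, gene, reads in molecules:
            per_sample = reads_by_key[(cell, umi, gene)]
            per_sample[sample] = per_sample.get(sample, 0) + reads

    kept = {sample: [] for sample in samples}
    for key, per_sample in reads_by_key.items():
        total = sum(per_sample.values())
        best = max(per_sample, key=per_sample.get)
        # Keep the molecule only where it clearly dominates (assumed threshold);
        # otherwise drop it from every sample.
        if per_sample[best] / total >= min_frac:
            kept[best].append(key)
    return kept

# Toy data: one molecule appears in both samples with a 90/10 read split,
# another with a 50/50 split.
samples = {
    "A": [("AAAC", "TTGA", "Gene1", 90), ("AAAC", "CCGT", "Gene2", 5),
          ("CCCA", "GGGT", "Gene4", 4)],
    "B": [("AAAC", "TTGA", "Gene1", 10), ("GGTT", "ACGT", "Gene3", 7),
          ("CCCA", "GGGT", "Gene4", 4)],
}
kept = remove_swapped(samples)
```

None of this can be computed from a gene-by-cell count matrix alone, since the matrix no longer records which UMI produced each count. That is why a per-molecule output is needed.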

That said, the developers of swappedDrops told me that HDF5 is not a strict requirement; other file types that provide the molecule info will do.
