Is it possible to map plate-based methods without demultiplexing first? #1613

@dbrg77

Hello,

I have a feature request.

The plate-based one-cell-per-well format, e.g. SMART-seq2/3, is still very useful. I know that for this type of method we could demultiplex each cell into individual FASTQ files based on the index reads and provide a manifest to STARsolo.

However, with the help of automation and robotics, even plate-based methods can reach thousands of cells per run. It is awkward to split the data into thousands of FASTQ files (sometimes more than 10k) at the bcl2fastq stage. Therefore, what we normally do now is dump all reads into four FASTQ files without demultiplexing:

R1.fastq.gz
R2.fastq.gz
I1.fastq.gz
I2.fastq.gz

In this case, each cell can be identified by concatenating its I1 and I2 index reads. Can STARsolo support this format?

Here is what I have in mind once we get the data:

# prepare the cell barcodes by combining I1 and I2
paste <(zcat I1.fastq.gz) <(zcat I2.fastq.gz) | \
    awk -F '\t' '{ if(NR%4==1||NR%4==3) {print $1} else {print $1 $2} }' | \
    gzip > cell_barcodes.fastq.gz

Then I can feed R1.fastq.gz, R2.fastq.gz and cell_barcodes.fastq.gz to STARsolo. The "whitelist" can easily be generated from the primers used in the experiment.
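As a rough sketch of how such a whitelist could be built: assuming the index sequences are available as two plain-text files with one sequence per line, every I1+I2 combination can be written out in the same order as the combined barcodes above. The file names `i7_indices.txt`, `i5_indices.txt` and `whitelist.txt` and the toy sequences are hypothetical placeholders, not something from the actual experiment.

```shell
# Toy index lists for illustration only (hypothetical sequences and file names).
# i7_indices.txt corresponds to I1, i5_indices.txt to I2.
printf 'AAAAAAAA\nCCCCCCCC\n' > i7_indices.txt
printf 'GGGGGGGG\nTTTTTTTT\n' > i5_indices.txt

# Load all I2 (i5) sequences first, then print each I1 (i7) sequence
# followed by every I2 sequence, matching the I1-then-I2 order of the barcodes.
awk 'NR == FNR { i5[n++] = $1; next }
     { for (k = 0; k < n; k++) print $1 i5[k] }' \
    i5_indices.txt i7_indices.txt > whitelist.txt
```

Depending on the sequencer, I2 may be read as the reverse complement of the i5 sequence, so the second list might need to be reverse-complemented before combining.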

I actually realised that the public SMART-seq3 data is already in this un-demultiplexed format: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8735/samples/

So I think people do need this option, especially when thousands of cells are processed per run.

Thank you very much!

Regards,
Xi
