Question about unassigned reads when running in alignment mode #952

Open
zxl124 opened this issue Aug 14, 2024 · 1 comment

Comments


zxl124 commented Aug 14, 2024

This is not a bug but more of a question. I've run Salmon in alignment mode with a transcriptome BAM file generated by STAR. The BAM file contains no unaligned reads. I often see a small number of reads that are not assigned to any rich equivalence class, and I am trying to understand what these reads are. I notice that this only happens when the input is paired-end reads. I suspect the unassigned reads may be dovetailed paired-end reads, but I don't know. The --allowDovetail option is not available in alignment mode. Here is an excerpt of the log:

Completed first pass through the alignment file.
Total # of mapped reads : 6205189
# of uniquely mapped reads : 1718004
# ambiguously mapped reads : 4487185

[2024-08-14 18:21:52.491] [jointLog] [info] Computed 350358 rich equivalence classes for further processing
[2024-08-14 18:21:52.491] [jointLog] [info] Counted 6192944 total reads in the equivalence classes

As you can see, 6192944 of 6205189 reads were assigned to rich equivalence classes, leaving 12245 unassigned.
It would be nice to know what the excluded reads are, and whether there are options to rescue them, similar to --allowDovetail.
This is Salmon version 1.10.3, but I also ran an older version, which gave the same results.
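
For reference, the size of the gap follows directly from the two log lines above. A minimal sketch in Python, with the counts copied from the log excerpt (the variable names are only illustrative, not anything Salmon reports):

```python
# Counts copied from the Salmon log excerpt above.
total_mapped = 6205189    # "Total # of mapped reads"
in_eq_classes = 6192944   # "Counted ... total reads in the equivalence classes"

# Reads that were mapped but did not end up in any rich equivalence class.
unassigned = total_mapped - in_eq_classes
fraction = unassigned / total_mapped

print(f"unassigned reads: {unassigned}")
print(f"fraction unassigned: {fraction:.4%}")
```

So roughly 0.2% of the mapped reads are affected.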


zxl124 commented Aug 15, 2024

I realized that most of these unassigned reads are probably paired-end reads that didn't match the specified libType, which was "IU" (inward, unstranded). So I ran samtools stats on my BAM file to verify that.

SN      inward oriented pairs:  6191674
SN      outward oriented pairs: 13515

The samtools count of inward pairs, 6191674, is close to the number Salmon assigned, 6192944, but not identical. That's OK, considering Salmon and samtools probably define inward and outward read pairs differently.
I think it would be helpful if Salmon reported in the log how many reads were excluded and for what reason. Thanks.
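
The comparison above can be reproduced with a small script. This is only a sketch: `parse_sn` is a hypothetical helper written for this post, it assumes the `SN` lines look like the two shown above, and the Salmon counts are copied by hand from the log rather than parsed.

```python
def parse_sn(stats_text):
    """Collect 'SN' summary lines from samtools stats output into a dict."""
    values = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN"):
            continue
        # Drop the "SN" prefix, then split "key:  value" at the first colon.
        key, _, rest = line[2:].partition(":")
        # Keep only the first token so any trailing comment is ignored.
        tokens = rest.split()
        if tokens:
            values[key.strip()] = tokens[0]
    return values


# The two lines from my samtools stats output.
stats_text = """SN      inward oriented pairs:  6191674
SN      outward oriented pairs: 13515
"""

sn = parse_sn(stats_text)
inward = int(sn["inward oriented pairs"])
outward = int(sn["outward oriented pairs"])

# Copied by hand from the Salmon log.
salmon_total = 6205189
salmon_assigned = 6192944

print("Salmon unassigned:", salmon_total - salmon_assigned)
print("samtools outward pairs:", outward)
print("Salmon assigned minus samtools inward:", salmon_assigned - inward)
```

With these numbers, the inward and outward pairs from samtools add up exactly to the total mapped count in the Salmon log (6191674 + 13515 = 6205189), so the two tools disagree on only 1270 pairs. I don't know which pairs those are.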
