Description
Hi there
I am using minimap2 to align microbiome reads against the host genome so that I can separate out the host reads and use the unmapped reads for my downstream microbiome analysis.
However, the number of unmapped reads is higher than the number of input reads.
Below are the commands that I ran:
minimap2 -t 24 -ax map-ont $host $infolder/$filename > $outfolder/$idname.sam
samtools view -@ 24 -f 4 $outfolder/$idname.sam > $outfolder/$idname-nonhost.bam
samtools sort --threads 24 $outfolder/$idname-nonhost.bam > $outfolder/$idname-nonhost-sorted.bam
bamToFastq -i $outfolder/$idname-nonhost-sorted.bam -fq $outfolder/$idname-nonhost.fastq
Here is the output as reported by NanoPlot:
Sample ID | Input reads (K) | Input total bases (Mb) | Unmapped reads (K) | Unmapped total bases (Mb) | Unmapped reads (%) | Unmapped bases (%) |
---|---|---|---|---|---|---|
3815 | 1514.504 | 2114.67753 | 1836.906 | 2919.44312 | 121.29% | 138.06% |
3816 | 1815.924 | 2024.97881 | 1479.988 | 2250.97226 | 81.50% | 111.16% |
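A quick sanity check for counts like these is to compare the number of FASTQ records with the number of distinct read names, in both the input file and the unmapped output. The following is only a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; `fastq_counts` is a hypothetical helper name and is not part of minimap2, samtools, or bedtools.

```shell
# Hypothetical helper: print "<records> <unique_read_names>" for a FASTQ file.
# Assumes uncompressed input with exactly 4 lines per record.
fastq_counts() {
    awk 'NR % 4 == 1 {
             n++                      # one header line per record
             name = substr($1, 2)     # drop the leading "@", keep text up to first whitespace
             if (!(name in seen)) { seen[name] = 1; u++ }
         }
         END { printf "%d %d\n", n, u }' "$1"
}

# Example usage with the paths from the commands above:
#   fastq_counts "$infolder/$filename"
#   fastq_counts "$outfolder/$idname-nonhost.fastq"
```

If the record count exceeds the unique-name count in the unmapped FASTQ but not in the input, the same read is being written more than once somewhere in the pipeline. If the two numbers match in both files, the discrepancy more likely comes from how the input was counted.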
I would appreciate it if you could explain why this happens, and whether there is a way to separate out the host reads while keeping an accurate count of the unmapped (potentially microbiome) reads. Thank you!