
AnnotateBamWithUmis, sorted or unsorted BAMs? code example? #879

Open
sofiapuvogelvittini opened this issue Oct 13, 2022 · 3 comments

@sofiapuvogelvittini

Dear fgbio team,
I would like to use AnnotateBamWithUmis, but I am not sure which steps to take, and I couldn't find any tutorial that covers it.
I have paired-end RNA-seq data with three FASTQ files per sample: R1 is the forward read, R3 the reverse read, and the UMI is in a separate R2 read file.
I have already aligned R1 and R3 to the genome with HISAT2, producing one SAM file per sample, and converted each SAM file to BAM with samtools view. I would now like to annotate these BAM files with AnnotateBamWithUmis.
My questions are the following:

  1. Should I sort the BAM files with samtools sort before annotating them with AnnotateBamWithUmis, or should I run AnnotateBamWithUmis first and then sort the annotated BAM files?
  2. Do you have any example code to use AnnotateBamWithUmis?

Thank you very much for your help.
With best regards,
Sofia
@nh13
Member

nh13 commented Oct 13, 2022

  1. Please let us know how the usage (`fgbio AnnotateBamWithUmis --help`) could be improved.
  2. If the FASTQ reads are in the same order as the BAM, use `--sorted` to indicate that. If they are not, all the FASTQ reads will be read into memory, which needs a lot of memory for large FASTQs. I don't think it matters whether you sort before or after annotating, unless you happen to sort the BAM into the same order as the FASTQ (unlikely).
  3. "Do you have any example code to use AnnotateBamWithUmis?" It should be as simple as `fgbio AnnotateBamWithUmis --input in.bam --fastq R2.fastq.gz --output out.bam`.
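The memory trade-off in item 2 can be modelled in a few lines of Python. This is a simplified sketch, not fgbio's implementation: BAM records are reduced to read names, UMI reads to hypothetical `(name, umi_sequence)` pairs, and both function names are made up for illustration. The in-memory version builds a dictionary of every UMI read, so the BAM can be in any order but memory grows with the FASTQ; the lockstep version holds only one UMI read at a time, which is why it needs both inputs in the same order.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Simplified model (not fgbio's code): a BAM record is just its read name and
# a UMI read is a (name, umi_sequence) pair. Function names are hypothetical.


def annotate_in_memory(bam_names: Iterable[str],
                       umi_reads: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Unsorted-style: load every UMI read into a dict first, so the BAM may be in
    any order, at the cost of memory proportional to the FASTQ size."""
    umis: Dict[str, str] = dict(umi_reads)
    annotated = []
    for name in bam_names:
        if name not in umis:
            raise KeyError(f"no UMI read for {name}")
        annotated.append((name, umis[name]))
    return annotated


def annotate_lockstep(bam_names: Iterable[str],
                      umi_reads: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sorted-style: hold one UMI read at a time, so the BAM's read names must
    appear in exactly the same order as the FASTQ's."""
    fastq: Iterator[Tuple[str, str]] = iter(umi_reads)
    current: Optional[Tuple[str, str]] = None
    previous: Optional[str] = None
    annotated = []
    for name in bam_names:
        # A new read name consumes the next UMI read; repeated names
        # (mates, secondary alignments) reuse the current one.
        if name != previous:
            current = next(fastq, None)
            previous = name
        if current is None or current[0] != name:
            raise ValueError(f"{name} is out of order relative to the FASTQ")
        annotated.append((name, current[1]))
    return annotated
```

With a BAM whose names run `r2, r1, r3` against a FASTQ ordered `r1, r2, r3`, the in-memory version still succeeds while the lockstep version raises on the first record.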

nh13 added the question label Oct 13, 2022
nh13 self-assigned this Oct 13, 2022
@sofiapuvogelvittini
Author

sofiapuvogelvittini commented Oct 14, 2022

Thank you very much for your prompt response,
Regarding question 1: how can I make sure that the FASTQ reads are in the same order as the BAM reads?
For example, I checked the first few lines of the BAM file (before sorting with samtools sort) and of the FASTQ file, and they share the same order of sequence identifiers, at least in those first lines (attached images 1 and 2: Fastq_file and UNsorted_bam_file). Does this mean I can use AnnotateBamWithUmis with the `--sorted` option?
On the other hand, the order of sequence identifiers in the BAM file sorted with samtools sort differs from the FASTQ file (image 3: samtools_sorted_bam_file). So would the best option be to annotate the unsorted BAM file with `AnnotateBamWithUmis --sorted` and then run samtools sort on the annotated BAM file?
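Matching identifiers in the first few lines, as in the screenshots, does not show that the whole files are in the same order. One rough way to check the complete files is to stream read names from both and compare them. The Python sketch below assumes samtools is on the PATH and the UMI FASTQ is gzipped; all helper names are hypothetical. It collapses consecutive BAM records that share a name (mates, secondary alignments) and then requires a one-to-one, in-order match with the FASTQ. That may be stricter than what fgbio's `--sorted` mode actually requires, for example if some reads are absent from the BAM.

```python
import gzip
import subprocess
from itertools import groupby, zip_longest
from typing import Iterable, Iterator


def normalize(name: str) -> str:
    """Drop a leading '@', any comment after whitespace, and a /1, /2 or /3 suffix."""
    if name.startswith("@"):
        name = name[1:]
    name = name.split()[0]
    if len(name) > 2 and name[-2] == "/" and name[-1] in "123":
        name = name[:-2]
    return name


def fastq_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the normalized name from the first line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield normalize(line)


def sam_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield the normalized QNAME (column 1) of each SAM record, skipping header lines."""
    for line in lines:
        if not line.startswith("@"):
            yield normalize(line.split("\t", 1)[0])


def same_order(bam: Iterable[str], fastq: Iterable[str]) -> bool:
    """Collapse runs of identical BAM names, then require a one-to-one,
    in-order match with the FASTQ names over the whole file."""
    collapsed = (name for name, _ in groupby(bam))
    missing = object()  # stands in for whichever stream runs out first
    for b, f in zip_longest(collapsed, fastq, fillvalue=missing):
        if b != f:
            return False
    return True


def check_files(bam_path: str, fastq_gz_path: str) -> bool:
    """Stream QNAMEs from `samtools view` (assumed on PATH) and names from a gzipped FASTQ."""
    with gzip.open(fastq_gz_path, "rt") as fq, subprocess.Popen(
        ["samtools", "view", bam_path], stdout=subprocess.PIPE, text=True
    ) as proc:
        result = same_order(sam_names(proc.stdout), fastq_names(fq))
    # Leaving the `with` block waited for samtools; don't report success if it failed.
    if result and proc.returncode != 0:
        raise RuntimeError(f"samtools view exited with status {proc.returncode}")
    return result
```

For example, `check_files("in.bam", "R2.fastq.gz")` on the unsorted BAM returns True only if every read lines up; on the coordinate-sorted BAM it would most likely return False.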

Extra question: how should I include the `--sorted` option? If I run `fgbio AnnotateBamWithUmis --input in.bam --fastq R2.fastq.gz --output out.bam --sorted true`, I get the error "No option found with name sorted".
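The error message suggests that the installed version of AnnotateBamWithUmis may not define a `--sorted` option at all, in which case no syntax would help; the tool's own `--help` output is the authority. The Python sketch below (hypothetical helper names; assumes fgbio and samtools are on the PATH) captures that usage text, adds `--sorted` only when it is listed, and builds the annotate-then-sort command pair proposed above. Passing `--sorted` with no value is an assumption about how the installed version accepts boolean flags; confirm it against the usage text.

```python
import re
import subprocess
from typing import List


def fgbio_usage(tool: str = "AnnotateBamWithUmis") -> str:
    """Capture `fgbio <tool> --help`; stdout and stderr are merged because which
    stream the usage goes to (and the exit status) is not assumed here."""
    result = subprocess.run(
        ["fgbio", tool, "--help"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, check=False,
    )
    return result.stdout


def option_listed(usage: str, option: str) -> bool:
    """True if `--<option>` appears as a complete option name in the usage text."""
    pattern = rf"--{re.escape(option)}(?![\w-])"
    return re.search(pattern, usage) is not None


def build_commands(in_bam: str, umi_fastq: str, annotated_bam: str, sorted_bam: str,
                   inputs_in_same_order: bool, usage: str) -> List[List[str]]:
    """Annotate first (optionally with --sorted), then coordinate-sort the result."""
    annotate = ["fgbio", "AnnotateBamWithUmis", "--input", in_bam,
                "--fastq", umi_fastq, "--output", annotated_bam]
    if inputs_in_same_order:
        if not option_listed(usage, "sorted"):
            raise RuntimeError("this fgbio version does not list --sorted; omit it "
                               "(the FASTQ is then read into memory)")
        annotate.append("--sorted")
    sort = ["samtools", "sort", "-o", sorted_bam, annotated_bam]
    return [annotate, sort]
```

To run it: `for cmd in build_commands("in.bam", "R2.fastq.gz", "annotated.bam", "annotated.sorted.bam", True, fgbio_usage()): subprocess.run(cmd, check=True)`.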

Thank you very much for your time and kind help.

[Attached images: Fastq_file, UNsorted_bam_file, samtools_sorted_bam_file]

@nh13
Member

nh13 commented Oct 17, 2022

My apologies for the delay in responding; I won't have time to look at this in the near future.
