-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to incorporate additional metadata from provided SAM header file #95
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the test
profile, this is how the BAM file is modified:
$ samtools view -H mMelMel3.merge.bam
@HD VN:1.6 SO:coordinate
@SQ SN:OV277441.1 LN:7999920
@RG ID:m64094_200910_173211 PL:PACBIO SM:mMelMel3
@RG ID:m64094_200911_174739 PL:PACBIO SM:mMelMel3
@PG ID:minimap2 PN:minimap2 VN:2.24-r1122 CL:minimap2 -ax map-hifi --cs=short -R @RG\tID:m64094_200910_173211\tPL:PACBIO\tSM:mMelMel3 -I1.0G -t 2 -a GCA_922984935.2.subset.unmasked.fasta mMelMel3_T3.fastq.gz
@PG ID:samtools PN:samtools PP:minimap2 VN:1.18 CL:samtools sort -@ 2 -o mMelMel3_T3.bam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.17 CL:samtools merge --threads 0 -c -p mMelMel3.merge.bam 1/mMelMel3_T3.bam 2/mMelMel3_T4.bam
@PG ID:samtools.2 PN:samtools PP:samtools.1 VN:1.20 CL:/usr/local/bin/samtools view -H mMelMel3.merge.bam
$ samtools view -H mMelMel3.reheader.bam
@HD VN:1.0 SO:unsorted
@SQ SN:OV277441.1 LN:7999920 M5:0457acf8690429f1c98ee545cb8573b8 UR:https://tolit.cog.sanger.ac.uk/test-data/Meles_meles/assembly/release/mMelMel3.1_paternal_haplotype/GCA_922984935.2.subset.fasta.gz AS:GCA_922984935.2 AN:SUPER_1 SP:Meles meles
@PG ID:samtools PN:samtools VN:1.17 CL:samtools reheader GCA_922984935.2.subset.header.sam -
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.20 CL:/usr/local/bin/samtools view -H mMelMel3.reheader.bam
The issues are:
@HD SO
is turned fromcoordinate
tounsorted
even though the file should still be sorted@HD VN
is downgraded from 1.6 to 1.0. Is that necessary ? I'd be concerned that other parts of the headers aren't 1.0 compatible, which might break parsers.- The
@RG
and original@PG
are gone
By the way, check the description of TOLIT-1916. It has some sample code from Rob Davies (Sanger staff, in the Samtools team) to update the @SQ
headers of a BAM file. He makes a new *.header.sam
from the original BAM headers and the user-supplied headers, and then samtools reheader
is called just once on that.
Ah I should have caught that... The new version only replaces @sq lines, and also maintains the input order of header entries from the bam file:
|
Co-authored-by: Matthieu Muffato <cortexspam-github@yahoo.fr>
Co-authored-by: Matthieu Muffato <cortexspam-github@yahoo.fr>
Co-authored-by: Matthieu Muffato <cortexspam-github@yahoo.fr>
Co-authored-by: Matthieu Muffato <cortexspam-github@yahoo.fr>
Co-authored-by: Matthieu Muffato <cortexspam-github@yahoo.fr>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super, thank you !
Added a new step just before CRAM generation which adds additional metadata to the output alignment files from the optional
--header
argument. The default behaviour is that this replaces the @hd and @sq lines (i.e., keeping any defined read groups etc), which can be defined usingext.args
.Note the extra header file for now is stored in
assets/
until I can get the hosted test_data updated (added in Jira as a separate ticket)PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).