Description
When alignments are read from a BAM or CRAM file and written unchanged to another BAM/CRAM file, the values of integer tags may change.
Repro
$ samtools view int_tag_overflow.bam
r1 4 * 0 0 * * 0 0 ATGC #### XA:i:4294967295
(require '[cljam.io.sam :as sam])
(with-open [r (sam/reader "int_tag_overflow.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
:flag 4,
:rname "*",
...
:seq "ATGC",
:qual "####",
:options ({:XA {:type "i", :value 4294967295}})})
(with-open [r (sam/reader "int_tag_overflow.bam")
            w (sam/writer "int_tag_overflow.rewrite.bam")]
  (sam/write-header w (sam/read-header r))
  (sam/write-refs w (sam/read-refs r))
  (sam/write-alignments w (sam/read-alignments r) (sam/read-header r)))
(with-open [r (sam/reader "int_tag_overflow.rewrite.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
:flag 4,
:rname "*",
...
:seq "ATGC",
:qual "####",
:options ({:XA {:type "i", :value -1}})}) ;; <- this value has changed from the original one
Cause
- The SAM format defines only one integer tag type, `i` (signed arbitrary-precision integer), while the BAM/CRAM format has an `i` integer tag type with different semantics (signed 32-bit integer), as well as other integer types (c/C/s/S/I)
- cljam's BAM/CRAM reader interprets any integer tag value as the `i` tag type
- cljam's BAM/CRAM writer doesn't check whether each integer tag value fits the specified tag type. It writes a tag value as the `i` tag type even if it can't be represented as a signed 32-bit integer.
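
To illustrate the last point, here is a minimal sketch of how a writer could pick a BAM integer type that actually fits the value. `smallest-int-tag-type` is a hypothetical helper, not part of cljam's API, and the preference order among the types is an assumption. The final expression shows that truncating 4294967295 to 32 bits yields the -1 seen in the repro.

```clojure
;; Hypothetical helper (not part of cljam): choose the narrowest BAM
;; integer tag type whose range contains n.
(defn smallest-int-tag-type
  "Returns the BAM integer tag type character that can represent the
  integer n, or nil if n fits none of the BAM integer types."
  [n]
  (cond
    (<= -128 n 127)               \c   ; int8
    (<= 0 n 255)                  \C   ; uint8
    (<= -32768 n 32767)           \s   ; int16
    (<= 0 n 65535)                \S   ; uint16
    (<= -2147483648 n 2147483647) \i   ; int32
    (<= 0 n 4294967295)           \I   ; uint32
    :else                         nil))

;; The value from the repro needs the unsigned 32-bit type.
(smallest-int-tag-type 4294967295) ;=> \I

;; Truncating it to a signed 32-bit integer instead reinterprets the
;; bit pattern 0xFFFFFFFF as -1.
(unchecked-int 4294967295) ;=> -1
```

With a check like this, the writer could either emit the value under the returned type or raise an error when the result is nil, rather than silently writing a truncated `i` value.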