Read/write invariance breaks with integer tags in BAM/CRAM #314

@athos

Description

When reading alignments from a BAM or CRAM file and writing them unchanged to another BAM/CRAM file, the values of integer tags may change.

Repro

$ samtools view int_tag_overflow.bam
r1      4       *       0       0       *       *       0       0       ATGC    ####    XA:i:4294967295

(require '[cljam.io.sam :as sam])

(with-open [r (sam/reader "int_tag_overflow.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
  :flag 4,
  :rname "*",
  ...
  :seq "ATGC",
  :qual "####",
  :options ({:XA {:type "i", :value 4294967295}})})

(with-open [r (sam/reader "int_tag_overflow.bam")
            w (sam/writer "int_tag_overflow.rewrite.bam")]
  (sam/write-header w (sam/read-header r))
  (sam/write-refs w (sam/read-refs r))
  (sam/write-alignments w (sam/read-alignments r) (sam/read-header r)))

(with-open [r (sam/reader "int_tag_overflow.rewrite.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
  :flag 4,
  :rname "*",
  ...
  :seq "ATGC",
  :qual "####",
  :options ({:XA {:type "i", :value -1}})})  ;; <- this value has changed from the original one
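
The new value is what you would expect if 4294967295 (0xFFFFFFFF) were truncated to 32 bits and read back as a signed integer. A minimal sketch in plain Clojure, independent of cljam and not taken from its source, showing that arithmetic:

```clojure
;; 4294967295 is 2^32 - 1, i.e. all of the low 32 bits are set.
(def original 4294967295)

;; unchecked-int keeps the low 32 bits and reinterprets them as a signed int.
(unchecked-int original)
;=> -1

;; Masking the signed value back to 32 bits recovers the unsigned reading.
(bit-and (long (unchecked-int original)) 0xFFFFFFFF)
;=> 4294967295
```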

Cause

  • The SAM format defines a single integer tag type, i (a signed arbitrary-precision integer), whereas the BAM/CRAM format gives i different semantics (a signed 32-bit integer) and also defines other integer types (c/C/s/S/I).
  • cljam's BAM/CRAM reader reports every integer tag value as type i, whichever integer type was actually stored in the file.
  • cljam's BAM/CRAM writer doesn't check whether each integer tag value fits the specified tag type. It writes the value as type i even when it can't be represented as a signed 32-bit integer.
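
One possible direction for a fix, sketched under assumptions: before writing, the writer could pick a BAM integer type whose range actually contains the value, and reject values that fit none. The helper name `int-tag-type` is hypothetical and is not part of cljam's API. The ranges are the standard BAM ones (c/C 8-bit, s/S 16-bit, i/I 32-bit; lowercase signed, uppercase unsigned).

```clojure
(defn int-tag-type
  "Hypothetical helper (not part of cljam): returns the BAM integer tag type
  character whose range contains v, trying the narrowest types first.
  Throws if v fits no BAM integer type."
  [v]
  (cond
    (<= -128 v 127)               \c   ; int8
    (<= 0 v 255)                  \C   ; uint8
    (<= -32768 v 32767)           \s   ; int16
    (<= 0 v 65535)                \S   ; uint16
    (<= -2147483648 v 2147483647) \i   ; int32
    (<= 0 v 4294967295)           \I   ; uint32
    :else (throw (ex-info "Integer tag value out of range for BAM"
                          {:value v}))))

(int-tag-type 4294967295) ;=> \I
(int-tag-type -1)         ;=> \c
```

With a check like this, the value from the repro would be written as type I rather than being forced into a signed 32-bit i, and a value outside every range would fail loudly instead of silently changing.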
