Problematic behaviors around BAM index generation #316

@athos

Currently, cljam's BAM writer supports simultaneous writing of a BAM file and its index file. This feature can be enabled by specifying true as the second argument of cljam.io.sam/writer. There are two problematic behaviors around this:

  1. Index file writing is only performed if SO:coordinate is specified in the BAM file header
    • If SO:coordinate is not specified, the request to write an index file is silently ignored: no index file is created and no error is raised
  2. The BAM indexer, which can also create BAM index files, does not check for SO:coordinate in the BAM file header and always creates an index file
    • This is not only inconsistent with the BAM writer's behavior but can also result in invalid index files if the input BAM file is not sorted

I think the following changes can be made to improve usability and consistency:

  1. If index file writing is enabled for the BAM writer and SO:coordinate is not specified in the BAM file header, an error should be raised to tell the user that the BAM writer didn't succeed in generating an index file
  2. The BAM indexer should also check for SO:coordinate in the input BAM file header and raise an error if SO:coordinate is not specified, ensuring consistency with the BAM writer
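
The check proposed for both code paths could look roughly like the sketch below. It is written in Python over the text form of a SAM/BAM header purely for illustration and is not cljam code: the names `sort_order`, `ensure_coordinate_sorted`, and `UnsortedInputError` are hypothetical, and the only assumption taken from the SAM format is that the sort order is recorded as an `SO` field on the tab-separated `@HD` header line.

```python
from typing import Optional


class UnsortedInputError(ValueError):
    """Hypothetical error: indexing was requested but the header lacks SO:coordinate."""


def sort_order(header_text: str) -> Optional[str]:
    """Return the SO value from the @HD line of a SAM-style text header, or None."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Fields after the record type are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
    return None


def ensure_coordinate_sorted(header_text: str) -> None:
    """Raise instead of silently skipping (writer) or blindly indexing (indexer)."""
    so = sort_order(header_text)
    if so != "coordinate":
        raise UnsortedInputError(
            f"cannot create a BAM index: header declares SO:{so}, expected SO:coordinate"
        )
```

Calling the same guard from both the writer (when index writing is enabled) and the indexer would give the two paths identical behavior: an unsorted or undeclared input fails loudly in both cases.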

Note that these are breaking changes: they will raise errors in situations where none were raised previously.
