Open
Description
Dear @CiaranOMara
I've been working a bit with XAM.jl. Great package, but I've found quite a number of small errors and mistakes I think should be fixed. Making one PR for each would be overwhelming, especially since fixing some of them has implications that are not straightforward. So I'd like to discuss it a bit here:
List of errors/inconsistencies
- Missing qualities are set to 0xff, but hasquality does not check this, so the result may be wrong
- refname should be accessible even if the read is unmapped (per specs section 1.4.3)
- isnextmapped checks if the record is filled, but ismapped doesn't
- hasauxdata doesn't even check if the record has aux data
- quality(record) raises an obscure error if a quality is > 127
- function MetaInfo should check that a tag has two characters per the specs
- ismapped, isprimary and ispositive strand errors when called on an unfilled record, BUT isnextmapped doesn't
- hasrefid, hasrefname and hasposition just returns if the record is filled, not what is actually asked. But hasrightposition behaves differently. Worse, refname behaves differently too, meaning you could have hasrefname() == true, yet it still errors when the refname is fetched.
Proposals/enhancements
- We should just get rid of unfilled records, they make little sense as a concept. Instead, we initialize every record as an empty, but valid record. This way, we never need to check if the record is filled, since it always is.
- Then we can remove e.g. hasflag since every record always have flags
- all hasX function should return a Bool
- if hasX function returns false, the accessor (X) function should error.
- Rewrite the memory layout of the BAM record so it exactly matches the specifications. This will not result in any performance issues, just make the underlying code simpler and make it easier to interface with C in the future
I've implemented many of these changes at https://github.com/jakobnissen/XAM.jl, but that "fork" is stale, and I'd rather merge the fixes into the original XAM.jl
Metadata
Assignees
Labels
No labels