-
-
Notifications
You must be signed in to change notification settings - Fork 24
Add support for 'meta' alignment operations (OP_PAD
and OP_HARD_CLIP
)
#64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for putting in the effort to add this - huge win!
I don't feel confident in the thoroughness of my review, but I don't see anything that looks obviously bad, and the tests still pass so I'm good with this. I would love for someone else that uses this functionality more regularly to take a look as well, though.
@jakobnissen, @CiaranOMara, could I get a review from one of you? |
Codecov ReportBase: 87.60% // Head: 88.20% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #64 +/- ##
==========================================
+ Coverage 87.60% 88.20% +0.59%
==========================================
Files 16 16
Lines 1138 1178 +40
==========================================
+ Hits 997 1039 +42
+ Misses 141 139 -2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. β View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jakobnissen, we should postpone merging this until #44 and #72 are both in |
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
According to the SAM spec, "H can only be present as the first and/or last operation." In addition, it's not even classified as an insert operation anymore. Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
β¦tion Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
The test suite included logic for treating ends of alignment differently, but due to the loop condition, this logic was always false. Remove that logic entirely. Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
The `count_aligned` function was updated by @alyst to use the new `alnpos` field in cc8237c ("count_aligned(): optimize with alnpos"). That optimization assumes that all anchor types consume either sequence or reference positions as they consume alignment positions, which isn't true for meta-operations. Switch back to the old way, which does not make any assumptions about how many bases each operation consumes. Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
Co-authored-by: Kevin Bonham <kevbonham@gmail.com>
Signed-off-by: Thomas A. Christensen II <25492070+MillironX@users.noreply.github.com>
20f1856
to
96f8ee7
Compare
Failed downstream is expected: this is a breaking change due to the inclusion of #44. |
Types of changes
This PR implements the following changes:
π Additional detail
Adds support for the
OP_PAD
operation, and fixes the implementation of theOP_HARD_CLIP
operation. I have dubbed these two operations 'meta' operations, as they contain information about the origin and context of the alignment, but have no bearing on the content (i.e. reference or query) of the alignment.New features
ismetaop
, which returnstrue
if passedOP_PAD
orOP_HARD_CLIP
OP_PAD
operation no longer throws an errorOP_HARD_CLIP
orOP_SOFT_CLIP
in the wrong placeAccording to the SAM specification, these have to be present at the end of an alignment, with soft clips optionally buffering hard clips from the rest of the alignment
Changed functionality
cigar(::Alignment)
now uses theAlignmentAnchor.alnpos
to print operation lengthOP_HARD_CLIP
is no longer considered an insert operation, andisinsertop(OP_HARD_CLIP)
now returnsfalse
OP_PAD
indicates it's supported now(I know I said this above, but it's probably the most likely addition to break something, even though it does conform to spec)
Justification
BioAlignments should be able to handle all of the operations defined by the SAM specification. As it currently is, BioAlignments is unable even to parse the alignments in Section 1.1 "An example" of the specification. While the work on clips may seem out-of-scope for a change adding an operation support, the way clips work and pads work are very similar and needed to be considered together.
Example
Using query sequence "r002" from Section 1.1 of the SAM specification:
BioAlignments 2.0.0
This PR
My testing also indicates that this fixes #56,
BioAlignments 2.0.0
This PR
βοΈ Checklist
docs/src/
.[UNRELEASED]
section of the manually curatedCHANGELOG.md
file for this repository.