-
-
Notifications
You must be signed in to change notification settings - Fork 24
Add alignment string position support #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The failures due to |
Some other notes on the PR:
|
I don't know who would be the best to review, but since you were committing recently, maybe you can take a look, @benjward ? Or maybe you can suggest a more suitable reviewer? Thanks in advance! |
@benjward @jakobnissen If you would have time, could you please review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for just getting around to it now. I can't wrap my head around the logic, but despite a few minor issues, it looks good.
I'm not 100% sure what this functionality is useful for, though. Does the concept of an "alignment position" make sense?
src/BioAlignments.jl
Outdated
seq2agn, | ||
ref2agn, | ||
agn2seq, | ||
agn2ref, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the rest of the codebase, "alignment" is abbreviated aln
. Perhaps a search-replace for "agn"->"aln" would be best.
Map a position `i` from sequence to reference. | ||
""" | ||
function seq2ref(aln::Alignment, i::Integer)::Tuple{Int,Operation} | ||
idx = findanchor(aln, i, Val{true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idiomatically, you usually use Val
by doing findanchor(aln, i, Val(true))
, and then in the type signature that uses Val
, you say
findanchor(aln::Alignment, i::Integer, ::Val{my_bool}) where my_bool
, i.e. using ::Val{Foo}
instead of ::Type{Val{Foo}}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was never sure whether Val(...)
generates a code to allocate that object or not, so to avoid allocation I was using type approach, but I'll change it to Val()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no allocation - the two things are completely equivalent. It's just the norm to use instances instead of types whenever you have a zero-field struct. (e.g. Julia's Base and the rest of BioJulia does it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, that was not my code. %) In the refactored functions Val
is replaced by passing function references.
|
||
Map a position `i` from sequence to reference. | ||
""" | ||
seq2ref(aln::Alignment, i::Integer) = pos2pos(aln, i, seqpos, refpos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - no variable named seqpos
/ refpos
src/alignment.jl
Outdated
return j | ||
end | ||
# invariant (activate this for debugging) | ||
#@assert anchors[lo].$pos < i ≤ anchors[hi].$pos |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this assert be re-enabled? Otherwise it should be deleted, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be still relevant, because the function logic was not touched, I've just added support for aligned position + made it a normal function, not @generated
.
I'd rather keep it here as a comment: it nicely documents the constraint of the algorithm, but the context might be performance-critical to enable it.
I'll update it to use the current way of getting the position (pos(anchors...))
) and check internally (without pushing here) whether it doesn't throw in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I checked and tests do not throw if the line is enabled.
@@ -113,7 +113,7 @@ function Base.convert(::Type{Char}, op::Operation) | |||
end | |||
|
|||
function Base.print(io::IO, op::Operation) | |||
write(io, convert(Char, op)) | |||
write(io, isvalid(op) ? convert(Char, op) : '?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we support invalid operations? Perhaps it's a better design to not allow invalid operations to exist. I'm not sure though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time has passed, I don't know why I've added it. Maybe when I'll be revisiting the PR, I'll see why.
Otherwise I agree, it could be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to keep this change:
- to be in sync with
show()
(see below) that doesn't throw - in general, if you cannot print incorrect object, it's hard to figure out what is the problem. If we just throw an exception the user would not see the context (sequence position).
For multiple sequence alignments it definitely does, because it's actually the column of MSA. But it also does make sense for pairwise one. If you want to display the pairwise alignment, you may need to know what is the position of a specific letter from either sequence in the aligned sequence. |
@jakobnissen It looks like at the moment CI seems broken. Should this package be updated to use GitHab actions for CI? |
Yeah... BioJulia is strapped for dev manpower at the moment, so there is a lot of cleanup and housekeeping that needs to be done in all sorts of packages, and just haven't been done. If you make a PR, I'll merge it! You can look at e.g. BioSequences for how to do this. |
otherwise Anchors with invalid operation (and these anchors are created temporarily during traceback) would throw upon printing
Codecov Report
@@ Coverage Diff @@
## master #44 +/- ##
==========================================
- Coverage 88.82% 88.41% -0.42%
==========================================
Files 17 17
Lines 1056 1079 +23
==========================================
+ Hits 938 954 +16
- Misses 118 125 +7
Continue to review full report at Codecov.
|
- add position getters for AlignmentAnchor - convert findanchor() from generated function to normal one with position getter as a param (in Julia constant propagation we trust) - generic pos2pos() function that gets called by seq2ref/ref2seq()
@alyst Ping me when it's ready to be merged :) |
- add alnpos to each anchor - add mapping to/from alnpos - add alnpos support to traceback
@jakobnissen Thank you! I think the PR is good to go. |
Thanks for putting in the work! |
Downstream tests are failing because the breaking change from #44 was merged here.
Alignment string position support
Types of changes
This PR implements the following changes:
📋 Additional detail
Mapping positions from/to the input or reference sequence onto the common alignment string is quite useful (e.g. when comparing annotations attached to specific positions of the aligned strings). In addition to
seq2ref()
andref2seq()
, this PR adds 4 new functions to the public API:seq2agn
,ref2agn
,agn2seq
,agn2ref
, which allow to convert between input/reference/alignment positions in any direction.Technically, keeping track of alignment positions is implemented as an
AlignmentAnchor.agnpos
field. Updating the pairwise alignment algorithms to support alignment positions was easy, only the common traceback utility macros had to be changed. It also should not affect the performance or memory consumption.I also did a few minor cleanups (
@inbounds
were beneficial, refactored position conversion to be more generic and using Julia 1.x approach (avoid generated functions, rely on constant propagation, argument splatting)).Provide a runnable example of use of your addition. This lets reviewers
and others try out the feature before it is merged or makes it's way to release.
☑️ Checklist
docs/src/
.[UNRELEASED]
section of the manually curatedCHANGELOG.md
file for this repository.