-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construct more GIG examples with combinations of SVs and recombination #33
Comments
For your first point, we want an example where the topology differs underneath the duplicated region. I guess that does require a further SV, but that's not the only constraint, right? |
Actually, it doesn't. A recombination between the duplicated regions would be enough. |
I was planning to just throw on a deletion and inversion in the duplicated region for the first point, then do a recombination after duplication separately for my second point. I think you're right that both will be examples of the same type (where topology differs underneath the duplicated region). |
I think a deletion in one region and an inversion in another might be too simple: you will still have subsets / reorderings of the same descendant sample nodes in each of the duplicated regions. It's recombination that's going to really make it hairy. |
Fair point. I'll see what monstrosities I can come up with by throwing recombination events around! |
Here is a simple but hairy example with two recombinations and a duplication: import GeneticInheritanceGraph as gig
import tests.gigutil as gigutil
# p | c | c_left | c_right | p_left | p_right
interval_data = [
(6, 4, 0, 10, 0, 10),
(6, 5, 0, 10, 0, 10),
(6, 5, 10, 20, 0, 10),
(4, 2, 5, 15, 0, 10),
(5, 2, 0, 5, 0, 5),
(5, 2, 15, 20, 15, 20),
(5, 3, 0, 20, 0, 20),
(2, 0, 0, 12, 0, 12),
(3, 0, 12, 30, 2, 20),
(3, 1, 0, 20, 0, 20)
]
# time | flags
node_data = [
(0, 1),
(0, 1),
(1, 0),
(1, 0),
(2, 0),
(2, 0),
(3, 0),
]
table_group = gig.TableGroup()
table_group.intervals = gigutil.make_intervals_table(interval_data, table_group)
table_group.nodes = gigutil.make_nodes_table(node_data, table_group) Let's try and calculate the tree at We go up from 0 to 2 to 5, then from there we go up to the root and down to 1 (orange path). Going down again from the root takes us to other points along the genome of 0, leading two three copies of 0 in the tree. In the case of the tree at The algorithm in #35 mostly worked for this example, except for missing the blue path in the first tree. I'm going to construct a more complex example where there are more nodes between the duplication and recombination events. |
@duncanMR : I think for illustration purposes we should take the "standard" example below: Then swap the viz layout so that the deletion is on the left and the duplication in the middle, then recombine nodes 1 and 2, and also 1 and 4. After relabelling the sample nodes as 0..4, I think the GIG (with additional recombination nodes) is as below. I wondered if you could have a go at visualising it using the same sort of approach as in the picture above? I've changed the inversion coordinates to shift them a bit leftwards # time | flags
node_data = [
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(1, 0), # new "RE" node
(1, 0), # new "RE" node
(2, 0),
(2, 0),
(2, 0),
(3, 0),
(3, 0),
(4, 0),
(5, 0),
]
tables = gigl.Tables()
tables.nodes = make_nodes_table(node_data, tables)
tables.iedges.add_rows(
[
iedge(0, 200, 0, 200, c=12, p=13),
iedge(0, 200, 0, 200, c=11, p=12),
iedge(0, 200, 0, 200, c=10, p=12),
# inversion
iedge(0, 20, 0, 20, c=9, p=13),
iedge(20, 120, 120, 20, c=9, p=13),
iedge(120, 200, 120, 200, c=9, p=13),
# duplication
iedge(0, 200, 0, 200, c=8, p=11),
iedge(200, 300, 100, 200, c=8, p=11),
# deletion
iedge(0, 50, 0, 50, c=7, p=10),
iedge(50, 100, 150, 200, c=7, p=10),
# recombination reinstates original coords
# because the SVs are no longer genetically ancestral to node 6 and its descendants
iedge(0, 150, 0, 150, c=6, p=8),
iedge(150, 200, 150, 200, c=6, p=9),
# recombination combines two SVs
iedge(0, 70, 0, 70, c=5, p=7),
iedge(70, 200, 170, 300, c=5, p=8),
iedge(0, 200, 0, 200, c=4, p=9), # unrecombined
iedge(0, 200, 0, 200, c=3, p=6), # recombined, 200bp in original coordinate space
iedge(0, 300, 0, 300, c=2, p=8), # unrecombined
iedge(0, 200, 0, 200, c=1, p=5), # recombined, 200bp from concatenated dupe & del
iedge(0, 100, 0, 100, c=0, p=7), # unrecombined
]
) So something very roughly like this, but with a nicer graphical output rather than text annotation that I've temporarily bunged in here for the recombination intervals / edges. We probably don't need mutations on there yet, but useful to know they can be shown. Note that we have a trifurcation below node 8: I don't know if that will confuse some people? |
And as a follow-up, what would be really good is if we can modify the example (by adding more recombinations?) so that two of the samples produce an MRCA result that looks something like https://docs.google.com/presentation/d/1cti1Xpaz1liQ0xB7Wd6tCWvFurydN7MxGpNbH0VMYnk/ It may be that the GIG to produce something like this is too complex to display in a tidy plot, however. |
We can break this into 2 bifurcations and create a node with a missing region due to recombination, if we allow 5 and 6 to merge before both merge with the ancestor of 2. This would however cause a line-crossing in the graph viz. |
I'm also wondering if it would be better to label the top of the graph with node 0 and the samples as x..N? Then we have comparable node numbers for the with- and without- recombination versions. Node numbers with the oldest at 0 is what we create with a forward simulation too but it's less usual (but allowed) in tskit tree sequences. |
Here's what I'm now suggesting in #89 . I don't know if this suits the viz: @pytest.fixture(scope="session")
def all_sv_types_no_re_gig():
"""
Contains a single deletion, a single duplication, and a single inversion.
See https://github.com/hyanwong/GeneticInheritanceGraphLibrary/issues/2
"""
# time | flags
node_data = [
(6, 0),
(5, 0),
(4, 0),
(4, 0),
(2, 0),
(2, 0),
(2, 0),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
]
tables = gigl.Tables()
tables.nodes = make_nodes_table(node_data, tables)
tables.iedges.add_rows(
[
iedge(0, 200, 0, 200, c=1, p=0),
iedge(0, 200, 0, 200, c=2, p=1),
iedge(0, 200, 0, 200, c=3, p=1),
iedge(0, 50, 0, 50, c=4, p=2),
iedge(50, 100, 150, 200, c=4, p=2),
iedge(0, 200, 0, 200, c=5, p=3),
iedge(200, 300, 100, 200, c=5, p=3),
iedge(0, 20, 0, 20, c=6, p=0),
iedge(20, 120, 120, 20, c=6, p=0),
iedge(120, 200, 120, 200, c=6, p=0),
iedge(0, 100, 0, 100, c=7, p=4),
iedge(0, 100, 0, 100, c=8, p=4),
iedge(0, 300, 0, 300, c=9, p=5),
iedge(0, 300, 0, 300, c=10, p=5),
iedge(0, 200, 0, 200, c=11, p=6),
iedge(0, 200, 0, 200, c=12, p=6),
]
)
return gigl.Graph(tables)
@pytest.fixture(scope="session")
def all_sv_types_re_gig():
# time | flags
node_data = [
(6, 0),
(5, 0),
(4, 0),
(4, 0),
(3, 0),
(3, 0),
(3, 0),
(2, 0),
(1, 0), # new "RE" node
(1, 0), # new "RE" node
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
(0, gigl.NODE_IS_SAMPLE),
]
tables = gigl.Tables()
tables.nodes = make_nodes_table(node_data, tables)
tables.iedges.add_rows(
[
iedge(0, 200, 0, 200, c=1, p=0),
iedge(0, 200, 0, 200, c=2, p=1),
iedge(0, 200, 0, 200, c=3, p=1),
# deletion
iedge(0, 50, 0, 50, c=4, p=2),
iedge(50, 100, 150, 200, c=4, p=2),
# duplication
iedge(0, 200, 0, 200, c=5, p=3),
iedge(200, 300, 100, 200, c=5, p=3),
# inversion
iedge(0, 20, 0, 20, c=6, p=0),
iedge(20, 120, 120, 20, c=6, p=0),
iedge(120, 200, 120, 200, c=6, p=0),
# Extra coalescent node for duplication
iedge(0, 300, 0, 300, c=7, p=5),
# recombination combines two SVs
iedge(0, 70, 0, 70, c=8, p=4),
iedge(70, 200, 170, 300, c=8, p=7),
# recombination reinstates original coords
iedge(0, 150, 0, 150, c=9, p=7),
iedge(150, 200, 150, 200, c=9, p=6),
iedge(0, 100, 0, 100, c=10, p=4), # unrecombined
iedge(0, 200, 0, 200, c=11, p=8),
iedge(0, 300, 0, 300, c=12, p=5), # unrecombined
iedge(0, 200, 0, 200, c=13, p=9),
iedge(0, 200, 0, 200, c=14, p=6), # unrecombined
]
)
tables.sort()
return gigl.Graph(tables) |
After discussion with @duncanMR , we decided that it would also be nice for viz purposes to have a GIG with only one recombination node (e.g. the left hand one in the example above). So I think we could define both a |
We already have a toy example with all three types of SV (#2), but it would be helpful to have more. In particular, it would be interesting to have GIGs:
@hyanwong, if you have other example ideas let me know! I'll add them to the test fixtures for easy access.
The text was updated successfully, but these errors were encountered: