Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Construct more GIG examples with combinations of SVs and recombination #33

Open
duncanMR opened this issue Oct 17, 2023 · 12 comments
Open
Assignees

Comments

@duncanMR
Copy link
Collaborator

We already have a toy example with all three types of SV (#2), but it would be helpful to have more. In particular, it would be interesting to have GIGs:

  • where a duplicated region has further structural changes not present in the other copy
  • with a recombination event between genomes of different lengths
  • with multiple overlapping inversions

@hyanwong, if you have other example ideas let me know! I'll add them to the test fixtures for easy access.

@duncanMR duncanMR self-assigned this Oct 17, 2023
@hyanwong
Copy link
Owner

For your first point, we want an example where the topology differs underneath the duplicated region. I guess that does require a further SV, but that's not the only constraint, right?

@hyanwong
Copy link
Owner

For your first point, we want an example where the topology differs underneath the duplicated region. I guess that does require a further SV, but that's not the only constraint, right?

Actually, it doesn't. A recombination between the duplicated regions would be enough.

@duncanMR
Copy link
Collaborator Author

For your first point, we want an example where the topology differs underneath the duplicated region. I guess that does require a further SV, but that's not the only constraint, right?

Actually, it doesn't. A recombination between the duplicated regions would be enough.

I was planning to just throw on a deletion and inversion in the duplicated region for the first point, then do a recombination after duplication separately for my second point. I think you're right that both will be examples of the same type (where topology differs underneath the duplicated region).

@hyanwong
Copy link
Owner

hyanwong commented Oct 17, 2023

I think a deletion in one region and an inversion in another might be too simple: you will still have subsets / reorderings of the same descendant sample nodes in each of the duplicated regions. It's recombination that's going to really make it hairy.

@duncanMR
Copy link
Collaborator Author

I think a deletion in one region and an inversion in another might be too simple: you will still have subsets / reorderings of the same descendant sample nodes in each of the duplicated regions. It's recombination that's going to really make it hairy.

Fair point. I'll see what monstrosities I can come up with by throwing recombination events around!

@duncanMR duncanMR changed the title Make more GIG examples with combinations of SVs and recombination Construct more GIG examples with combinations of SVs and recombination Oct 17, 2023
@duncanMR
Copy link
Collaborator Author

Here is a simple but hairy example with two recombinations and a duplication:

image

import GeneticInheritanceGraph as gig
import tests.gigutil as gigutil

# p | c | c_left | c_right | p_left | p_right
interval_data = [
    (6, 4, 0, 10, 0, 10),
    (6, 5, 0, 10, 0, 10),
    (6, 5, 10, 20, 0, 10),
    (4, 2, 5, 15, 0, 10),
    (5, 2, 0, 5, 0, 5),
    (5, 2, 15, 20, 15, 20),
    (5, 3, 0, 20, 0, 20),
    (2, 0, 0, 12, 0, 12),
    (3, 0, 12, 30, 2, 20),
    (3, 1, 0, 20, 0, 20)
]
# time | flags
node_data = [
    (0, 1),
    (0, 1),
    (1, 0),
    (1, 0),
    (2, 0),
    (2, 0),
    (3, 0),
]

table_group = gig.TableGroup()
table_group.intervals = gigutil.make_intervals_table(interval_data, table_group)
table_group.nodes = gigutil.make_nodes_table(node_data, table_group)

Let's try and calculate the tree at $x = 0$:

image

We go up from 0 to 2 to 5, then from there we go up to the root and down to 1 (orange path). Going down again from the root takes us to other points along the genome of 0, leading two three copies of 0 in the tree.

In the case of the tree at $x = 8$, we now have four copies of 0! The duplications are going to get out of hand very quickly with bigger trees.

image

The algorithm in #35 mostly worked for this example, except for missing the blue path in the first tree. I'm going to construct a more complex example where there are more nodes between the duplication and recombination events.

@hyanwong
Copy link
Owner

hyanwong commented Feb 27, 2024

@duncanMR : I think for illustration purposes we should take the "standard" example below:

image

Then swap the viz layout so that the deletion is on the left and the duplication in the middle, then recombine nodes 1 and 2, and also 1 and 4. After relabelling the sample nodes as 0..4, I think the GIG (with additional recombination nodes) is as below. I wondered if you could have a go at visualising it using the same sort of approach as in the picture above? I've changed the inversion coordinates to shift them a bit leftwards

# time | flags
    node_data = [
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (1, 0), # new "RE" node
        (1, 0), # new "RE" node
        (2, 0),
        (2, 0),
        (2, 0),
        (3, 0),
        (3, 0),
        (4, 0),
        (5, 0),
    ]
    tables = gigl.Tables()
    tables.nodes = make_nodes_table(node_data, tables)
    tables.iedges.add_rows(
        [
            iedge(0, 200, 0, 200, c=12, p=13),
            iedge(0, 200, 0, 200, c=11, p=12),
            iedge(0, 200, 0, 200, c=10, p=12),

            # inversion
            iedge(0, 20, 0, 20, c=9, p=13),
            iedge(20, 120, 120, 20, c=9, p=13),
            iedge(120, 200, 120, 200, c=9, p=13),

            # duplication
            iedge(0, 200, 0, 200, c=8, p=11),
            iedge(200, 300, 100, 200, c=8, p=11),

            # deletion
            iedge(0, 50, 0, 50, c=7, p=10),
            iedge(50, 100, 150, 200, c=7, p=10),

            # recombination reinstates original coords
            # because the SVs are no longer genetically ancestral to node 6 and its descendants 
            iedge(0, 150, 0, 150, c=6, p=8),
            iedge(150, 200, 150, 200, c=6, p=9),

            # recombination combines two SVs
            iedge(0, 70, 0, 70, c=5, p=7),
            iedge(70, 200, 170, 300, c=5, p=8),

            iedge(0, 200, 0, 200, c=4, p=9),  # unrecombined
            iedge(0, 200, 0, 200, c=3, p=6),  # recombined, 200bp in original coordinate space
            iedge(0, 300, 0, 300, c=2, p=8),  # unrecombined
            iedge(0, 200, 0, 200, c=1, p=5),  # recombined, 200bp from concatenated dupe & del
            iedge(0, 100, 0, 100, c=0, p=7),  # unrecombined
        ]
    )

So something very roughly like this, but with a nicer graphical output rather than text annotation that I've temporarily bunged in here for the recombination intervals / edges. We probably don't need mutations on there yet, but useful to know they can be shown.

Screenshot 2024-02-27 at 16 08 10

Note that we have a trifurcation below node 8: I don't know if that will confuse some people?

@hyanwong
Copy link
Owner

hyanwong commented Feb 27, 2024

And as a follow-up, what would be really good is if we can modify the example (by adding more recombinations?) so that two of the samples produce an MRCA result that looks something like https://docs.google.com/presentation/d/1cti1Xpaz1liQ0xB7Wd6tCWvFurydN7MxGpNbH0VMYnk/

It may be that the GIG to produce something like this is too complex to display in a tidy plot, however.

@hyanwong
Copy link
Owner

Note that we have a trifurcation below node 8: I don't know if that will confuse some people?

We can break this into 2 bifurcations and create a node with a missing region due to recombination, if we allow 5 and 6 to merge before both merge with the ancestor of 2. This would however cause a line-crossing in the graph viz.

@hyanwong
Copy link
Owner

I'm also wondering if it would be better to label the top of the graph with node 0 and the samples as x..N? Then we have comparable node numbers for the with- and without- recombination versions.

Node numbers with the oldest at 0 is what we create with a forward simulation too but it's less usual (but allowed) in tskit tree sequences.

@hyanwong
Copy link
Owner

Here's what I'm now suggesting in #89 . I don't know if this suits the viz:

@pytest.fixture(scope="session")
def all_sv_types_no_re_gig():
    """
    Contains a single deletion, a single duplication, and a single inversion.
    See https://github.com/hyanwong/GeneticInheritanceGraphLibrary/issues/2
    """
    # time | flags
    node_data = [
        (6, 0),
        (5, 0),
        (4, 0),
        (4, 0),
        (2, 0),
        (2, 0),
        (2, 0),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
    ]
    tables = gigl.Tables()
    tables.nodes = make_nodes_table(node_data, tables)
    tables.iedges.add_rows(
        [
            iedge(0, 200, 0, 200, c=1, p=0),
            iedge(0, 200, 0, 200, c=2, p=1),
            iedge(0, 200, 0, 200, c=3, p=1),
            iedge(0, 50, 0, 50, c=4, p=2),
            iedge(50, 100, 150, 200, c=4, p=2),
            iedge(0, 200, 0, 200, c=5, p=3),
            iedge(200, 300, 100, 200, c=5, p=3),
            iedge(0, 20, 0, 20, c=6, p=0),
            iedge(20, 120, 120, 20, c=6, p=0),
            iedge(120, 200, 120, 200, c=6, p=0),
            iedge(0, 100, 0, 100, c=7, p=4),
            iedge(0, 100, 0, 100, c=8, p=4),
            iedge(0, 300, 0, 300, c=9, p=5),
            iedge(0, 300, 0, 300, c=10, p=5),
            iedge(0, 200, 0, 200, c=11, p=6),
            iedge(0, 200, 0, 200, c=12, p=6),
        ]
    )
    return gigl.Graph(tables)


@pytest.fixture(scope="session")
def all_sv_types_re_gig():
    # time | flags
    node_data = [
        (6, 0),
        (5, 0),
        (4, 0),
        (4, 0),
        (3, 0),
        (3, 0),
        (3, 0),
        (2, 0),
        (1, 0),  # new "RE" node
        (1, 0),  # new "RE" node
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
        (0, gigl.NODE_IS_SAMPLE),
    ]
    tables = gigl.Tables()
    tables.nodes = make_nodes_table(node_data, tables)
    tables.iedges.add_rows(
        [
            iedge(0, 200, 0, 200, c=1, p=0),
            iedge(0, 200, 0, 200, c=2, p=1),
            iedge(0, 200, 0, 200, c=3, p=1),
            # deletion
            iedge(0, 50, 0, 50, c=4, p=2),
            iedge(50, 100, 150, 200, c=4, p=2),
            # duplication
            iedge(0, 200, 0, 200, c=5, p=3),
            iedge(200, 300, 100, 200, c=5, p=3),
            # inversion
            iedge(0, 20, 0, 20, c=6, p=0),
            iedge(20, 120, 120, 20, c=6, p=0),
            iedge(120, 200, 120, 200, c=6, p=0),

            # Extra coalescent node for duplication
            iedge(0, 300, 0, 300, c=7, p=5),

            # recombination combines two SVs
            iedge(0, 70, 0, 70, c=8, p=4),
            iedge(70, 200, 170, 300, c=8, p=7),

            # recombination reinstates original coords
            iedge(0, 150, 0, 150, c=9, p=7),
            iedge(150, 200, 150, 200, c=9, p=6),

            iedge(0, 100, 0, 100, c=10, p=4),  # unrecombined
            iedge(0, 200, 0, 200, c=11, p=8),
            iedge(0, 300, 0, 300, c=12, p=5),  # unrecombined
            iedge(0, 200, 0, 200, c=13, p=9),
            iedge(0, 200, 0, 200, c=14, p=6),  # unrecombined
        ]
    )
    tables.sort()
    return gigl.Graph(tables)

@hyanwong
Copy link
Owner

After discussion with @duncanMR , we decided that it would also be nice for viz purposes to have a GIG with only one recombination node (e.g. the left hand one in the example above). So I think we could define both a all_sv_types_1re_gig and (as above) a all_sv_types_2re_gig. I'll put these into the conftest file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants