Function to delete all mutations at a site but leave the site in the table #1034

Closed
@hyanwong

With @awohns, we find that we need a function like ts.delete_sites that deletes only the mutations at a site and leaves the site table untouched. This is basically just a slimmed-down version of delete_sites, and looks something like the code below.

For the moment we will include the function below in tsdate, but I think it would be nice to get this functionality into tskit. Perhaps it could be done with an extra parameter to the existing delete_sites function. Alternatively, for didactic reasons, we could deprecate delete_sites and introduce delete_mutations(site_ids=XXX, delete_site=True), which does the same as the current delete_sites function. If the latter, in the longer term we could add functionality to delete a specific list of mutations, which need not be all of those at a site, but that's a bit complex and can be kicked down the road.
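To make the difference between the two behaviours concrete, here is a rough sketch on a toy model. `ToyTables` and this `delete_mutations` are purely hypothetical stand-ins using plain Python lists, not the tskit API, and mutation parent references are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class ToyTables:
    """Toy stand-in for a table collection (hypothetical, NOT the tskit API)."""

    sites: list  # site positions; the list index is the site ID
    mutations: list  # (site_id, derived_state) tuples; the index is the mutation ID


def delete_mutations(tables, site_ids, delete_site=True):
    """Hypothetical sketch of the proposed delete_mutations signature."""
    doomed = set(site_ids)
    if any(s < 0 or s >= len(tables.sites) for s in doomed):
        raise ValueError("Site ID out of bounds")
    # Drop every mutation that sits at one of the specified sites
    kept = [(s, d) for s, d in tables.mutations if s not in doomed]
    if delete_site:
        # Also drop the sites, so the remaining site IDs must be renumbered
        new_id = {}
        for old in range(len(tables.sites)):
            if old not in doomed:
                new_id[old] = len(new_id)
        tables.sites = [p for i, p in enumerate(tables.sites) if i not in doomed]
        kept = [(new_id[s], d) for s, d in kept]
    tables.mutations = kept


def example():
    return ToyTables(
        sites=[0.1, 0.5, 0.9],
        mutations=[(0, "A"), (1, "T"), (1, "G"), (2, "C")],
    )


t = example()
delete_mutations(t, [1], delete_site=False)
print(t.sites, t.mutations)  # [0.1, 0.5, 0.9] [(0, 'A'), (2, 'C')]

t = example()
delete_mutations(t, [1], delete_site=True)
print(t.sites, t.mutations)  # [0.1, 0.9] [(0, 'A'), (1, 'C')]
```

With delete_site=False the site table is untouched and the surviving mutations keep their site IDs; with delete_site=True the behaviour matches what delete_sites is described as doing above.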

    def delete_site_mutations(self, site_ids, record_provenance=True):
        """
        Remove all mutations at the specified sites from the mutations table in
        this collection. The sites themselves are left in the sites table.

        :param list[int] site_ids: A list of site IDs specifying the sites whose
            mutations will be removed.
        :param bool record_provenance: If ``True``, add details of this operation
            to the provenance table in this TableCollection. (Default: ``True``).
        """
        # Boolean mask over sites: True for sites whose mutations are kept
        keep_sites = np.ones(len(self.sites), dtype=bool)
        site_ids = util.safe_np_int_cast(site_ids, np.int32)
        if np.any(site_ids < 0) or np.any(site_ids >= len(self.sites)):
            raise ValueError("Site ID out of bounds")
        keep_sites[site_ids] = False
        # Boolean mask over mutations, looked up via each mutation's site
        keep_mutations = keep_sites[self.mutations.site]
        new_ds, new_ds_offset = keep_with_offset(
            keep_mutations,
            self.mutations.derived_state,
            self.mutations.derived_state_offset,
        )
        new_md, new_md_offset = keep_with_offset(
            keep_mutations, self.mutations.metadata, self.mutations.metadata_offset
        )
        # Mutation numbers will change, so the parent references need altering
        mutation_map = np.cumsum(keep_mutations, dtype=self.mutations.parent.dtype) - 1
        # Map parent == -1 to -1, and check this has worked (assumes tskit.NULL == -1)
        mutation_map = np.append(mutation_map, -1).astype(self.mutations.parent.dtype)
        assert mutation_map[tskit.NULL] == tskit.NULL
        self.mutations.set_columns(
            site=self.mutations.site[keep_mutations],
            node=self.mutations.node[keep_mutations],
            time=self.mutations.time[keep_mutations],
            derived_state=new_ds,
            derived_state_offset=new_ds_offset,
            parent=mutation_map[self.mutations.parent[keep_mutations]],
            metadata=new_md,
            metadata_offset=new_md_offset,
        )
        if record_provenance:
            # TODO replace with a version of https://github.com/tskit-dev/tskit/pull/243
            parameters = {"command": "delete_site_mutations", "TODO": "add parameters"}
            self.provenances.add_row(
                record=json.dumps(provenance.get_provenance_dict(parameters))
            )
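
The parent renumbering step in the function above is probably the least obvious part, so here is a minimal pure-Python sketch of the same idea. `remap_parents` and `NULL` are illustrative names only, and `NULL = -1` is assumed to match tskit.NULL.

```python
from itertools import accumulate

NULL = -1  # assumed to match tskit.NULL


def remap_parents(keep, parent):
    """Return the parent column of the kept rows, renumbered to the new IDs."""
    # The new ID of row i is (number of kept rows up to and including i) - 1
    new_id = [count - 1 for count in accumulate(int(k) for k in keep)]
    # Trailing sentinel, so that indexing with -1 (NULL) gives NULL back
    new_id.append(NULL)
    return [new_id[p] for k, p in zip(keep, parent) if k]


# Five mutations: row 2 has parent 1 (both at site 1), row 4 has parent 3
# (both at site 2). Deleting the mutations at site 1 drops rows 1 and 2.
keep = [True, False, False, True, True]
parent = [NULL, NULL, 1, NULL, 3]
print(remap_parents(keep, parent))  # [-1, -1, 1]
```

The entries of `new_id` for dropped rows are meaningless, but they are never looked up here: a mutation's parent is at the same site as the mutation itself, so removing all mutations at a site never leaves a kept mutation pointing at a removed parent.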

Labels: Python API, enhancement
