Description
With @awohns, we find that we need a function like ts.delete_sites that deletes only the mutations at the given sites and leaves the site table untouched. This is basically just a slimmed-down version of delete_sites, and looks something like the code below.
For the moment we will include the function below in tsdate, but I think it would be nice to get this functionality into tskit. Perhaps it could be done with an extra parameter to the existing delete_sites function, or perhaps, for didactic reasons, we should deprecate delete_sites and introduce delete_mutations(site_ids=XXX, delete_site=True), which does the same as the current delete_sites function? If the latter, I guess in the longer term we could add functionality to delete a specific list of mutations, which need not be all of those at a site, but that's a bit complex and can be kicked down the road.
def delete_site_mutations(self, site_ids, record_provenance=True):
    """
    Remove the mutations at the specified sites entirely from the mutations table in
    this collection.

    :param list[int] site_ids: A list of site IDs specifying the sites whose
        mutations will be removed.
    :param bool record_provenance: If ``True``, add details of this operation
        to the provenance table in this TableCollection. (Default: ``True``).
    """
    keep_sites = np.ones(len(self.sites), dtype=bool)
    site_ids = util.safe_np_int_cast(site_ids, np.int32)
    if np.any(site_ids < 0) or np.any(site_ids >= len(self.sites)):
        raise ValueError("Site ID out of bounds")
    keep_sites[site_ids] = 0
    keep_mutations = keep_sites[self.mutations.site]
    new_ds, new_ds_offset = keep_with_offset(
        keep_mutations,
        self.mutations.derived_state,
        self.mutations.derived_state_offset,
    )
    new_md, new_md_offset = keep_with_offset(
        keep_mutations, self.mutations.metadata, self.mutations.metadata_offset
    )
    # Mutation numbers will change, so the parent references need altering
    mutation_map = np.cumsum(keep_mutations, dtype=self.mutations.parent.dtype) - 1
    # Map parent == -1 to -1, and check this has worked (assumes tskit.NULL == -1)
    mutation_map = np.append(mutation_map, -1).astype(self.mutations.parent.dtype)
    assert mutation_map[tskit.NULL] == tskit.NULL
    self.mutations.set_columns(
        site=self.mutations.site[keep_mutations],
        node=self.mutations.node[keep_mutations],
        time=self.mutations.time[keep_mutations],
        derived_state=new_ds,
        derived_state_offset=new_ds_offset,
        parent=mutation_map[self.mutations.parent[keep_mutations]],
        metadata=new_md,
        metadata_offset=new_md_offset,
    )
    if record_provenance:
        # TODO replace with a version of https://github.com/tskit-dev/tskit/pull/243
        parameters = {"command": "delete_site_mutations", "TODO": "add parameters"}
        self.provenances.add_row(
            record=json.dumps(provenance.get_provenance_dict(parameters))
        )
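
To make the two index manipulations above easier to follow, here is a rough toy sketch in plain Python, with lists standing in for the numpy columns. The names `filter_ragged` and `remap_parents` are made up for illustration and are not tskit functions; `filter_ragged` is only my assumption of what `keep_with_offset` does to a ragged column, and `NULL` stands in for `tskit.NULL`.

```python
from itertools import accumulate

NULL = -1  # stand-in for tskit.NULL (assumed to be -1, as in the code above)


def filter_ragged(keep, data, offset):
    """Keep row j of a ragged column (data, offset) wherever keep[j] is true.

    Hypothetical helper: assumed to mirror the effect of keep_with_offset.
    """
    new_data = []
    new_offset = [0]
    for j, keep_row in enumerate(keep):
        if keep_row:
            new_data.extend(data[offset[j]:offset[j + 1]])
            new_offset.append(len(new_data))
    return new_data, new_offset


def remap_parents(keep, parent):
    """Renumber the parent column of the retained mutations (hypothetical helper)."""
    # Old ID -> new ID for kept rows: running count of kept rows, minus one.
    mutation_map = [count - 1 for count in accumulate(int(k) for k in keep)]
    # Appending NULL means that indexing with -1 (the last element) gives NULL.
    mutation_map.append(NULL)
    return [mutation_map[p] for p, k in zip(parent, keep) if k]


# Five mutations: IDs 0-1 at site 0, IDs 2-3 at site 1, ID 4 at site 2.
site = [0, 0, 1, 1, 2]
parent = [NULL, 0, NULL, 2, NULL]
# Ragged derived states "A", "CC", "G", "TT", "A".
derived_state = list("ACCGTTA")
derived_state_offset = [0, 1, 3, 4, 6, 7]

# Delete the mutations at site 0 only; the site itself is not touched.
delete_site_ids = {0}
keep = [s not in delete_site_ids for s in site]

new_ds, new_ds_offset = filter_ragged(keep, derived_state, derived_state_offset)
new_parent = remap_parents(keep, parent)

print(new_ds, new_ds_offset, new_parent)
```

The old mutation 3 had parent 2; after the two mutations at site 0 are dropped, mutation 2 becomes mutation 0, so the remapped parent column points at the new ID. This relies on a mutation's parent always being at the same site, so a kept mutation never refers to a deleted one.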