Description
In #2665 I added a draft function that looked like this:
int
tsk_node_table_delete_rows(tsk_node_table_t *self, bool *delete_rows,
    tsk_flags_t TSK_UNUSED(options), tsk_id_t *id_map)
{
    int ret;
    tsk_node_table_t copy;
    tsk_node_t node;
    tsk_size_t j;
    tsk_id_t ret_id;

    ret = tsk_node_table_copy(self, &copy, 0);
    if (ret != 0) {
        goto out;
    }
    ret = tsk_node_table_clear(self);
    if (ret != 0) {
        goto out;
    }
    for (j = 0; j < copy.num_rows; j++) {
        if (id_map != NULL) {
            id_map[j] = TSK_NULL;
        }
        if (!delete_rows[j]) {
            tsk_node_table_get_row_unsafe(&copy, (tsk_id_t) j, &node);
            ret_id = tsk_node_table_add_row(self, node.flags, node.time, node.population,
                node.individual, node.metadata, node.metadata_length);
            if (ret_id < 0) {
                ret = (int) ret_id;
                goto out;
            }
            if (id_map != NULL) {
                id_map[j] = ret_id;
            }
        }
    }
out:
    tsk_node_table_free(&copy);
    return ret;
}
The idea is that we can keep/delete a subset of the rows in a table, in place, and optionally return the mapping from old IDs to new IDs (`id_map[j]` is the new ID of old row `j`, or `TSK_NULL` if the row was deleted). I implemented this as `delete_rows` here, but it felt a bit unnatural and maybe it would be better to do it as `keep_rows` instead? Maybe `subset(bool *keep_rows)`? Any better ideas?
I toyed with the idea of taking a list of row indexes instead, but I think the boolean mask is more generally useful, and it is easy to set up if what you have is a list of rows you want to get rid of/keep.
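To illustrate the "easy to set up" point, here is a minimal sketch of turning a list of row IDs into a boolean mask. It is not tskit API: `mask_from_row_ids` and `row_id_t` are hypothetical names, with `row_id_t` standing in for `tsk_id_t` so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for tsk_id_t so the sketch compiles without tskit (assumption). */
typedef int32_t row_id_t;

/* Hypothetical helper: set delete_rows[id] = true for every id in ids and
 * false everywhere else. Returns 0 on success, or -1 if an ID is out of
 * bounds for a table with num_rows rows. Duplicate IDs are harmless. */
static int
mask_from_row_ids(const row_id_t *ids, size_t num_ids, bool *delete_rows,
    size_t num_rows)
{
    size_t j;

    assert(delete_rows != NULL);
    for (j = 0; j < num_rows; j++) {
        delete_rows[j] = false;
    }
    for (j = 0; j < num_ids; j++) {
        if (ids[j] < 0 || (size_t) ids[j] >= num_rows) {
            return -1;
        }
        delete_rows[ids[j]] = true;
    }
    return 0;
}
```

The same loop produces a `keep_rows` mask if the list holds the rows to retain, so the choice between the two names does not change how callers build the argument.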
The plan is to do this for all tables. For tables that self-reference (e.g. the mutation table), the function would automatically remap those references and (by default) return an error if you try to delete a row that has references to it.
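A rough sketch of what the self-referencing case could look like, using a bare `parent` column instead of a real table. Everything here is an assumption for illustration: `delete_rows_remap_parent`, `row_id_t`, `ROW_NULL` (playing the role of `TSK_NULL`) and the error code are hypothetical, `id_map` is required rather than optional, every non-null parent is assumed to be a valid row index, and only references from surviving rows count as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t row_id_t;               /* stand-in for tsk_id_t */
#define ROW_NULL ((row_id_t) -1)        /* plays the role of TSK_NULL */
#define ERR_DELETED_ROW_REFERENCED (-2) /* hypothetical error code */

/* Delete the flagged rows of a self-referencing "parent" column in place.
 * On success, *num_rows is updated and id_map[j] holds the new ID of old
 * row j (ROW_NULL if deleted). On error nothing is modified. */
static int
delete_rows_remap_parent(row_id_t *parent, size_t *num_rows,
    const bool *delete_rows, row_id_t *id_map)
{
    size_t j, n = *num_rows;
    row_id_t next = 0;

    assert(id_map != NULL);
    /* Pass 1: refuse if any surviving row refers to a row being deleted. */
    for (j = 0; j < n; j++) {
        if (!delete_rows[j] && parent[j] != ROW_NULL && delete_rows[parent[j]]) {
            return ERR_DELETED_ROW_REFERENCED;
        }
    }
    /* Pass 2: assign new IDs to the surviving rows, in order. */
    for (j = 0; j < n; j++) {
        id_map[j] = delete_rows[j] ? ROW_NULL : next++;
    }
    /* Pass 3: compact in place, remapping each reference. This is safe
     * because id_map[j] <= j, so a write never clobbers an unread row. */
    for (j = 0; j < n; j++) {
        if (!delete_rows[j]) {
            row_id_t p = parent[j];
            parent[id_map[j]] = (p == ROW_NULL) ? ROW_NULL : id_map[p];
        }
    }
    *num_rows = (size_t) next;
    return 0;
}
```

Running the reference check as a separate first pass means a failed call leaves the column untouched, which seems like the least surprising behaviour for an in-place operation.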
See also #1034