Method to in-place subset a table #2666

Closed
@jeromekelleher

In #2665 I added a draft function that looked like this:

int
tsk_node_table_delete_rows(tsk_node_table_t *self, bool *delete_rows,
    tsk_flags_t TSK_UNUSED(options), tsk_id_t *id_map)
{
    int ret;
    tsk_node_table_t copy;
    tsk_node_t node;
    tsk_size_t j;
    tsk_id_t ret_id;

    /* Take a full copy of the table, then rebuild self from the kept rows. */
    ret = tsk_node_table_copy(self, &copy, 0);
    if (ret != 0) {
        goto out;
    }
    ret = tsk_node_table_clear(self);
    if (ret != 0) {
        goto out;
    }
    for (j = 0; j < copy.num_rows; j++) {
        /* id_map is optional; deleted rows map to TSK_NULL. */
        if (id_map != NULL) {
            id_map[j] = TSK_NULL;
        }
        if (!delete_rows[j]) {
            tsk_node_table_get_row_unsafe(&copy, (tsk_id_t) j, &node);
            ret_id = tsk_node_table_add_row(self, node.flags, node.time, node.population,
                node.individual, node.metadata, node.metadata_length);
            if (ret_id < 0) {
                ret = (int) ret_id;
                goto out;
            }
            /* Record the new ID of old row j. */
            if (id_map != NULL) {
                id_map[j] = ret_id;
            }
        }
    }
out:
    tsk_node_table_free(&copy);
    return ret;
}

The idea is that we can keep or delete a subset of the rows in a table, in place, and optionally return the mapping from old IDs to new IDs (with TSK_NULL for deleted rows). I implemented this as delete_rows here, but it felt a bit unnatural; maybe it would be better to do it as keep_rows instead? Or maybe subset(bool *keep_rows)? Any better ideas?
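To make the keep_rows polarity concrete, here is a minimal sketch on a toy single-column table rather than on tskit itself. The names `toy_table_t`, `toy_table_keep_rows`, `row_id_t` and `ROW_NULL` are hypothetical stand-ins (for a real table, `tsk_id_t` and `TSK_NULL`), and the in-place compaction without a copy is just one possible approach, not what #2665 does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t row_id_t;          /* hypothetical stand-in for tsk_id_t */
#define ROW_NULL ((row_id_t) -1)   /* hypothetical stand-in for TSK_NULL */

/* Toy table with a single column; not a tskit type. */
typedef struct {
    double *time;
    size_t num_rows;
} toy_table_t;

/* Keep row j iff keep[j] is true, compacting the column in place.
 * If id_map is non-NULL it must have room for the original num_rows
 * entries; id_map[j] receives the new ID of old row j, or ROW_NULL
 * if that row was dropped. */
static void
toy_table_keep_rows(toy_table_t *self, const bool *keep, row_id_t *id_map)
{
    size_t j;
    size_t k = 0; /* next free slot; always <= j, so no unread row is overwritten */

    assert(self != NULL && keep != NULL);
    for (j = 0; j < self->num_rows; j++) {
        if (id_map != NULL) {
            id_map[j] = ROW_NULL;
        }
        if (keep[j]) {
            self->time[k] = self->time[j];
            if (id_map != NULL) {
                id_map[j] = (row_id_t) k;
            }
            k++;
        }
    }
    self->num_rows = k;
}
```

With this polarity a caller who thinks in terms of deletions only has to negate their mask, so one of the two spellings could plausibly be a thin wrapper around the other.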

I toyed with the idea of taking a list of row indexes instead, but I think the boolean mask is more generally useful, and it is easy to set up if you do have a list of rows you want to get rid of or keep.
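As a rough illustration of how little work that setup is, the following sketch turns a list of row indexes into a mask. `mask_from_row_list` and `row_id_t` are hypothetical names used only for this example; nothing like this helper is part of the draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t row_id_t; /* hypothetical stand-in for tsk_id_t */

/* Fill mask[0..num_rows) so that mask[r] is true exactly for the rows
 * listed in rows[0..num_listed). Duplicates in the list are harmless.
 * Returns 0 on success, or -1 if any listed row is out of bounds
 * (in which case the contents of mask are unspecified). */
static int
mask_from_row_list(
    const row_id_t *rows, size_t num_listed, size_t num_rows, bool *mask)
{
    size_t j;

    assert(mask != NULL || num_rows == 0);
    for (j = 0; j < num_rows; j++) {
        mask[j] = false;
    }
    for (j = 0; j < num_listed; j++) {
        if (rows[j] < 0 || (size_t) rows[j] >= num_rows) {
            return -1;
        }
        mask[rows[j]] = true;
    }
    return 0;
}
```

The resulting mask can be read either as "rows to delete" or "rows to keep", depending on which polarity the table method ends up using.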

The plan is to do this for all tables. For tables that self-reference (e.g. the mutation table), the method would automatically remap those references and, by default, return an error if you try to delete a row that other rows still reference.
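A possible shape for the self-referencing case, sketched on a toy table that has only a parent column. Everything here is an assumption made for illustration: the type `toy_mutation_table_t`, the function name, the error code `ERR_REFERENCED_ROW_DELETED`, and the choice to require `id_map` rather than leave it optional. The sketch also assumes every parent value is either `ROW_NULL` or a valid row index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t row_id_t;                /* hypothetical stand-in for tsk_id_t */
#define ROW_NULL ((row_id_t) -1)         /* hypothetical stand-in for TSK_NULL */
#define ERR_REFERENCED_ROW_DELETED (-2)  /* hypothetical error code */

/* Toy self-referencing table: parent[j] is ROW_NULL or the ID of another row. */
typedef struct {
    row_id_t *parent;
    size_t num_rows;
} toy_mutation_table_t;

/* Keep row j iff keep[j] is true and remap parent references to the new IDs.
 * id_map is required here and must hold the original num_rows entries.
 * If a kept row refers to a dropped row, returns an error and leaves the
 * table itself unmodified (id_map will already have been written). */
static int
toy_mutation_table_keep_rows(
    toy_mutation_table_t *self, const bool *keep, row_id_t *id_map)
{
    size_t j;
    row_id_t next = 0;
    row_id_t p;

    assert(self != NULL && keep != NULL && id_map != NULL);
    /* Pass 1: assign new IDs to the kept rows. */
    for (j = 0; j < self->num_rows; j++) {
        id_map[j] = keep[j] ? next++ : ROW_NULL;
    }
    /* Pass 2: reject dangling references before touching the table. */
    for (j = 0; j < self->num_rows; j++) {
        p = self->parent[j];
        if (keep[j] && p != ROW_NULL && id_map[p] == ROW_NULL) {
            return ERR_REFERENCED_ROW_DELETED;
        }
    }
    /* Pass 3: compact in place. id_map[j] <= j, so the write never
     * clobbers a row that has not been read yet. */
    for (j = 0; j < self->num_rows; j++) {
        if (keep[j]) {
            p = self->parent[j];
            self->parent[id_map[j]] = (p == ROW_NULL) ? ROW_NULL : id_map[p];
        }
    }
    self->num_rows = (size_t) next;
    return 0;
}
```

Checking in a separate pass means the order of parents and children in the table does not matter, at the cost of one extra scan. An option flag could later relax the error into "set dangling references to null", but that is beyond what is proposed above.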

See also #1034

Labels: C API, enhancement
