[FEA] Ability to generate feature index sets for each data sample on RF trees #3537

Open
@teju85

Description

Is your feature request related to a problem? Please describe.
For each bootstrap data sample that was used to build a tree in the random forest (RF), we need to generate the list of feature indices that were used for comparisons on the path from the root to that sample's leaf node. Such index sets could then be used for higher-order feature interaction analysis using algorithms like Random Intersection Trees (RIT).

Describe the solution you'd like
In other words, the pseudocode is as follows:

index_sets = []
predictions = []
for each tree:
  for each bootstrap sample for that tree:
    traverse the tree from root to leaf, while noting all unique feature indices that were used for comparison
    also note the final prediction from the leaf node
    append the feature indices thus obtained to the index_sets array
    append the prediction to the predictions array
return index_sets, predictions
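
For illustration only, the pseudocode above could be sketched in plain Python as follows. This is not cuML code: the `Node` class, the "go left when `x[feature] <= threshold`" convention, and the `bootstrap_rows` argument are all hypothetical stand-ins for whatever the RF implementation actually stores.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass
class Node:
    """Hypothetical tree node; it is a leaf when `feature` is None."""
    feature: Optional[int] = None   # feature index compared at this node
    threshold: float = 0.0          # assumed rule: go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: float = 0.0         # only meaningful at a leaf


def trace_sample(root: Node, x: Sequence[float]) -> Tuple[Set[int], float]:
    """Walk from root to leaf, collecting the unique feature indices compared."""
    used: Set[int] = set()
    node = root
    while node.feature is not None:
        used.add(node.feature)
        node = node.left if x[node.feature] <= node.threshold else node.right
    return used, node.prediction


def forest_index_sets(trees, bootstrap_rows, X):
    """trees[i] is a root Node; bootstrap_rows[i] lists the row ids drawn for tree i."""
    index_sets, predictions = [], []
    for root, rows in zip(trees, bootstrap_rows):
        for r in rows:
            used, pred = trace_sample(root, X[r])
            index_sets.append(sorted(used))
            predictions.append(pred)
    return index_sets, predictions


# Tiny example: one tree that splits on feature 2, then on feature 0.
tree = Node(feature=2, threshold=0.5,
            left=Node(prediction=0.0),
            right=Node(feature=0, threshold=1.0,
                       left=Node(prediction=1.0),
                       right=Node(prediction=2.0)))
X = [[0.0, 9.0, 0.1], [0.5, 9.0, 0.9], [3.0, 9.0, 0.9]]
# Row 2 is drawn twice, as can happen with bootstrap sampling.
index_sets, predictions = forest_index_sets([tree], [[0, 1, 2, 2]], X)
print(index_sets)    # [[2], [0, 2], [0, 2], [0, 2]]
print(predictions)   # [0.0, 1.0, 2.0, 2.0]
```

Note that a row drawn more than once in a bootstrap sample yields one entry per draw here; whether duplicates should be kept or collapsed is left open by the description above.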

To simplify the GPU implementation, we could instead use a fixed-length binary array per sample (one entry per feature), initialized to zeros, and do a bitwise OR at the location corresponding to each feature index encountered during traversal.
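
A minimal Python sketch of the binary-array idea, assuming one 0/1 entry per feature and a hypothetical list of per-sample paths (the feature index compared at each split). It only illustrates the encoding and is not a GPU implementation.

```python
def path_to_mask(path_features, n_features):
    """Fixed-width 0/1 array; OR-ing makes repeated features idempotent."""
    mask = [0] * n_features
    for f in path_features:
        mask[f] |= 1
    return mask


# Hypothetical root-to-leaf paths for three samples; the last one
# compares against feature 2 twice, which still sets a single entry.
paths = [[2], [2, 0], [2, 0, 2]]
masks = [path_to_mask(p, n_features=4) for p in paths]
print(masks)  # [[0, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]]

# With this layout, intersecting two index sets is an element-wise AND.
common = [a & b for a, b in zip(masks[0], masks[1])]
print(common)  # [0, 0, 1, 0]
```

Because every sample's output has the same width, no per-sample dynamic allocation or deduplication is needed, which is presumably what makes this form simpler on the GPU.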

We could potentially also generate such a binary array during the training process itself, since all the information needed to compute it is already available at that point.
