[FEA] Ability to generate feature index sets for each data sample on RF trees

**Is your feature request related to a problem? Please describe.**
For each bootstrap data sample used to build a tree in a random forest (RF), we need the list of feature indices used in the comparisons along the path from the root to that sample's leaf node. Such index sets could then be used for higher-order feature interaction analysis with algorithms like [RIT](http://www.statslab.cam.ac.uk/~rds37/papers/Shah%20Meinshausen%202013%20Random%20Intersection%20Trees) (Random Intersection Trees).

**Describe the solution you'd like**
In other words, the desired behavior in pseudocode is:
```
index_sets = []
predictions = []
for each tree:
  for each bootstrap sample for that tree:
    traverse the tree from root to leaf, while noting all unique feature indices that were used for comparison
    also note the final prediction from the leaf node
    append the feature indices thus obtained to the index_sets array
    append the prediction to the predictions array
return index_sets, predictions
```
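The pseudocode above could be sketched in plain Python roughly as follows. This is only an illustration on the CPU: the `Node` class, the "go left when value <= threshold" convention, and the `bootstrap_rows` argument are hypothetical names and assumptions, not cuML's internal tree representation or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple


@dataclass
class Node:
    """Hypothetical tree node; not cuML's internal representation."""
    feature: int = -1                   # split feature index (unused on leaves)
    threshold: float = 0.0              # split threshold (unused on leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[float] = None  # set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def trace_sample(root: Node, x: Sequence[float]) -> Tuple[Set[int], float]:
    """Walk from root to leaf, collecting the unique split features seen."""
    used: Set[int] = set()
    node = root
    while not node.is_leaf():
        used.add(node.feature)
        # Assumed convention: go left when the value is <= threshold.
        node = node.left if x[node.feature] <= node.threshold else node.right
    return used, node.prediction


def feature_index_sets(trees, bootstrap_rows, X):
    """trees[i] is a root Node; bootstrap_rows[i] lists the row ids drawn for tree i."""
    index_sets: List[List[int]] = []
    predictions: List[float] = []
    for root, rows in zip(trees, bootstrap_rows):
        for r in rows:
            used, pred = trace_sample(root, X[r])
            index_sets.append(sorted(used))
            predictions.append(pred)
    return index_sets, predictions


def leaf(p: float) -> Node:
    return Node(prediction=p)


# Tiny example: one tree that splits on feature 2, then on feature 0.
tree = Node(feature=2, threshold=0.5,
            left=leaf(0.0),
            right=Node(feature=0, threshold=1.0, left=leaf(1.0), right=leaf(2.0)))
X = [[0.0, 9.0, 0.1], [3.0, 9.0, 0.9], [0.5, 9.0, 0.9]]
index_sets, predictions = feature_index_sets([tree], [[0, 1, 1, 2]], X)
print(index_sets)   # [[2], [0, 2], [0, 2], [0, 2]]
print(predictions)  # [0.0, 2.0, 2.0, 1.0]
```

Row 1 appears twice in the bootstrap sample and therefore contributes two entries, which matches the "for each bootstrap sample" loop in the pseudocode.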
To simplify the GPU implementation, each index set could instead be represented as a binary array with one entry per feature, initialized to zeros. Each time a feature is used for a comparison on the path, we do a bitwise OR with 1 at that feature's position.
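A minimal sketch of that binary-array representation, in plain Python for illustration. The `path_mask` helper and the example path are hypothetical; a GPU version would presumably write into a preallocated `n_samples x n_features` buffer instead.

```python
from typing import List, Sequence


def path_mask(path_features: Sequence[int], n_features: int) -> List[int]:
    """Return a 0/1 array with a 1 at every feature index seen on the path."""
    mask = [0] * n_features   # binary array initialized to zeros
    for f in path_features:
        mask[f] |= 1          # bitwise OR; a repeated feature leaves the bit set
    return mask


# Hypothetical root-to-leaf path that splits on features 2, 0, and 2 again.
mask = path_mask([2, 0, 2], n_features=5)
print(mask)  # [1, 0, 1, 0, 0]
```

Because the OR is idempotent, no explicit de-duplication is needed, and every sample produces an output of the same fixed size.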

We could potentially also generate this binary array during training itself, since all the information needed to compute it is already available at that point.
