Description
Is your feature request related to a problem? Please describe.
For each bootstrap sample used to build a tree in the random forest (RF), we need to generate the list of feature indices that were used on the path to that sample's leaf node. These index sets could then be used for higher-order feature interaction analysis with algorithms such as Random Intersection Trees (RIT).
Describe the solution you'd like
In other words, the pseudocode is as follows:
index_sets = []
predictions = []
for each tree:
    for each bootstrap sample for that tree:
        traverse the tree from root to leaf, while noting all unique feature indices that were used for comparison
        also note the final prediction from the leaf node
        append the feature indices thus obtained to the index_sets array
        append the prediction to the predictions array
return index_sets, predictions
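As a rough CPU-side illustration of the pseudocode above, here is a minimal Python sketch. The flat `Tree` layout, the `LEAF` marker, the `<=` split convention, and the function names are all hypothetical, chosen only for this example; they are not the library's actual data structures or API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LEAF = -1  # hypothetical marker: a node whose feature is LEAF is a leaf


@dataclass
class Tree:
    # Hypothetical flat layout: entry i of each list describes node i.
    feature: List[int]      # split feature index, or LEAF
    threshold: List[float]  # split threshold (ignored at leaves)
    left: List[int]         # child taken when x[feature] <= threshold
    right: List[int]        # child taken otherwise
    value: List[float]      # prediction stored at leaf nodes


def trace_sample(tree: Tree, x: Sequence[float]) -> Tuple[List[int], float]:
    """Walk from root to leaf, collecting the unique feature indices compared."""
    used = set()
    node = 0
    while tree.feature[node] != LEAF:
        f = tree.feature[node]
        used.add(f)
        node = tree.left[node] if x[f] <= tree.threshold[node] else tree.right[node]
    return sorted(used), tree.value[node]


def forest_index_sets(forest, bootstraps):
    """forest[i] is a Tree; bootstraps[i] holds the samples used to build it."""
    index_sets, predictions = [], []
    for tree, samples in zip(forest, bootstraps):
        for x in samples:
            feats, pred = trace_sample(tree, x)
            index_sets.append(feats)
            predictions.append(pred)
    return index_sets, predictions
```

Each entry of `index_sets` lines up with the entry of `predictions` at the same position, so the two lists can be passed together to a downstream interaction analysis.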
To simplify the GPU implementation, we could instead use a binary array (one entry per feature) initialized to zeros, and do a bitwise OR at the location corresponding to each feature index encountered during the traversal.
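The binary-array idea can be sketched in Python as follows. Packing the bits into 32-bit words is an assumption made for this example (an array with one element per feature would work the same way), and `path_to_bitmask` and `bitmask_to_indices` are hypothetical helper names.

```python
from typing import Iterable, List

WORD_BITS = 32  # assumed word size for this sketch


def path_to_bitmask(path_features: Iterable[int], n_features: int) -> List[int]:
    """OR one bit per feature index seen on a root-to-leaf path."""
    n_words = (n_features + WORD_BITS - 1) // WORD_BITS
    mask = [0] * n_words  # binary array initialized to zeros
    for f in path_features:
        mask[f // WORD_BITS] |= 1 << (f % WORD_BITS)
    return mask


def bitmask_to_indices(mask: List[int]) -> List[int]:
    """Recover the sorted unique feature indices from the bitmask."""
    return [
        w * WORD_BITS + b
        for w, word in enumerate(mask)
        for b in range(WORD_BITS)
        if (word >> b) & 1
    ]


# Feature 2 is visited twice on this path; the OR makes the repeat harmless.
mask = path_to_bitmask([2, 0, 2, 33], n_features=40)
indices = bitmask_to_indices(mask)
```

Because OR is idempotent, a feature that appears several times along a path needs no explicit de-duplication, and every sample gets a fixed-size output regardless of tree depth.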
We could potentially also generate this binary array during training itself, since all the information needed to compute it is already available at that point.