[FEA] Isolation Forest Training support in cuML #6096

Open
@singhmanas1

Is your feature request related to a problem? Please describe.
Isolation Forest (IF) is a popular unsupervised anomaly detection method used to identify fraud. For example, banks and retail companies use IF to detect zero-day threats, i.e. new threat patterns that supervised algorithms such as XGBoost and GNNs fail to detect because of class imbalance or other issues.

While cuML supports inference on scikit-learn's IF models via the Forest Inference Library (an experimental feature; see Issue #3838), it would be great to have IF model training implemented in cuML, similar to the Isolation Forest implementation in scikit-learn.

Describe the solution you'd like
Something like the following:

from cuml.ensemble import IsolationForest
X = [[-1.1], [0.3], [0.5], [100]]
clf = IsolationForest(random_state=0).fit(X)
clf.predict([[0.1], [0], [90]])

Implementation Details
The following need to be implemented and tested in cuML to enable IF:

  1. Splitting decision tree nodes randomly while building the trees, via NodeSplitKernel
  2. Calculating path lengths to detect anomalies, similar to the scikit-learn implementation
  3. Testing whether quantizing the data into bins affects the performance of IsolationForest
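
For reference, the three items above can be illustrated on the CPU with a minimal pure-Python sketch. This is not cuML code and does not use NodeSplitKernel; all function names (`build_tree`, `path_length`, `fit_forest`, `anomaly_score`, `quantize`) are hypothetical. The normalization factor and depth limit follow the standard Isolation Forest formulation, and the equal-width binning in `quantize` is an assumption for illustration, not necessarily how cuML bins features.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Item 1: split on a random non-constant feature at a random threshold."""
    n = len(rows)
    if depth >= max_depth or n <= 1:
        return {"size": n}
    candidates = [j for j in range(len(rows[0]))
                  if min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not candidates:  # all remaining rows are identical
        return {"size": n}
    feature = rng.choice(candidates)
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    threshold = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < threshold]
    right = [r for r in rows if r[feature] >= threshold]
    return {"feature": feature, "threshold": threshold,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(node, x, depth=0):
    """Item 2: depth at which x lands, plus c(size) for unsplit leaves."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    side = "left" if x[node["feature"]] < node["threshold"] else "right"
    return path_length(node[side], x, depth + 1)

def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(max(psi, 2)))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi

def anomaly_score(trees, psi, x):
    """Score in (0, 1); values closer to 1 indicate more anomalous points."""
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(psi))

def quantize(rows, n_bins):
    """Item 3: snap each feature to the left edge of an equal-width bin."""
    out_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # avoid zero width on constant columns
        out_cols.append([lo + min(int((v - lo) / width), n_bins - 1) * width
                         for v in col])
    return [list(r) for r in zip(*out_cols)]

X = [[-1.1], [0.3], [0.5], [100.0]]
trees, psi = fit_forest(X)
for point in ([0.1], [0.0], [90.0]):
    print(point, round(anomaly_score(trees, psi, point), 3))
```

Comparing scores from `fit_forest(X)` with those from `fit_forest(quantize(X, n_bins))` for a few values of `n_bins` is one simple way to probe the question in item 3: with coarse bins, nearby inliers collapse onto the same value and can no longer be separated by a split.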

@vinaydes @dantegd @beckernick @hcho3
