[FEA] Isolation Forest Training support in cuML

**Is your feature request related to a problem? Please describe.**
[Isolation Forest (IF)](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html) is a popular unsupervised anomaly detection method that is often used to identify fraud. For example, banks and retail companies use IF to detect zero-day threats, i.e. new threat patterns that supervised algorithms such as XGBoost and GNNs are unable to detect because of class imbalance or other issues.

While cuML supports inference on scikit-learn's IF models via the Forest Inference Library (an experimental feature, see [Issue #3838](https://github.com/rapidsai/cuml/issues/3838#issuecomment-2319166695)), it would be great to also have IF model training implemented in cuML, similar to [Isolation Forest in scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html).

**Describe the solution you'd like**
Something like the following:

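```python
# Proposed API, mirroring sklearn.ensemble.IsolationForest (not yet available in cuML)
from cuml.ensemble import IsolationForest

X = [[-1.1], [0.3], [0.5], [100]]
clf = IsolationForest(random_state=0).fit(X)
clf.predict([[0.1], [0], [90]])  # scikit-learn convention: 1 = inlier, -1 = outlier
```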

**Implementation Details**
The following needs to be implemented and tested in cuML to enable IF training:
1. Random node splitting while building the trees, via [NodeSplitKernel](https://github.com/rapidsai/cuml/blob/8ed7bda0b21aeb0b8b20d0b785911c3fa8e74cb8/cpp/src/decisiontree/batched-levelalgo/kernels/builder_kernels_impl.cuh#L91).
2. Path-length computation for detecting anomalies, similar to the [scikit-learn implementation](https://github.com/scikit-learn/scikit-learn/blob/d5082d32de2797f9594c9477f2810c743560a1f1/sklearn/ensemble/_iforest.py#L535).
3. Testing whether [data quantization into bins](https://github.com/rapidsai/cuml/blob/8ed7bda0b21aeb0b8b20d0b785911c3fa8e74cb8/cpp/src/decisiontree/batched-levelalgo/quantiles.cuh#L47) affects the performance of IsolationForest.
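
To make items 1 and 2 concrete, here is a minimal pure-Python CPU sketch of random splitting and path-length scoring. It is only an illustration of the general Isolation Forest technique, not cuML or scikit-learn code, and it does not touch the GPU kernels linked above. All function names (`average_path_length`, `build_tree`, `path_length`, `fit_forest`, `anomaly_score`) are hypothetical, and it assumes dense numeric rows with no quantization.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """Expected path length c(n) of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Item 1: split on a random feature at a uniformly random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(rows)}
    threshold = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] <= threshold]
    right = [r for r in rows if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(node, x, depth=0):
    """Item 2: depth of the leaf reached by x, plus c(size) for unsplit leaves."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return path_length(child, x, depth + 1)


def fit_forest(X, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of X."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(X))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))  # height limit
    trees = [
        build_tree(rng.sample(X, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(forest, x):
    """Score in (0, 1]; values closer to 1 indicate shorter paths (more anomalous)."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / average_path_length(sample_size))


X = [[-1.1], [0.3], [0.5], [100.0]]
forest = fit_forest(X)
score_inlier = anomaly_score(forest, [0.1])
score_outlier = anomaly_score(forest, [90.0])
print(score_inlier, score_outlier)
```

On the toy data from the example above, a point near 100 should be isolated after fewer random splits than a point near 0, so it receives the higher score. A quantized variant would draw `threshold` from precomputed bin edges instead of `rng.uniform`, which is the behavior item 3 proposes to test.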

@vinaydes @dantegd @beckernick @hcho3
