Isolation Forests [Liu+2008] and Robust Random Cut Trees [Guha+2016] are similar in many ways, as outlined in the supporting overview. Most notably, they are the two extremes of the same outlier scoring function, which is controlled by a parameter theta: one method corresponds to theta = 0 and the other to theta = 1.
The combination robust cut forest lets you combine both scores by using a theta other than 0 or 1.
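As a rough illustration of what combining both scores with theta could look like, the sketch below blends two per-point scores linearly. The exact scoring function is not spelled out here, so the linear form, the helper name `combine_scores`, the choice of which score theta = 1 selects, and the sample values are all assumptions made for illustration; this is not the crcf API.

```python
from typing import List, Sequence


def combine_scores(score_a: Sequence[float],
                   score_b: Sequence[float],
                   theta: float) -> List[float]:
    """Blend two per-point outlier scores (hypothetical helper, not part of crcf).

    Assumed form: theta * score_a + (1 - theta) * score_b, so theta = 1
    returns score_a unchanged and theta = 0 returns score_b unchanged.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    if len(score_a) != len(score_b):
        raise ValueError("both score sequences must have the same length")
    return [theta * a + (1.0 - theta) * b for a, b in zip(score_a, score_b)]


# Made-up scores for three points, used only to show the call.
depth_based = [0.2, 0.5, 0.9]
displacement_based = [0.1, 0.4, 1.0]
blended = combine_scores(depth_based, displacement_based, theta=0.5)
```

With theta at either end of the range, the helper simply returns one of the two inputs, which mirrors the idea that the two methods sit at the extremes of a single score.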
You can install the package through pip with `pip install crcf`. Alternatively, you can download the repository and, from its root directory, run `python3 setup.py install` or `pip3 install .`
Please note that this package uses features from Python 3.7+
and is not compatible with earlier Python versions.
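If you are unsure which interpreter an environment uses, a small check like the following can confirm the version requirement before installing. It is only a sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are made up for this example and are not part of the package.

```python
import sys

# Minimum version stated above; the names here are hypothetical.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is recent enough for crcf.")
else:
    print("Python 3.7 or later is required; found %d.%d." % tuple(sys.version_info[:2]))
```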
- complete basic implementation
- provide clear documentation and usage instructions
- ensure interface allows for fitting and scoring on multiple points at the same time
- implement a better saving method than pickling
- use randomized, property-based tests with Hypothesis
- implement tree down in Cython
- accelerate forests with multi-threading
- incorporate categorical variable support, including categorical rules
- complete the write-up document with a benchmarking of performance
- [Liu+2008]: Liu, Fei Tony, Kai Ming Ting, and Zhi-Hua Zhou. "Isolation forest." In 2008 Eighth IEEE International Conference on Data Mining, pp. 413-422. IEEE, 2008.
- [Guha+2016]: Guha, Sudipto, Nina Mishra, Gourav Roy, and Okke Schrijvers. "Robust random cut forest based anomaly detection on streams." In International Conference on Machine Learning, pp. 2712-2721. 2016.