Combination Robust Cut Forests: Merging Isolation Forests and Robust Random Cut Forests

jmbhughes/crcf

Combination Robust Cut Forests

Isolation Forests [Liu+2008] and Robust Random Cut Trees [Guha+2016] are closely related, as outlined in the supporting overview. Most notably, their anomaly scores are the two extremes of a single outlier scoring function:

$$\theta \textrm{Depth} + (1 - \theta) \textrm{[Co]Disp}$$

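Setting $\theta = 1$ scores by depth alone, as in isolation forests, and setting $\theta = 0$ scores by [Co]Disp alone, as in robust random cut trees. The combination robust cut forest lets you blend both scores by choosing a $\theta$ strictly between 0 and 1.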
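
As a rough illustration of the formula above, the weighting can be sketched in a few lines of Python. This is not the `crcf` API: the function name `combined_score` and the sample depth and [Co]Disp values are hypothetical, chosen only to show how $\theta$ moves the score between the two extremes.

```python
def combined_score(depth, codisp, theta):
    """Blend a depth value and a [Co]Disp value with weight theta.

    Hypothetical helper for illustration only; it is not part of crcf.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * depth + (1.0 - theta) * codisp


# Made-up values for a single point.
depth, codisp = 3.0, 10.0

print(combined_score(depth, codisp, 1.0))  # depth only
print(combined_score(depth, codisp, 0.0))  # [Co]Disp only
print(combined_score(depth, codisp, 0.5))  # equal weighting of both
```

The sketch only reproduces the weighted sum; how depth and [Co]Disp are computed for each tree, and how they are aggregated across the forest, is left to the library.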

Install

You can install the package with `pip install crcf`. Alternatively, download the repository and run `python3 setup.py install` or `pip3 install .` from its root directory. Note that this package uses features from Python 3.7+ and is not compatible with earlier Python versions.

Tasks

  • complete basic implementation
  • provide clear documentation and usage instructions
  • ensure interface allows for fitting and scoring on multiple points at the same time
  • implement a better saving method than pickling
  • use random tests with hypothesis
  • implement tree down in cython
  • accelerate forests with multi-threading
  • incorporate categorical variable support, including categorical rules
  • complete the write-up document with a benchmarking of performance

References