Random Forests are slowed down by AbstractDataDistribution#getEntropy() #48

@Zero3

I'm testing your new RandomForestFactory and am getting some pretty good results! However, training is slower than I expected.

My code trains hundreds of Random Forest classifiers on a small test dataset. I profiled it and noticed that ~45% of the time is spent in AbstractDataDistribution#getEntropy(). I suspect this is not supposed to happen, but if I'm wrong and this is simply where most of the computation naturally lands, please feel free to close this issue.

I don't know what the underlying performance bottleneck is, but I suspect the call to MathUtil.log2(double) may be the culprit.
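To make the suspicion concrete, here is a minimal sketch of how the number of logarithm calls in an entropy computation could be reduced. It does not reflect the actual bodies of AbstractDataDistribution#getEntropy() or MathUtil.log2(double), which are not shown in this issue. The class `EntropySketch`, its method names, and the assumption that entropy is computed from non-negative integer class counts are all hypothetical. The second variant relies on the identity H = log2(n) - (1/n) * sum(c_i * log2(c_i)), where n is the total count.

```java
// Hypothetical sketch; not the library's implementation.
final class EntropySketch {
    // Precomputed once so each log2 conversion is a multiplication.
    private static final double INV_LN2 = 1.0 / Math.log(2.0);

    // Assumed "naive" log2: two Math.log calls and a division per invocation.
    static double log2Naive(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Baseline Shannon entropy (in bits) from class counts: -sum(p * log2(p)).
    static double entropyNaive(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) { // zero counts contribute nothing (0 * log 0 := 0)
                double p = (double) c / total;
                h -= p * log2Naive(p);
            }
        }
        return h;
    }

    // Same quantity via H = log2(n) - (1/n) * sum(c * log2(c)):
    // one Math.log per non-zero count, plus one for the total.
    static double entropyFewerLogs(int[] counts) {
        long total = 0;
        double weightedLogSum = 0.0;
        for (int c : counts) {
            if (c > 0) {
                total += c;
                weightedLogSum += c * Math.log(c);
            }
        }
        if (total == 0) {
            return 0.0;
        }
        // Convert from nats to bits with a single multiplication.
        return (Math.log(total) - weightedLogSum / total) * INV_LN2;
    }
}
```

Whether this helps in practice depends on where the profiler actually attributes the time, so both variants would need to be benchmarked against the real getEntropy() before drawing conclusions.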
