I'm testing your new RandomForestFactory and am getting some pretty good results! However, the algorithm is slower than expected.
My code trains hundreds of Random Forest classifiers on a small test dataset. I profiled it and noticed that ~45% of the time is spent in AbstractDataDistribution#getEntropy(). I suspect this is not intended, but if I'm wrong and this is indeed where most of the computation naturally happens, please feel free to close this issue.
I don't know what the underlying performance bottleneck is, but I suspect that the call to MathUtil.log2(double) may be the culprit.
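
To illustrate what I have in mind, here is a minimal sketch of an entropy computation that keeps the logarithm cost low. I have not looked at how getEntropy() or MathUtil.log2(double) are actually implemented, so everything below is an assumption: the class name `EntropySketch`, the `double[]` of class counts, and both methods are hypothetical and not part of the library. It precomputes 1/ln(2) once and uses the identity H = log2(N) - (1/N) * sum(c_i * log2(c_i)), which avoids a division per class.

```java
// Hypothetical sketch, not the library's implementation.
final class EntropySketch {
    // Computed once, so each log2 call is one Math.log plus one multiply.
    private static final double INV_LN_2 = 1.0 / Math.log(2.0);

    static double log2(double x) {
        return Math.log(x) * INV_LN_2;
    }

    /** Shannon entropy in bits of a vector of non-negative class counts. */
    static double entropy(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total <= 0.0) {
            return 0.0; // empty distribution: define entropy as 0
        }
        double weightedLogSum = 0.0;
        for (double c : counts) {
            if (c > 0.0) { // skip empty classes, since 0 * log2(0) is taken as 0
                weightedLogSum += c * log2(c);
            }
        }
        // H = log2(N) - (1/N) * sum(c_i * log2(c_i))
        return log2(total) - weightedLogSum / total;
    }
}
```

If the real code already does something like this, then the time is probably just inherent to Math.log, and caching the entropy per distribution (rather than recomputing it on each call) might be the only remaining option.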