Commit

Add explanation
endymecy authored Feb 11, 2018
1 parent dbb95ec commit 955bf71
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions 分类和回归/组合树/随机森林/random-forests.md
@@ -416,7 +416,8 @@ private[tree] def findSplitsForContinuousFeature(
    if (possibleSplits <= numSplits) {
      valueCounts.map(_._1)
    } else {
-     // stride between split points
+     // equal-frequency splitting
+     // stride between split points
      val stride: Double = featureSamples.length.toDouble / (numSplits + 1)
      val splitsBuilder = Array.newBuilder[Double]
      var index = 1
@@ -449,6 +450,7 @@ private[tree] def findSplitsForContinuousFeature(
    splits
  }
```
+&emsp;&emsp; Inside the `if` branch, each step advances by `stride` samples, which is accumulated into `targetCount`. The `while` loop adds the count of each feature value to `currentCount` one value at a time, then compares how far the previous `previousCount` and the current `currentCount` are from `targetCount`. There are three cases. In the first, `pre` and `cur` are both to the left of `target`, so `cur` is certainly closer; the loop continues and moves into the second case. In the second, they straddle `target`: if `pre` is closer, then `pre` is the best split point; if `cur` is still closer, the loop keeps stepping and enters the third case. In the third, both are to the right of `target`, and `pre` is obviously closer. Therefore, whenever the `if` condition `pre < cur` holds, the previous value must be a `split`. The overall effect is to find the feature value closest to `target`.
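&emsp;&emsp; The body of the loop described above is elided between the two hunks. A minimal, self-contained Scala sketch of the same equal-frequency scan might look like the following; the object name and the sample data are made up for illustration, and the code only approximates the MLlib loop rather than reproducing it verbatim:

```scala
object EqualFrequencySplits {
  def main(args: Array[String]): Unit = {
    // Hypothetical sorted sample of one continuous feature (duplicates allowed).
    val featureSamples = Array(1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0)
    val numSplits = 3

    // Count occurrences of each distinct value, ordered by value.
    val valueCounts: Array[(Double, Int)] =
      featureSamples.groupBy(identity).map { case (v, xs) => (v, xs.length) }.toArray.sortBy(_._1)

    // Stride between split points: each bin should hold roughly this many samples.
    val stride = featureSamples.length.toDouble / (numSplits + 1)

    val splitsBuilder = Array.newBuilder[Double]
    var index = 1
    var currentCount = valueCounts(0)._2.toDouble
    var targetCount = stride
    while (index < valueCounts.length) {
      val previousCount = currentCount
      currentCount += valueCounts(index)._2
      val previousGap = math.abs(previousCount - targetCount)
      val currentGap = math.abs(currentCount - targetCount)
      // If the previous cumulative count is closer to the target than the current
      // one, the previous value becomes a split point and the target advances by
      // one stride; otherwise keep accumulating.
      if (previousGap < currentGap) {
        splitsBuilder += valueCounts(index - 1)._1
        targetCount += stride
      }
      index += 1
    }

    println(splitsBuilder.result().mkString(", "))
  }
}
```

&emsp;&emsp; With this sample, `stride = 10 / 4 = 2.5` and the scan emits splits at 1.0, 2.0 and 4.0, so each of the four resulting bins holds roughly `stride` samples.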

#### 5.1.2 Iteratively building the random forest

@@ -793,4 +795,4 @@ private def predictByVoting(features: Vector): Double = {

【2】[Spark 随机森林算法原理、源码分析及案例实战](https://www.ibm.com/developerworks/cn/opensource/os-cn-spark-random-forest/)

-【3】[Scalable Distributed Decision Trees in Spark MLlib](https://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf)
+【3】[Scalable Distributed Decision Trees in Spark MLlib](https://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf)
