
Commit a72cf54

sethah authored and uzadude committed
[SPARK-18060][ML] Avoid unnecessary computation for MLOR
## What changes were proposed in this pull request?

Before this patch, the gradient updates for multinomial logistic regression were computed by an outer loop over the number of classes and an inner loop over the number of features. Inside the inner loop, we standardized the feature value (`value / featuresStd(index)`), which means we performed the computation `numFeatures * numClasses` times. We only need to perform that computation `numFeatures` times, however. If we re-order the inner and outer loops, we can avoid this, but then we lose sequential memory access. In this patch, we instead lay out the coefficients in column major order while we train, so that we can avoid the extra computation and retain sequential memory access. We convert back to row-major order when we create the model.

## How was this patch tested?

This is an implementation detail only, so the original behavior should be maintained. All tests pass. I ran some performance tests to verify speedups. The results are below, and show significant speedups.

## Performance Tests

**Setup**

- 3 node bare-metal cluster
- 120 cores total
- 384 GB RAM total

**Results**

NOTE: `currentMasterTime` and `thisPatchTime` are times in seconds for a single iteration of L-BFGS or OWL-QN.

|   | numPoints | numFeatures | numClasses | regParam | elasticNetParam | currentMasterTime (sec) | thisPatchTime (sec) | pctSpeedup |
|---|-----------|-------------|------------|----------|-----------------|-------------------------|---------------------|------------|
| 0 | 1e+07     | 100         | 500        | 0.5      | 0               | 90                      | 18                  | 80         |
| 1 | 1e+08     | 100         | 50         | 0.5      | 0               | 90                      | 19                  | 78         |
| 2 | 1e+08     | 100         | 50         | 0.05     | 1               | 72                      | 19                  | 73         |
| 3 | 1e+06     | 100         | 5000       | 0.5      | 0               | 93                      | 53                  | 43         |
| 4 | 1e+07     | 100         | 5000       | 0.5      | 0               | 900                     | 390                 | 56         |
| 5 | 1e+08     | 100         | 500        | 0.5      | 0               | 840                     | 174                 | 79         |
| 6 | 1e+08     | 100         | 200        | 0.5      | 0               | 360                     | 72                  | 80         |
| 7 | 1e+08     | 1000        | 5          | 0.5      | 0               | 9                       | 3                   | 66         |

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15593 from sethah/MLOR_PERF_COL_MAJOR_COEF.
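To make the loop re-ordering described above concrete, here is a minimal standalone sketch of the two access patterns (dense arrays and illustrative names, not the actual aggregator code). The first loop is the pre-patch shape, which standardizes each feature value once per class; the second is the post-patch shape, which standardizes once per feature while the column-major index `k * numClasses + j` keeps the inner loop's writes contiguous.

```scala
val numClasses = 3
val numFeatures = 4
val features = Array(1.0, 2.0, 3.0, 4.0)
val featuresStd = Array(2.0, 2.0, 2.0, 2.0)
val multipliers = Array(0.1, 0.2, 0.3) // per-class multipliers for one instance

// Pre-patch shape: classes outer, features inner. The standardization
// features(k) / featuresStd(k) is recomputed numClasses times per feature.
val gradientRowMajor = new Array[Double](numClasses * numFeatures)
var i = 0
while (i < numClasses) {
  var k = 0
  while (k < numFeatures) {
    gradientRowMajor(i * numFeatures + k) +=
      multipliers(i) * features(k) / featuresStd(k)
    k += 1
  }
  i += 1
}

// Post-patch shape: features outer, classes inner. The division happens once
// per feature, and the column-major index k * numClasses + j means the inner
// loop writes consecutive array slots.
val gradientColMajor = new Array[Double](numClasses * numFeatures)
var k = 0
while (k < numFeatures) {
  val stdValue = features(k) / featuresStd(k)
  var j = 0
  while (j < numClasses) {
    gradientColMajor(k * numClasses + j) += multipliers(j) * stdValue
    j += 1
  }
  k += 1
}
```

Simply swapping the loops under the original row-major layout would save the divisions but make the inner loop stride by `numFeatures`; storing the coefficients column major during training captures both savings, and the conversion back to row major happens once, at model creation.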
1 parent 0d02074 commit a72cf54

File tree

1 file changed: +74 −51 lines changed

mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala

Lines changed: 74 additions & 51 deletions
```diff
@@ -438,18 +438,14 @@ class LogisticRegression @Since("1.2.0") (
       val standardizationParam = $(standardization)
       def regParamL1Fun = (index: Int) => {
         // Remove the L1 penalization on the intercept
-        val isIntercept = $(fitIntercept) && ((index + 1) % numFeaturesPlusIntercept == 0)
+        val isIntercept = $(fitIntercept) && index >= numFeatures * numCoefficientSets
         if (isIntercept) {
           0.0
         } else {
           if (standardizationParam) {
             regParamL1
           } else {
-            val featureIndex = if ($(fitIntercept)) {
-              index % numFeaturesPlusIntercept
-            } else {
-              index % numFeatures
-            }
+            val featureIndex = index / numCoefficientSets
             // If `standardization` is false, we still standardize the data
             // to improve the rate of convergence; as a result, we have to
             // perform this reverse standardization by penalizing each component
```
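With the coefficients stored column major, recovering the feature index from a flat coefficient index becomes a single integer division rather than a `fitIntercept`-dependent modulo. A standalone sketch of the index arithmetic (illustrative values, not patch code):

```scala
// Column-major flat layout: the coefficient for coefficient set j and
// feature k sits at k * numCoefficientSets + j, and the intercepts occupy
// the tail indices >= numFeatures * numCoefficientSets.
val numCoefficientSets = 3 // e.g. a 3-class multinomial model
val numFeatures = 2
(0 until numFeatures * numCoefficientSets).foreach { index =>
  println(s"flat index $index -> feature ${index / numCoefficientSets}")
}
// Prints features 0, 0, 0, 1, 1, 1, matching `index / numCoefficientSets`
// in the new regParamL1Fun above; any index at or beyond
// numFeatures * numCoefficientSets is an intercept, matching isIntercept.
```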
```diff
@@ -466,6 +462,15 @@ class LogisticRegression @Since("1.2.0") (
         new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
       }

+      /*
+        The coefficients are laid out in column major order during training. e.g. for
+        `numClasses = 3` and `numFeatures = 2` and `fitIntercept = true` the layout is:
+
+        Array(beta_11, beta_21, beta_31, beta_12, beta_22, beta_32, intercept_1, intercept_2,
+          intercept_3)
+
+        where beta_jk corresponds to the coefficient for class `j` and feature `k`.
+       */
       val initialCoefficientsWithIntercept =
         Vectors.zeros(numCoefficientSets * numFeaturesPlusIntercept)

```
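The layout comment translates directly into index helpers; a hypothetical sketch (these helpers are not part of the patch):

```scala
// For numClasses = 3, numFeatures = 2, fitIntercept = true the training
// vector is Array(beta_11, beta_21, beta_31, beta_12, beta_22, beta_32,
//                 intercept_1, intercept_2, intercept_3).
// With 0-based class index j and feature index k:
def coefIndex(j: Int, k: Int, numClasses: Int): Int = k * numClasses + j
def interceptIndex(j: Int, numClasses: Int, numFeatures: Int): Int =
  numClasses * numFeatures + j

// beta_21 (class 2, feature 1, i.e. j = 1, k = 0) lands at 0 * 3 + 1 = 1,
// the second slot, exactly as in the layout comment above.
```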
```diff
@@ -489,13 +494,14 @@ class LogisticRegression @Since("1.2.0") (
           val initialCoefWithInterceptArray = initialCoefficientsWithIntercept.toArray
           val providedCoef = optInitialModel.get.coefficientMatrix
           providedCoef.foreachActive { (row, col, value) =>
-            val flatIndex = row * numFeaturesPlusIntercept + col
+            // convert matrix to column major for training
+            val flatIndex = col * numCoefficientSets + row
             // We need to scale the coefficients since they will be trained in the scaled space
             initialCoefWithInterceptArray(flatIndex) = value * featuresStd(col)
           }
           if ($(fitIntercept)) {
             optInitialModel.get.interceptVector.foreachActive { (index, value) =>
-              val coefIndex = (index + 1) * numFeaturesPlusIntercept - 1
+              val coefIndex = numCoefficientSets * numFeatures + index
               initialCoefWithInterceptArray(coefIndex) = value
             }
           }
@@ -526,7 +532,7 @@ class LogisticRegression @Since("1.2.0") (
           val rawIntercepts = histogram.map(c => math.log(c + 1)) // add 1 for smoothing
           val rawMean = rawIntercepts.sum / rawIntercepts.length
           rawIntercepts.indices.foreach { i =>
-            initialCoefficientsWithIntercept.toArray(i * numFeaturesPlusIntercept + numFeatures) =
+            initialCoefficientsWithIntercept.toArray(numClasses * numFeatures + i) =
               rawIntercepts(i) - rawMean
           }
         } else if ($(fitIntercept)) {
```
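This hunk only changes where the initial intercepts land in the flat vector (the tail slots `numClasses * numFeatures + i`); the values themselves are smoothed, centered log class counts. A standalone sketch of that initialization, with hypothetical counts:

```scala
// Intercept initialization: b_j = log(c_j + 1) - mean(log(c + 1)),
// written to slot numClasses * numFeatures + j of the training vector.
val histogram = Array(100.0, 300.0, 600.0) // hypothetical per-class counts
val rawIntercepts = histogram.map(c => math.log(c + 1)) // add 1 for smoothing
val rawMean = rawIntercepts.sum / rawIntercepts.length
val initialIntercepts = rawIntercepts.map(_ - rawMean)
```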
```diff
@@ -572,16 +578,20 @@ class LogisticRegression @Since("1.2.0") (
         /*
            The coefficients are trained in the scaled space; we're converting them back to
            the original space.
+
+           Additionally, since the coefficients were laid out in column major order during training
+           to avoid extra computation, we convert them back to row major before passing them to the
+           model.
+
            Note that the intercept in scaled space and original space is the same;
            as a result, no scaling is needed.
          */
         val rawCoefficients = state.x.toArray.clone()
         val coefficientArray = Array.tabulate(numCoefficientSets * numFeatures) { i =>
-          // flatIndex will loop though rawCoefficients, and skip the intercept terms.
-          val flatIndex = if ($(fitIntercept)) i + i / numFeatures else i
+          val colMajorIndex = (i % numFeatures) * numCoefficientSets + i / numFeatures
           val featureIndex = i % numFeatures
           if (featuresStd(featureIndex) != 0.0) {
-            rawCoefficients(flatIndex) / featuresStd(featureIndex)
+            rawCoefficients(colMajorIndex) / featuresStd(featureIndex)
           } else {
             0.0
           }
```
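As a quick sanity check on the conversion (a standalone sketch, not patch code), mapping row-major position `i` through `(i % numFeatures) * numCoefficientSets + i / numFeatures` (with `numCoefficientSets == numClasses` in the multinomial case) walks the column-major training array back into class-by-class order:

```scala
// Round-trip: column-major training layout back to row major.
val numClasses = 3
val numFeatures = 2
val colMajor = Array("b11", "b21", "b31", "b12", "b22", "b32")
val rowMajor = Array.tabulate(numClasses * numFeatures) { i =>
  colMajor((i % numFeatures) * numClasses + i / numFeatures)
}
// rowMajor == Array("b11", "b12", "b21", "b22", "b31", "b32"):
// all of class 1's coefficients first, then class 2's, then class 3's,
// which is the row-major form the model expects.
```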
```diff
@@ -618,7 +628,7 @@ class LogisticRegression @Since("1.2.0") (

         val interceptsArray: Array[Double] = if ($(fitIntercept)) {
           Array.tabulate(numCoefficientSets) { i =>
-            val coefIndex = (i + 1) * numFeaturesPlusIntercept - 1
+            val coefIndex = numFeatures * numCoefficientSets + i
             rawCoefficients(coefIndex)
           }
         } else {
@@ -697,6 +707,7 @@ class LogisticRegressionModel private[spark] (
   /**
    * A vector of model coefficients for "binomial" logistic regression. If this model was trained
    * using the "multinomial" family then an exception is thrown.
+   *
    * @return Vector
    */
   @Since("2.0.0")
@@ -720,6 +731,7 @@ class LogisticRegressionModel private[spark] (
   /**
    * The model intercept for "binomial" logistic regression. If this model was fit with the
    * "multinomial" family then an exception is thrown.
+   *
    * @return Double
    */
   @Since("1.3.0")
@@ -1389,6 +1401,12 @@ class BinaryLogisticRegressionSummary private[classification] (
 * $$
 * </blockquote></p>
 *
+ * @note In order to avoid unnecessary computation during calculation of the gradient updates
+ * we lay out the coefficients in column major order during training. This allows us to
+ * perform feature standardization once, while still retaining sequential memory access
+ * for speed. We convert back to row major order when we create the model,
+ * since this form is optimal for the matrix operations used for prediction.
+ *
 * @param bcCoefficients The broadcast coefficients corresponding to the features.
 * @param bcFeaturesStd The broadcast standard deviation values of the features.
 * @param numClasses the number of possible outcomes for k classes classification problem in
```
```diff
@@ -1486,57 +1504,65 @@ private class LogisticAggregator(
     var marginOfLabel = 0.0
     var maxMargin = Double.NegativeInfinity

-    val margins = Array.tabulate(numClasses) { i =>
-      var margin = 0.0
-      features.foreachActive { (index, value) =>
-        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-          margin += localCoefficients(i * numFeaturesPlusIntercept + index) *
-            value / localFeaturesStd(index)
-        }
+    val margins = new Array[Double](numClasses)
+    features.foreachActive { (index, value) =>
+      val stdValue = value / localFeaturesStd(index)
+      var j = 0
+      while (j < numClasses) {
+        margins(j) += localCoefficients(index * numClasses + j) * stdValue
+        j += 1
       }
-
+    }
+    var i = 0
+    while (i < numClasses) {
       if (fitIntercept) {
-        margin += localCoefficients(i * numFeaturesPlusIntercept + numFeatures)
+        margins(i) += localCoefficients(numClasses * numFeatures + i)
       }
-      if (i == label.toInt) marginOfLabel = margin
-      if (margin > maxMargin) {
-        maxMargin = margin
+      if (i == label.toInt) marginOfLabel = margins(i)
+      if (margins(i) > maxMargin) {
+        maxMargin = margins(i)
       }
-      margin
+      i += 1
     }

     /**
      * When maxMargin > 0, the original formula could cause overflow.
      * We address this by subtracting maxMargin from all the margins, so it's guaranteed
      * that all of the new margins will be smaller than zero to prevent arithmetic overflow.
      */
+    val multipliers = new Array[Double](numClasses)
     val sum = {
       var temp = 0.0
-      if (maxMargin > 0) {
-        for (i <- 0 until numClasses) {
-          margins(i) -= maxMargin
-          temp += math.exp(margins(i))
-        }
-      } else {
-        for (i <- 0 until numClasses) {
-          temp += math.exp(margins(i))
-        }
+      var i = 0
+      while (i < numClasses) {
+        if (maxMargin > 0) margins(i) -= maxMargin
+        val exp = math.exp(margins(i))
+        temp += exp
+        multipliers(i) = exp
+        i += 1
       }
       temp
     }

-    for (i <- 0 until numClasses) {
-      val multiplier = math.exp(margins(i)) / sum - {
-        if (label == i) 1.0 else 0.0
-      }
-      features.foreachActive { (index, value) =>
-        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
-          localGradientArray(i * numFeaturesPlusIntercept + index) +=
-            weight * multiplier * value / localFeaturesStd(index)
+    margins.indices.foreach { i =>
+      multipliers(i) = multipliers(i) / sum - (if (label == i) 1.0 else 0.0)
+    }
+    features.foreachActive { (index, value) =>
+      if (localFeaturesStd(index) != 0.0 && value != 0.0) {
+        val stdValue = value / localFeaturesStd(index)
+        var j = 0
+        while (j < numClasses) {
+          localGradientArray(index * numClasses + j) +=
+            weight * multipliers(j) * stdValue
+          j += 1
         }
       }
-      if (fitIntercept) {
-        localGradientArray(i * numFeaturesPlusIntercept + numFeatures) += weight * multiplier
+    }
+    if (fitIntercept) {
+      var i = 0
+      while (i < numClasses) {
+        localGradientArray(numFeatures * numClasses + i) += weight * multipliers(i)
+        i += 1
       }
     }

```
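The `multipliers`/`sum` block above folds the usual max-subtraction trick for a numerically stable softmax into a single pass. As a standalone illustration (a hypothetical helper, not the aggregator itself):

```scala
// Stable softmax: subtracting maxMargin before exponentiating guarantees
// every shifted margin is <= 0, so math.exp cannot overflow.
def stableSoftmax(margins: Array[Double]): Array[Double] = {
  val maxMargin = margins.max
  val shifted = if (maxMargin > 0) margins.map(_ - maxMargin) else margins
  val exps = shifted.map(math.exp)
  val sum = exps.sum
  exps.map(_ / sum)
}

// stableSoftmax(Array(1000.0, 1001.0, 1002.0)) stays finite, whereas
// exponentiating the raw margins directly would overflow to Infinity.
```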
```diff
@@ -1637,6 +1663,7 @@ private class LogisticCostFun(
     val bcCoeffs = instances.context.broadcast(coeffs)
     val featuresStd = bcFeaturesStd.value
     val numFeatures = featuresStd.length
+    val numCoefficientSets = if (multinomial) numClasses else 1

     val logisticAggregator = {
       val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
@@ -1656,7 +1683,7 @@ private class LogisticCostFun(
         var sum = 0.0
         coeffs.foreachActive { case (index, value) =>
           // We do not apply regularization to the intercepts
-          val isIntercept = fitIntercept && ((index + 1) % (numFeatures + 1) == 0)
+          val isIntercept = fitIntercept && index >= numCoefficientSets * numFeatures
           if (!isIntercept) {
             // The following code will compute the loss of the regularization; also
             // the gradient of the regularization, and add back to totalGradientArray.
@@ -1665,11 +1692,7 @@ private class LogisticCostFun(
               totalGradientArray(index) += regParamL2 * value
               value * value
             } else {
-              val featureIndex = if (fitIntercept) {
-                index % (numFeatures + 1)
-              } else {
-                index % numFeatures
-              }
+              val featureIndex = index / numCoefficientSets
               if (featuresStd(featureIndex) != 0.0) {
                 // If `standardization` is false, we still standardize the data
                 // to improve the rate of convergence; as a result, we have to
```
