[SPARK-18608][ML][FOLLOWUP] Fix double caching for PySpark OneVsRest.
## What changes were proposed in this pull request?
apache#19197 fixed double caching for MLlib algorithms but missed PySpark ```OneVsRest```; this PR fixes it.
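
The root cause: calling ```dataset.rdd``` builds a fresh RDD from the DataFrame, and that new RDD is never marked persistent, so the old check reported even an already-cached input as unpersisted and ```OneVsRest``` persisted it again. Below is a minimal sketch of the two checks (a standalone illustration with a local ```SparkSession```, not code from this PR):

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()

df = spark.range(10)
df.persist(StorageLevel.MEMORY_AND_DISK)

NONE = StorageLevel(False, False, False, False)

# Old check: df.rdd converts the DataFrame to a brand-new RDD, and that RDD
# carries no storage level, so a cached DataFrame still looks unpersisted.
print(df.rdd.getStorageLevel() == NONE)   # True -> would trigger a redundant persist()

# New check: DataFrame.storageLevel (available since Spark 2.1) reports the
# DataFrame's own persistence state, so an already-cached input is left alone.
print(df.storageLevel == NONE)            # False -> no double caching
```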

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#19220 from yanboliang/SPARK-18608.
yanboliang committed Sep 14, 2017
1 parent 66cb72d commit c76153c
Showing 1 changed file with 2 additions and 4 deletions.
python/pyspark/ml/classification.py (6 changes: 2 additions & 4 deletions)

```diff
@@ -1773,8 +1773,7 @@ def _fit(self, dataset):
         multiclassLabeled = dataset.select(labelCol, featuresCol)
 
         # persist if underlying dataset is not persistent.
-        handlePersistence = \
-            dataset.rdd.getStorageLevel() == StorageLevel(False, False, False, False)
+        handlePersistence = dataset.storageLevel == StorageLevel(False, False, False, False)
         if handlePersistence:
             multiclassLabeled.persist(StorageLevel.MEMORY_AND_DISK)
 
@@ -1928,8 +1927,7 @@ def _transform(self, dataset):
         newDataset = dataset.withColumn(accColName, initUDF(dataset[origCols[0]]))
 
         # persist if underlying dataset is not persistent.
-        handlePersistence = \
-            dataset.rdd.getStorageLevel() == StorageLevel(False, False, False, False)
+        handlePersistence = dataset.storageLevel == StorageLevel(False, False, False, False)
         if handlePersistence:
             newDataset.persist(StorageLevel.MEMORY_AND_DISK)
```
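Both hunks feed the same persist-if-needed pattern: the flag records whether the method took ownership of caching, and the cache is later released only when the flag is set. A simplified sketch of that pattern (illustrative names and a placeholder training step, not a verbatim excerpt from ```classification.py```):

```python
from pyspark import StorageLevel

def fit_with_optional_caching(dataset, labelCol, featuresCol):
    """Hypothetical helper mirroring the persist-if-needed pattern above."""
    multiclassLabeled = dataset.select(labelCol, featuresCol)

    # Take ownership of caching only when the caller has not persisted the input.
    handlePersistence = dataset.storageLevel == StorageLevel(False, False, False, False)
    if handlePersistence:
        multiclassLabeled.persist(StorageLevel.MEMORY_AND_DISK)
    try:
        # Placeholder for the real per-class training loop in OneVsRest._fit().
        result = multiclassLabeled.count()
    finally:
        # Release the cache only if this function created it.
        if handlePersistence:
            multiclassLabeled.unpersist()
    return result
```

Called on an input the user already cached, the flag stays False and the method neither re-persists nor unpersists the caller's data.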
