
Commit 615e91c

Author: Ram Sriharsha
Commit message: cleanup
1 parent 204c4e3, commit 615e91c

File tree: 2 files changed (+4, -4 lines)

examples/src/main/python/ml/cross_validator.py
2 additions, 2 deletions

@@ -48,7 +48,7 @@
     # Configure an ML pipeline, which consists of tree stages: tokenizer, hashingTF, and lr.
     tokenizer = Tokenizer(inputCol="text", outputCol="words")
     hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
-    lr = LogisticRegression(maxIter=10, regParam=0.001)
+    lr = LogisticRegression(maxIter=10)
     pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

     # We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.
@@ -65,7 +65,7 @@
     crossval = CrossValidator(estimator=pipeline,
                               estimatorParamMaps=paramGrid,
                               evaluator=BinaryClassificationEvaluator(),
-                              numFolds=2)
+                              numFolds=2)  # use 3+ folds in practice

     # Run cross-validation, and choose the best set of parameters.
     cvModel = crossval.fit(training)
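The second hunk adds a comment recommending 3+ folds for `CrossValidator`. To see why the fold count matters, here is a minimal pure-Python sketch of k-fold index splitting (an illustration only, not Spark's implementation; `k_fold_indices` is a hypothetical helper): each fold is held out once for evaluation while the model trains on the rest, so with `numFolds=2` every candidate model trains on only half the data.

```python
def k_fold_indices(n, num_folds):
    """Yield (train, validation) index lists for each of num_folds folds."""
    # Round-robin assignment of the n indices to num_folds folds.
    folds = [list(range(i, n, num_folds)) for i in range(num_folds)]
    for held_out in range(num_folds):
        validation = folds[held_out]
        # Train on the indices of every other fold.
        train = [i for f, fold in enumerate(folds) if f != held_out
                 for i in fold]
        yield train, validation

# With 2 folds each model trains on n/2 points; with 3 folds it trains
# on 2n/3, which is why the diff suggests 3+ folds in practice.
for train, val in k_fold_indices(6, 2):
    print(len(train), len(val))
```

Each index appears in exactly one validation fold, so every data point is used for evaluation exactly once across the folds.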

examples/src/main/python/ml/simple_params_example.py
2 additions, 2 deletions

@@ -41,8 +41,8 @@

     # prepare training data.
     # We create an RDD of LabeledPoints and convert them into a DataFrame.
-    # Spark DataFrames can automatically infer the schema from named tuples
-    # and LabeledPoint implements __reduce__ to behave like a named tuple.
+    # A LabeledPoint is an Object with two fields named label and features
+    # and Spark SQL identifies these fields and creates the schema appropriately.
     training = sc.parallelize([
         LabeledPoint(1.0, DenseVector([0.0, 1.1, 0.1])),
         LabeledPoint(0.0, DenseVector([2.0, 1.0, -1.0])),
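The rewritten comment says Spark SQL derives the DataFrame schema from `LabeledPoint`'s two fields, `label` and `features`. The sketch below is a toy analogy in plain Python, not Spark's actual inference code: a simplified stand-in class and a hypothetical `infer_schema` helper show how named object fields are enough to derive column names and types.

```python
class LabeledPoint:
    """Simplified stand-in for pyspark.mllib.regression.LabeledPoint."""
    def __init__(self, label, features):
        self.label = label
        self.features = features

def infer_schema(point):
    # Map each public field name to its value's type name, mimicking
    # how a schema (column name -> column type) can be read off an object.
    return {name: type(value).__name__
            for name, value in vars(point).items()}

schema = infer_schema(LabeledPoint(1.0, [0.0, 1.1, 0.1]))
print(schema)  # {'label': 'float', 'features': 'list'}
```

In real PySpark the inference is richer (it handles `DenseVector`, nullability, and nested types), but the principle the comment describes is the same: field names become column names.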
