This repository was archived by the owner on Dec 4, 2019. It is now read-only.
spark-sklearn on windows- not working on local #110
Closed
Description
I am trying to execute this code from the spark-sklearn API documentation. I'm running on Windows 7 with the latest spark-sklearn version, executing inside the pyspark shell on Spark 2.3.2 (I also tried 2.4.0, same error).
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
from pyspark.ml.linalg import Vectors, Matrices, MatrixUDT
from pyspark.sql.functions import udf
from pyspark.sql import SparkSession
from spark_sklearn.util import createLocalSparkSession
from spark_sklearn.keyed_models import KeyedEstimator
spark = createLocalSparkSession()
df = spark.createDataFrame([(user,
                             Vectors.dense([i, i ** 2, i ** 3]),
                             0.0 + user + i + 2 * i ** 2 + 3 * i ** 3)
                            for user in range(3) for i in range(5)])
df = df.toDF("key", "features", "y")
df.where("5 < y and y < 10").sort("key", "y").show()
km = KeyedEstimator(sklearnEstimator=LinearRegression(), yCol="y").fit(df)
def printFloat(x):
    rounded = round(x, 2)
    return "{:.2f}".format(0 if rounded == 0 else rounded)

def printModel(model):
    coef = "[" + ", ".join(map(printFloat, model.coef_)) + "]"
    intercept = printFloat(model.intercept_)
    return "intercept: {} coef: {}".format(intercept, coef)
km.keyedModels.columns
printedModels = udf(printModel)("estimator").alias("linear fit")
km.keyedModels.select("key", printedModels).sort("key").show(truncate=False)
Running this fails with an error at the final show, i.e. at km.keyedModels.select("key", printedModels).sort("key").show(truncate=False).
I cannot find any solution online. Since the earlier show calls succeed, Spark itself appears to be working.
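For reference, the data generated in the snippet makes the expected per-key fits easy to check by hand: for each key (user), y = user + i + 2*i^2 + 3*i^3 over features [i, i^2, i^3], so each fitted model should recover coefficients [1, 2, 3] and intercept equal to the key. A minimal pure-Python sketch (no Spark or sklearn; solving the normal equations directly with a small Gaussian-elimination helper, which is an illustration rather than what KeyedEstimator does internally) confirms this:

```python
# Verify the expected per-key linear fits for the data generated above,
# without Spark or sklearn: for each key (user), solve the normal
# equations (X^T X) w = X^T y in pure Python.

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_key(user):
    # Same data as the DataFrame above: features [i, i^2, i^3], target
    # y = user + i + 2*i^2 + 3*i^3; a trailing 1.0 column models the intercept.
    X = [[i, i ** 2, i ** 3, 1.0] for i in range(5)]
    y = [0.0 + user + i + 2 * i ** 2 + 3 * i ** 3 for i in range(5)]
    n = len(X[0])
    XtX = [[sum(X[r][a] * X[r][b] for r in range(5)) for b in range(n)]
           for a in range(n)]
    Xty = [sum(X[r][a] * y[r] for r in range(5)) for a in range(n)]
    return solve(XtX, Xty)  # -> [coef_i, coef_i2, coef_i3, intercept]

for user in range(3):
    w = fit_key(user)
    print("key", user,
          "coef", [round(v, 2) for v in w[:3]],
          "intercept", round(w[3], 2))
```

So whatever the show is failing on, it is not an ill-posed fit: each key's regression has an exact solution on this data.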