Allow creating SparkContext in executors when doing so in order to score Spark models via mlflow.pyfunc.spark_udf #3355

smurching · 2020-08-28T17:49:39Z

Signed-off-by: Sid Murching <sid.murching@databricks.com>

What changes are proposed in this pull request?

Flips the flag (introduced in apache/spark#28986) to allow creating a SparkContext on the executors.

We depend on this behavior when scoring Spark models via MLflow's mlflow.pyfunc.spark_udf API. Spark 3.1 disables it by default and allows it only when the flag is set. In particular, when scoring a Spark model via spark_udf, the underlying pandas_udf we define for scoring constructs a SparkContext in order to create a Spark DataFrame from the passed-in pandas DataFrame, and then scores the Spark ML model on that DataFrame. Because the pandas_udf runs on the executors, we need to be able to create a SparkContext there.
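As a rough illustration of what flipping the flag could look like, here is a minimal sketch. It assumes the conf key is `spark.executor.allowSparkContext`; that name is not stated in this PR description and should be checked against the Spark configuration docs. The helper names (`executor_session_confs`, `apply_confs`, `get_local_session`) are hypothetical and are not taken from the MLflow code base.

```python
# Assumed name of the Spark 3.1 conf referenced above; this PR text does not spell it out.
ALLOW_SPARK_CONTEXT_CONF = "spark.executor.allowSparkContext"


def executor_session_confs():
    """Return the confs a local, executor-side session might be built with (hypothetical helper)."""
    return {ALLOW_SPARK_CONTEXT_CONF: "true"}


def apply_confs(builder, confs):
    """Apply each conf to a SparkSession.Builder-like object and return the builder."""
    for key, value in confs.items():
        builder = builder.config(key, value)
    return builder


def get_local_session():
    """Build (or reuse) a local session with the flag set."""
    # Imported lazily so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[1]")
    return apply_confs(builder, executor_session_confs()).getOrCreate()
```

Keeping the confs in a plain dict makes it possible to inspect the settings without starting Spark; only `get_local_session` needs pyspark at runtime.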

How is this patch tested?

Manual testing against Spark 3.1

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

smurching added 3 commits August 28, 2020 10:49

Allow creating SparkContext in executors when doing so while scoring …

88b5b86

…Spark models using Spark UDF Signed-off-by: Sid Murching <sid.murching@databricks.com>

Update

021d3f2

Signed-off-by: Sid Murching <sid.murching@databricks.com>

Apply black

07f6180

Signed-off-by: Sid Murching <sid.murching@databricks.com>

andychow-db approved these changes Aug 28, 2020

smurching merged commit c0d7f6b into mlflow:master Aug 28, 2020

smurching added a commit that referenced this pull request Aug 28, 2020

Allow creating SparkContext in executors when doing so in order to sc…

f1b6f69

…ore Spark models via mlflow.pyfunc.spark_udf (#3355)
