Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

In #12127 , we moved the ExtractPythonUDFs rule to the physical phase, while there was another option: do ExtractPythonUDFs at the end of optimizer.

Currently we hit 2 issues when exacting python UDFs at physical phase:

  1. it happens after data source v2 strategy, so data source v2 strategy needs to deal with python udfs carefully and adds project to produce unsafe row for python udf. See [SPARK-25213][PYTHON] Add project to v2 scans before python filters. #22206
  2. it happens after file source strategy, so we may keep Python UDF as data filter in FileSourceScanExec and fail the planner when try to extract it later. See [SPARK-24721][SQL] Extract Python UDFs at the end of optimizer #22104

This PR proposes to move ExtractPythonUDFs to the end of optimizer.

How was this patch tested?

TODO

@cloud-fan
Copy link
Contributor Author

cc @icexelloss @HyukjinKwon @rdblue

@icexelloss feel free to take this over and verify if it can pass the tests you added in #22104 , thanks!

@SparkQA
Copy link

SparkQA commented Aug 27, 2018

Test build #95292 has finished for PR 22244 at commit f0e547c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class ArrowEvalPython(udfs: Seq[PythonUDF], output: Seq[Attribute], child: LogicalPlan)
  • case class BatchEvalPython(udfs: Seq[PythonUDF], output: Seq[Attribute], child: LogicalPlan)

@icexelloss
Copy link
Contributor

@cloud-fan Thanks! I will take a look later today and incorporate this with my patch.

@rdblue
Copy link
Contributor

rdblue commented Aug 27, 2018

@cloud-fan, I like this solution better than adding a special case in the v2 conversion to physical plan. This explains why the Python exec nodes weren't already in the tree! I'd much rather commit this or something like it than go with my work-around solution, which is fairly brittle.

For now, I'm going to assume that this is going to be the fix. Please let me know if you want me to do further work to get my patch ready. We could use it if this work ends up being more complex than we want to wait on for the release.

@icexelloss, also feel free to take the test case from #22206, and thank you for working on this!

@cloud-fan
Copy link
Contributor Author

closing in favor of #22104

@cloud-fan cloud-fan closed this Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants