[SPARK-54901][Python] Selective column conversion for scalar pandas UDFs #53674

fangchenli · 2026-01-05T05:21:25Z

What changes were proposed in this pull request?

Only convert Arrow columns that are actually used by the scalar pandas UDF(s).

Why are the changes needed?

When executing a scalar pandas UDF, PySpark currently converts all Arrow columns to pandas Series, even if the UDF uses only a subset of them. This is wasteful for wide DataFrames where the UDF needs only a few columns.
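To illustrate the idea, here is a minimal sketch in plain Python. It is not the code from this patch: lists stand in for Arrow columns, and the helper names (`used_offsets`, `convert_selected`, `fake_to_series`) are hypothetical. It assumes each UDF is described by the list of column offsets it reads, and it converts only the union of those offsets.

```python
from typing import Callable, Dict, List, Sequence


def used_offsets(udf_arg_offsets: Sequence[Sequence[int]]) -> List[int]:
    """Return the sorted, de-duplicated column offsets read by any UDF."""
    return sorted({offset for offsets in udf_arg_offsets for offset in offsets})


def convert_selected(
    columns: Sequence[object],
    offsets: Sequence[int],
    convert: Callable[[object], object],
) -> Dict[int, object]:
    """Convert only the columns at `offsets`; all other columns are left untouched."""
    return {offset: convert(columns[offset]) for offset in offsets}


# A "wide" batch with four columns; lists stand in for Arrow arrays (assumption).
batch = [[1, 2, 3], [10, 20, 30], ["a", "b", "c"], [0.1, 0.2, 0.3]]

# Two UDFs: the first reads columns 0 and 1, the second reads only column 1.
udf_arg_offsets = [[0, 1], [1]]

converted_log = []


def fake_to_series(column):
    """Stand-in for the Arrow-to-pandas conversion; records each call."""
    converted_log.append(column)
    return tuple(column)


needed = used_offsets(udf_arg_offsets)
series_by_offset = convert_selected(batch, needed, fake_to_series)

# Each UDF looks up its inputs by offset among the converted columns.
results = [
    [a + b for a, b in zip(series_by_offset[0], series_by_offset[1])],
    [b * 2 for b in series_by_offset[1]],
]
```

In this sketch, columns 2 and 3 are never passed to the converter, and column 1 is converted once even though both UDFs read it.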

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests included.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.5

github-actions · 2026-01-05T05:21:34Z

JIRA Issue Information

=== Improvement SPARK-54901 ===
Summary: Selective column conversion for scalar Pandas UDFs
Assignee: None
Status: Open
Affected: ["4.2.0"]

This comment was automatically generated by GitHub Actions

python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py

fangchenli · 2026-01-05T22:37:02Z

Closing since this optimization is not necessary.

[SPARK-54901] Selective column conversion for scalar Pandas UDFs

a74381d

github-actions bot added SQL CORE PYTHON labels Jan 5, 2026

cleanup docstring and comments, simplify flag

c5b4f45

fangchenli marked this pull request as ready for review January 5, 2026 05:28

fangchenli changed the title ~~[SPARK-54901] Selective column conversion for scalar pandas UDFs~~ [SPARK-54901][Python] Selective column conversion for scalar pandas UDFs Jan 5, 2026

zhengruifeng reviewed Jan 5, 2026

python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py

fangchenli marked this pull request as draft January 5, 2026 22:32

fangchenli closed this Jan 5, 2026


2 participants