[SPARK-54598][PYTHON] Extract logic to read UDFs #53330

Yicong-Huang · 2025-12-05T00:22:52Z

What changes were proposed in this pull request?

This PR refactors the UDF reading logic in read_udfs() to eliminate code duplication. Currently, the logic for reading UDFs (the functions and their argument offsets) is duplicated across multiple eval_type branches, with different patterns for the single-UDF and multiple-UDF cases.

Why are the changes needed?

This duplication makes the code harder to maintain and increases the risk of inconsistencies. By centralizing the UDF reading logic at the beginning of read_udfs(), we can:

Reduce code duplication
Ensure consistent UDF reading behavior across all eval types
Make it easier to add new eval types in the future
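
The pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual code in python/pyspark/worker.py: read_int and read_single_udf here are simplified stand-ins with reduced signatures, and the SCALAR/GROUPED labels, pack_ints helper, and byte layout are hypothetical.

```python
import io
import struct

# Hypothetical eval-type labels for illustration; not PySpark's real constants.
SCALAR = "scalar"
GROUPED = "grouped"


def read_int(stream):
    """Simplified stand-in: read one 4-byte big-endian signed integer."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError("stream ended while reading an int")
    return struct.unpack("!i", data)[0]


def read_single_udf(stream, udf_index):
    """Simplified stand-in: read one UDF's argument offsets and build a callable.

    udf_index is accepted to mirror the call shape in the PR but is unused here.
    """
    num_args = read_int(stream)
    arg_offsets = [read_int(stream) for _ in range(num_args)]

    def func(row):
        # Select this UDF's input columns from a row.
        return [row[offset] for offset in arg_offsets]

    return arg_offsets, func


def read_udfs(stream, eval_type):
    # Centralized step: read every UDF once, before any eval_type branching.
    num_udfs = read_int(stream)
    udfs = [read_single_udf(stream, udf_index=i) for i in range(num_udfs)]

    if eval_type == GROUPED:
        # A branch that expects exactly one UDF checks and reuses the shared list.
        if num_udfs != 1:
            raise ValueError("grouped eval type expects exactly one UDF")
        return [udfs[0]]
    # A branch that accepts several UDFs uses the list as-is.
    return udfs


def pack_ints(*values):
    """Encode integers the way read_int above expects them (hypothetical helper)."""
    return b"".join(struct.pack("!i", v) for v in values)


# Two UDFs: the first reads columns 0 and 1, the second reads column 2.
payload = pack_ints(2, 2, 0, 1, 1, 2)
udfs = read_udfs(io.BytesIO(payload), SCALAR)
```

Because the read happens once at the top, each branch only decides how to use the already-parsed list, so a new eval type would not need its own copy of the reading loop.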

Does this PR introduce any user-facing change?

No, this is an internal refactoring that maintains backward compatibility. The API behavior remains the same from the user's perspective.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

gaogaotiantian · 2025-12-05T00:34:52Z

Do you need another function for it? It's basically just two lines:

num_udfs = read_int(infile)
udfs = [
    read_single_udf(
        pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
    )
    for i in range(num_udfs)
]

My real concern is - what's the difference between read_udfs and read_all_udfs?

gaogaotiantian · 2025-12-05T01:32:37Z

python/pyspark/worker.py


+    # Read all UDFs
    num_udfs = read_int(infile)
+    udfs = []


Is there a reason you don't like the list comprehension here? It's very commonly used, significantly faster (which doesn't matter here because the function itself is super slow), and uses less code. Do you have concerns about readability?

Oh, we could change it to a list comprehension. This was the old logic; I just moved it here.
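
For reference, the two forms discussed in this thread are behaviorally equivalent. The sketch below assumes a hypothetical read_single_udf stand-in that consumes records from a shared iterator, which mimics reading sequentially from a stream; it is not the real worker.py function.

```python
def read_single_udf(records, udf_index):
    """Hypothetical stand-in: consume the next record from a shared iterator."""
    return udf_index, next(records)


def read_with_loop(records, num_udfs):
    # The original form: build the list with an explicit loop and append.
    udfs = []
    for i in range(num_udfs):
        udfs.append(read_single_udf(records, udf_index=i))
    return udfs


def read_with_comprehension(records, num_udfs):
    # The suggested form: same calls, in the same order, in one expression.
    return [read_single_udf(records, udf_index=i) for i in range(num_udfs)]


names = ["upper", "lower", "strip"]
loop_result = read_with_loop(iter(names), len(names))
comprehension_result = read_with_comprehension(iter(names), len(names))
```

A list comprehension evaluates its elements in iteration order, so the records are consumed in the same sequence as in the loop, which matters when each call reads from a shared input.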

Yicong-Huang · 2025-12-05T03:04:40Z

> Do you need another function for it? It's basically just two lines:
>
> num_udfs = read_int(infile)
> udfs = [
>     read_single_udf(
>         pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
>     )
>     for i in range(num_udfs)
> ]
>
> My real concern is - what's the difference between read_udfs and read_all_udfs?

Thanks. Either way is fine. I have changed the code to inline the function.

Yicong-Huang · 2025-12-05T05:46:01Z

@zhengruifeng could you please also review this? thanks

zhengruifeng · 2025-12-05T07:28:20Z

merged to master

refactor: extract logic of read_all_udfs

46d3682

github-actions bot added CORE PYTHON labels Dec 5, 2025

doc: remove a comment

2297bb7

dongjoon-hyun changed the title ~~[SPARK-54598] Extract logic of read_all_udfs~~ [SPARK-54598][PYTHON] Extract logic of read_all_udfs Dec 5, 2025

refactor: remove function

2a627ea

Yicong-Huang changed the title ~~[SPARK-54598][PYTHON] Extract logic of read_all_udfs~~ [SPARK-54598][PYTHON] Extract logic to read udfs Dec 5, 2025

Yicong-Huang changed the title ~~[SPARK-54598][PYTHON] Extract logic to read udfs~~ [SPARK-54598][PYTHON] Extract logic to read UDFs Dec 5, 2025

gaogaotiantian reviewed Dec 5, 2025

Yicong-Huang added 2 commits December 4, 2025 21:12

refactor: use list comprehension

8dd6d15

fix: format

8723aa7

gaogaotiantian approved these changes Dec 5, 2025

zhengruifeng closed this in bba8bb8 Dec 5, 2025

3 participants
