Skip to content

Conversation

@Yicong-Huang
Copy link
Contributor

@Yicong-Huang Yicong-Huang commented Dec 5, 2025

What changes were proposed in this pull request?

This PR refactors the UDF reading logic in read_udfs() to eliminate code duplication. Currently, the logic for reading UDFs (functions and their argument offsets) is duplicated across multiple eval_type branches, with different patterns for single UDF vs. multiple UDFs cases.

Why are the changes needed?

This duplication makes the code harder to maintain and increases the risk of inconsistencies. By centralizing the UDF reading logic at the beginning of read_udfs(), we can:

  • Reduce code duplication
  • Ensure consistent UDF reading behavior across all eval types
  • Make it easier to add new eval types in the future

Does this PR introduce any user-facing change?

No, this is an internal refactoring that maintains backward compatibility. The API behavior remains the same from the user's perspective.

How was this patch tested?

Existing Tests

Was this patch authored or co-authored using generative AI tooling?

No

@gaogaotiantian
Copy link
Contributor

gaogaotiantian commented Dec 5, 2025

Do you need another function for it? It's basically just two lines:

num_udfs = read_int(infile)
udfs = [
    read_single_udf(
        pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler
    )
    for i in range(num_udfs)
]

My real concern is - what's the difference between read_udfs and read_all_udfs?

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54598] Extract logic of read_all_udfs [SPARK-54598][PYTHON] Extract logic of read_all_udfs Dec 5, 2025
@Yicong-Huang Yicong-Huang changed the title [SPARK-54598][PYTHON] Extract logic of read_all_udfs [SPARK-54598][PYTHON] Extract logic to read udfs Dec 5, 2025
@Yicong-Huang Yicong-Huang changed the title [SPARK-54598][PYTHON] Extract logic to read udfs [SPARK-54598][PYTHON] Extract logic to read UDFs Dec 5, 2025

# Read all UDFs
num_udfs = read_int(infile)
udfs = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you don't like the list comprehension here? It's very commonly used, significantly faster (doesn't matter here because the function itself is super slow) and use less code. Do you have concerns for readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, we could change to list comprehension. this was the old logic, I just moved it here.

@Yicong-Huang
Copy link
Contributor Author

Do you need another function for it? It's basically just two lines:

num_udfs = read_int(infile)

udfs = [

    read_single_udf(

        pickleSer, infile, eval_type, runner_conf, udf_index=i, profiler=profiler

    )

    for i in range(num_udfs)

]

My real concern is - what's the difference between read_udfs and read_all_udfs?

Thanks. Either way is fine. I have changed to inline the function.

@Yicong-Huang
Copy link
Contributor Author

@zhengruifeng could you please also review this? thanks

@zhengruifeng
Copy link
Contributor

merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants