Skip to content

Handle downstream impacts to union's behavioral changes. #12105

@wiedld

Description

@wiedld

Is your feature request related to a problem or challenge?

With this upstream change, an existing union() API had a behavioral change. This API is used to construct the UNION logical plan. Previously, it had been constructing the UNION plan node and performing type coercion. After the upstream change, it no longer perform the type coercion at logical plan construction (and instead relies upon a later logical plan optimizer pass).

This upstream change does not cause any failures in the upstream datafusion tests. However, for those constructing their own logical plans with UNION it does have significant downstream impacts. Explicitly:

  1. we have a UNION logical plan construction using union() upstream API
  2. we consume the schema & expr from the union, when deciding how to construct other logical plan nodes (e.g. sort, grouping, aggregates, limits)
  3. then afterwards => we hand the logical plan to datafusion for optimization (including type coercion)

When the upstream change removed the UNION type coercion from step 1 above, our logical plan construction code (step 2) started creating incorrect logical plans.

Describe the solution you'd like

Update the union() API docs with how to handle the changed behavior if the user still requires UNION type coercion at logical plan construction. Something like:

let union_schema = coerce_union_schema(vec![prev, next])?;
let prev = coerce_plan_expr_for_schema(prev, &union_schema)?;
let next = coerce_plan_expr_for_schema(next, &union_schema)?;
union(prev, next)

The above code relies upon the already public coerce_plan_expr_for_schema. The coerce_union_schemaI is not yet public, therefore we recommend making it public.

Describe alternatives you've considered

The original change was put in by @jonahgao , whom may have some alternative ideas?

Additional context

We have made our own public (patched) version of the coerce_union_schema as a temporary measure to fix the issue, and can confirm it's sufficiently. Also open to other ideas.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions