UNION should propagate identical input attributes to its output #270

Open
@wajda

Description

Consider the following case:

           (a, b, c)
            /     \ 
           .       .
       SELECT a=   .
           .       .
           .     SELECT b=
           .       .
            \     /
             UNION
               |
          (a', b', c')

The UNION combines two data frames that are created from the same shared data frame (a, b, c) by applying different transformation chains.

To preserve both the left and right paths in the lineage of the resulting (a', b', c') data frame, we create new synthetic attributes for the UNION output and connect them to all corresponding attributes in the input data frames.

The problem arises when some of those input attributes are identical. In the example above, attribute c isn't touched by either branch and is simply propagated everywhere. In the resulting data frame, c is therefore expected to appear as itself, not as c' = f(c), which is what Spline currently shows.

Technically speaking, c' = f(c) is not an incorrect representation: f could be the identity function, in which case the expression reduces to c' = c, as expected. But it creates unnecessary clutter in the graph structure, with duplicated intermediate attribute names that make the lineage difficult to reason about.
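
A minimal sketch of the requested behavior, assuming a simplified attribute model. The `Attr` class, the `union_output` function and the `propagate_identical` flag are hypothetical names used only for illustration and are not Spline's actual data model or API. UNION inputs are paired by position: if both sides of a pair refer to the same attribute, that attribute is passed through unchanged; otherwise a synthetic attribute is created that depends on both inputs.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass(frozen=True)
class Attr:
    id: int
    name: str
    # IDs of the attributes this one is derived from (empty for source attributes)
    depends_on: Tuple[int, ...] = ()


# Hypothetical ID generator for synthetic attributes
_synthetic_ids = count(100)


def union_output(left: List[Attr], right: List[Attr],
                 propagate_identical: bool = True) -> List[Attr]:
    """Compute the output attributes of a UNION from its positionally paired inputs."""
    if len(left) != len(right):
        raise ValueError("UNION inputs must have the same number of attributes")
    out = []
    for l, r in zip(left, right):
        if propagate_identical and l.id == r.id:
            # Same attribute on both sides: pass it through as-is
            out.append(l)
        else:
            # Otherwise create a synthetic attribute linked to the distinct inputs
            deps = tuple(dict.fromkeys((l.id, r.id)))
            out.append(Attr(next(_synthetic_ids), l.name + "'", deps))
    return out


# The shared source data frame (a, b, c)
a, b, c = Attr(1, "a"), Attr(2, "b"), Attr(3, "c")
a1 = Attr(4, "a", (a.id,))  # left branch: SELECT a= ...
b1 = Attr(5, "b", (b.id,))  # right branch: SELECT b= ...

left = [a1, b, c]
right = [a, b1, c]

result = union_output(left, right)
print([attr.name for attr in result])
```

With `propagate_identical=False` the same function mimics the behavior described above, where c also gets a synthetic c' that depends only on c.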
