Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[isthmus] conversion failures with SetRels #186

Open
vbarua opened this issue Sep 21, 2023 · 1 comment
Open

[isthmus] conversion failures with SetRels #186

vbarua opened this issue Sep 21, 2023 · 1 comment

Comments

@vbarua
Copy link
Member

vbarua commented Sep 21, 2023

Under certain conditions, attempting to convert a SetRel its Calcite equivalent results in the following error:

type mismatch:
ref:
INTEGER NOT NULL
input:
INTEGER

The reason this happens requires a bit of an explanation.

Calcite derives the row type for set operations using the code here. What this does is:

  1. Compare the columns of each of the inputs to make sure they have the same type.
  2. Set the nullability of each column to be the least restrictive of its inputs. For a given column index, if any of the inputs has that column as nullable, the final output columns is nullable.

Here is an example to illustrate this:

Input 1: (I64, I64, I64?, I64?)
Input 2: (I64, I64?, I64, I64?)
Output:  (I64, I64?, I64?, I64?)

Now, substrait-java arbitrarily sets the return type of a set operation to be a return type of the first input. See here.

This is the source of the disagreement between types, but actually not enough to trigger the issue. The final item needed is a FieldReference over the set operation. As describe in #185, the FieldReferences ends up storing the type derived by substrait-java. Converting this field reference can then fail, if the type it stores does not match up with the type derived by Calcite.

Code Example

The following code reproduces the issue:

  @ParameterizedTest
  @EnumSource(Set.SetOp.class)
  void setConversionFailurs(Set.SetOp op) {
    if (op.equals(Set.SetOp.UNKNOWN)) {
      return;
    }

    SubstraitBuilder b = new SubstraitBuilder(extensions);

    TypeCreator R = TypeCreator.of(false);
    TypeCreator N = TypeCreator.of(true);

    NamedScan primaryTable =
        b.namedScan(
            List.of("primary"), List.of("a", "b", "c", "d"), List.of(R.I32, R.I32, N.I32, N.I32));

    NamedScan secondaryTable =
        b.namedScan(
            List.of("secondary"), List.of("a", "b", "c", "d"), List.of(R.I32, N.I32, R.I32, N.I32));

    var plan =
        b.project(
            input ->
                List.of(
                    // using a field reference triggers a call to deriveRecordType
                    // this embeds the type derived from the Substrait Set in the POJO
                    // if this type does not match the type Calcite expects, an exception is thrown
                    b.fieldReference(input, 0),
                    b.fieldReference(input, 1),
                    b.fieldReference(input, 2),
                    b.fieldReference(input, 3)),
            b.remap(4, 5, 6, 7),
            b.set(op, primaryTable, secondaryTable));
    assertFullRoundTrip(plan);
  }

Fixing

Similar to #185, we should consider whether we should be using the cached type at all when converting the FieldReference from Substrait to Calcite.

Specifically, here in the ExpressionRexConverter we could use the type derived by Calcite for the relation the expression sits on.

The other alternative would be align substrait type derivation with Calcite's type derivation, however it may not make sense to align with Calcite for this for everything. Adding Substrait specific variants of relational operators with our own type derivation logic is an option here.

@vbarua
Copy link
Member Author

vbarua commented Jul 1, 2024

I've created an issue in Calcite to potentially improve its type derivation.
https://issues.apache.org/jira/browse/CALCITE-6451

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant