You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Under certain conditions, attempting to convert a SetRel its Calcite equivalent results in the following error:
type mismatch:
ref:
INTEGER NOT NULL
input:
INTEGER
The reason this happens requires a bit of an explanation.
Calcite derives the row type for set operations using the code here. What this does is:
Compare the columns of each of the inputs to make sure they have the same type.
Set the nullability of each column to be the least restrictive of its inputs. For a given column index, if any of the inputs has that column as nullable, the final output columns is nullable.
Now, substrait-java arbitrarily sets the return type of a set operation to be a return type of the first input. See here.
This is the source of the disagreement between types, but actually not enough to trigger the issue. The final item needed is a FieldReference over the set operation. As describe in #185, the FieldReferences ends up storing the type derived by substrait-java. Converting this field reference can then fail, if the type it stores does not match up with the type derived by Calcite.
Code Example
The following code reproduces the issue:
@ParameterizedTest@EnumSource(Set.SetOp.class)
voidsetConversionFailurs(Set.SetOpop) {
if (op.equals(Set.SetOp.UNKNOWN)) {
return;
}
SubstraitBuilderb = newSubstraitBuilder(extensions);
TypeCreatorR = TypeCreator.of(false);
TypeCreatorN = TypeCreator.of(true);
NamedScanprimaryTable =
b.namedScan(
List.of("primary"), List.of("a", "b", "c", "d"), List.of(R.I32, R.I32, N.I32, N.I32));
NamedScansecondaryTable =
b.namedScan(
List.of("secondary"), List.of("a", "b", "c", "d"), List.of(R.I32, N.I32, R.I32, N.I32));
varplan =
b.project(
input ->
List.of(
// using a field reference triggers a call to deriveRecordType// this embeds the type derived from the Substrait Set in the POJO// if this type does not match the type Calcite expects, an exception is thrownb.fieldReference(input, 0),
b.fieldReference(input, 1),
b.fieldReference(input, 2),
b.fieldReference(input, 3)),
b.remap(4, 5, 6, 7),
b.set(op, primaryTable, secondaryTable));
assertFullRoundTrip(plan);
}
Fixing
Similar to #185, we should consider whether we should be using the cached type at all when converting the FieldReference from Substrait to Calcite.
Specifically, here in the ExpressionRexConverter we could use the type derived by Calcite for the relation the expression sits on.
The other alternative would be align substrait type derivation with Calcite's type derivation, however it may not make sense to align with Calcite for this for everything. Adding Substrait specific variants of relational operators with our own type derivation logic is an option here.
The text was updated successfully, but these errors were encountered:
Under certain conditions, attempting to convert a SetRel its Calcite equivalent results in the following error:
The reason this happens requires a bit of an explanation.
Calcite derives the row type for set operations using the code here. What this does is:
Here is an example to illustrate this:
Now, substrait-java arbitrarily sets the return type of a set operation to be a return type of the first input. See here.
This is the source of the disagreement between types, but actually not enough to trigger the issue. The final item needed is a FieldReference over the set operation. As describe in #185, the FieldReferences ends up storing the type derived by substrait-java. Converting this field reference can then fail, if the type it stores does not match up with the type derived by Calcite.
Code Example
The following code reproduces the issue:
Fixing
Similar to #185, we should consider whether we should be using the cached type at all when converting the FieldReference from Substrait to Calcite.
Specifically, here in the ExpressionRexConverter we could use the type derived by Calcite for the relation the expression sits on.
The other alternative would be align substrait type derivation with Calcite's type derivation, however it may not make sense to align with Calcite for this for everything. Adding Substrait specific variants of relational operators with our own type derivation logic is an option here.
The text was updated successfully, but these errors were encountered: