-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support 'SELECT NULL null_field' and union schemas with NULL type field in coral-schema #71
Conversation
String viewSql = | ||
"CREATE VIEW v AS SELECT NULL Null_Field FROM basecomplex UNION ALL SELECT NULL Null_Field FROM basecomplexunioncompatible"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to correctly handle SELECT NULL FROM table1 UNION ALL SELECT x FROM table2
? This patch does not seem to support this case, and based on the lack of NULL
support before this patch, I am not sure if this example is supported before the patch either. I expect views will use a query like this example rather than simply applying a union on two NULL
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I tested SELECT NULL FROM table1 UNION ALL SELECT Null_Field FROM table2
and coral-schema could work for it, please see the modified SQL in test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was not referring to querying a field that is still of NULL
type, i.e., (Null_Field)
, since this is equivalent to the example that exited initially. I was referring to applying a union between a field that is not of type NULL
(which could be nullable or not in Avro) with a NULL
field. For the NULL
union NULL
case, the previous test is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If in this case, there is another exception: LogicalUnion operator are not compatible, because the types are not strictly same.
Do you think we need to support NULL
union a nullable type? If so, what is the result type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, we should verify if production views belong to this category (NULL UNION ALL NON-NULL
) or the category of the test case covered by this patch NULL UNION ALL NULL
. I think the former's result should be the nullable union of the non-null type. @funcheetah, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just verified the production views which failed because of exception org.apache.avro.AvroRuntimeException: Duplicate in union:null
.
There are 3 different types:
NULL UNION ALL NULL
, which could be handled by this patchNULL UNION ALL nullable_type (like ["null", "string"])
, will failNULL UNION ALL non_nullable_type (like 'int')
, will fail
By the way, I think 2 and 3 are related to another exception which I will start to fix soon: java.lang.RuntimeException: Input schemas of LogicalUnion operator are not compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! yeah, we will need to address them as well. I think the proposal above makes sense, but let us see what @funcheetah thinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the investigation. I think it makes sense to address 2 and 3 as well in the next fix. nullable union of the non-null type
seems a reasonable result type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, so I need to check it in and open another PR for exception Input schemas of LogicalUnion operator are not compatible
or improve this PR to handle both exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could check this in and handle 2 and 3 in another PR.
For Views created using SQL like
there are 2 issues when coral-schema processes:
SELECT NULL null_field
,SchemaUtilities.appendField
is called to appendNULL
to the type ofnull_field
, but there is no check to determine if the providednull_field
's type is NULL or not. This PR adds the checking before union NULL with the provided field type to avoid union 2 NULL values.null_field
as NULL, there is no support for NULL type inSchemaUtilities.isUnionRecordSchemaCompatible
andSchemaUtilities.setupNestedNamespaceForRecord
. This PR also adds the NULL in the cases for these 2 functions.