Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support 'SELECT NULL null_field' and union schemas with NULL type field in coral-schema #71

Merged
merged 7 commits into from
Apr 13, 2021

Conversation

ljfgem
Copy link
Collaborator

@ljfgem ljfgem commented Apr 13, 2021

For Views created using SQL like

SELECT NULL null_field, ..., FROM table1
UNION ALL
SELECT ... FROM table2

there are 2 issues when coral-schema processes:

  1. While visiting SELECT NULL null_field, SchemaUtilities.appendField is called to append NULL to the type of null_field, but there is no check to determine if the provided null_field's type is NULL or not. This PR adds the checking before union NULL with the provided field type to avoid union 2 NULL values.
  2. After checking and leaving the type of null_field as NULL, there is no support for NULL type in SchemaUtilities.isUnionRecordSchemaCompatible and SchemaUtilities.setupNestedNamespaceForRecord. This PR also adds the NULL in the cases for these 2 functions.

Comment on lines 622 to 623
String viewSql =
"CREATE VIEW v AS SELECT NULL Null_Field FROM basecomplex UNION ALL SELECT NULL Null_Field FROM basecomplexunioncompatible";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to correctly handle SELECT NULL FROM table1 UNION ALL SELECT x FROM table2? This patch does not seem to support this case, and based on the lack of NULL support before this patch, I am not sure if this example is supported before the patch either. I expect views will use a query like this example rather than simply applying a union on two NULL values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I tested SELECT NULL FROM table1 UNION ALL SELECT Null_Field FROM table2 and coral-schema could work for it, please see the modified SQL in test file.

Copy link
Contributor

@wmoustafa wmoustafa Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was not referring to querying a field that is still of NULL type, i.e., (Null_Field), since this is equivalent to the example that exited initially. I was referring to applying a union between a field that is not of type NULL (which could be nullable or not in Avro) with a NULL field. For the NULL union NULL case, the previous test is fine.

Copy link
Collaborator Author

@ljfgem ljfgem Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If in this case, there is another exception: LogicalUnion operator are not compatible, because the types are not strictly same.
Do you think we need to support NULL union a nullable type? If so, what is the result type?

Copy link
Contributor

@wmoustafa wmoustafa Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, we should verify if production views belong to this category (NULL UNION ALL NON-NULL) or the category of the test case covered by this patch NULL UNION ALL NULL. I think the former's result should be the nullable union of the non-null type. @funcheetah, what do you think?

Copy link
Collaborator Author

@ljfgem ljfgem Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just verified the production views which failed because of exception org.apache.avro.AvroRuntimeException: Duplicate in union:null.
There are 3 different types:

  1. NULL UNION ALL NULL, which could be handled by this patch
  2. NULL UNION ALL nullable_type (like ["null", "string"]), will fail
  3. NULL UNION ALL non_nullable_type (like 'int'), will fail

By the way, I think 2 and 3 are related to another exception which I will start to fix soon: java.lang.RuntimeException: Input schemas of LogicalUnion operator are not compatible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! yeah, we will need to address them as well. I think the proposal above makes sense, but let us see what @funcheetah thinks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation. I think it makes sense to address 2 and 3 as well in the next fix. nullable union of the non-null type seems a reasonable result type.

Copy link
Collaborator Author

@ljfgem ljfgem Apr 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, so I need to check it in and open another PR for exception Input schemas of LogicalUnion operator are not compatible or improve this PR to handle both exceptions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could check this in and handle 2 and 3 in another PR.

@funcheetah funcheetah merged commit d948118 into linkedin:master Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants