Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enums support #1: base types and operators #14728

Merged
merged 4 commits into from
Aug 31, 2020

Conversation

daniel-ohayon
Copy link
Contributor

@daniel-ohayon daniel-ohayon commented Jun 26, 2020

This PR is the first in a series that aims to introduce support for user-defined types, and specifically user-defined enums, into Presto (see #14691 for an overview of the work).

This one focuses specifically on introducing the base types and generic operators that apply to enums.
We support

  • enum literals, eg Mood.HAPPY
  • cast to and from base types
  • =, != and IS_DISTINCT
  • hash operator used for IN (...) and APPROX_DISTINCT()
  • cast to and from JSON
  • comparisons and ordering (using the rules of the underlying types)

Implementation notes

  • Enum literals like Mood.HAPPY are parsed as Dereferences in the AST (like my_table.my_col) and they are then rewritten as EnumLiterals in the TranslationMap's rewriter.
  • I added a new type bound constraint to express that we want a given type to be a long enum or varchar enum, so that we can have enum operator signatures like <T extends LongEnumType> equals(T, T): bool . This constraint is similar to the orderable and comparable constraint we use to define the type signature of functions like the equal operator on arrays.

In future PRs, I will introduce

  • the ability to register enum types from plugins
  • the ability to serialize enum data to the client

@daniel-ohayon
Copy link
Contributor Author

@rongrong I addressed applicable comments from #14691 here

@kaikalur
Copy link
Contributor

I'm not sure we should allow literals. They should only be a cast like: MOOD('Happy'). We don't want to create C++-like syntax where you need to know if something is a type before parsing. This is a going to create a lot of hassles and bug-prone.

@daniel-ohayon
Copy link
Contributor Author

I'm not sure we should allow literals. They should only be a cast like: MOOD('Happy').

@kaikalur are you suggesting that we introduce a UDF with the same name as each enum type that takes in a varchar key and returns the matching enum object?

We don't want to create C++-like syntax where you need to know if something is a type before parsing. This is a going to create a lot of hassles and bug-prone.

Can you elaborate on this? How is that syntax different from what we already support with eg DATE '2018-01-05' and JSON '{"hello" 1}' ? what new bugs and hassles would it introduce that these two don't?

@kaikalur
Copy link
Contributor

kaikalur commented Jul 17, 2020

I'm not sure we should allow literals. They should only be a cast like: MOOD('Happy').

@kaikalur are you suggesting that we introduce a UDF with the same name as each enum type that takes in a varchar key and returns the matching enum object?

We don't want to create C++-like syntax where you need to know if something is a type before parsing. This is a going to create a lot of hassles and bug-prone.

Can you elaborate on this? How is that syntax different from what we already support with eg DATE '2018-01-05' and JSON '{"hello" 1}' ? what new bugs and hassles would it introduce that these two don't?

DATE literal is actually a thing in the spec. And DATE is a keyword whereas here you will have to allow any enum type to precede the literal which is not very spec-like and not context-free grammar anymore.

@daniel-ohayon
Copy link
Contributor Author

I'm not too attached to any particular syntax, but the generic type literal is already there in the Presto parser, I didn't change that:
https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/sql/planner/LiteralInterpreter.java#L252

We already have the behavior of parsing type literals in a generic way when we encounter them and looking for a cast: varchar -> type so if you don't think that's a good idea, maybe that part should be removed from the parser too.

@kaikalur
Copy link
Contributor

kaikalur commented Jul 17, 2020 via email

@daniel-ohayon daniel-ohayon force-pushed the dohayon-enum-types branch 3 times, most recently from cf37c7d to cf63560 Compare July 22, 2020 03:20
@daniel-ohayon daniel-ohayon requested a review from rongrong July 22, 2020 05:27
Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add long and varchar enum types looks good. Some nits only.

@daniel-ohayon daniel-ohayon force-pushed the dohayon-enum-types branch 3 times, most recently from 11636ea to 9833151 Compare July 24, 2020 21:43
@daniel-ohayon daniel-ohayon force-pushed the dohayon-enum-types branch 3 times, most recently from 7da5d10 to 88a0f14 Compare July 27, 2020 02:43
Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to separate Add enum literals and operators to two commits: Support enum literals and Add enum operators. They are not quite related.

Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Support type bound on @TypeParameter, are you planning to make the signature look like <T extends SomeType>? Please update the commit message to either use the exact term in type signature, or use generic description rather than the approximate Java syntax.

@daniel-ohayon daniel-ohayon force-pushed the dohayon-enum-types branch 2 times, most recently from 8faecd0 to 0b8e3e1 Compare July 28, 2020 15:45
String enumKey = qualifiedName.getSuffix().toLowerCase(ENGLISH);
Object enumValue = enumType.getEnumMap().get(enumKey);
if (enumValue == null) {
throw new SemanticException(INVALID_LITERAL, node, format("No key '%s' in enum '%s'", enumKey, enumType.getDisplayName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is too low level to throw an SemanticException. If you expect this function to be only called when node is a valid enum, it should throw IllegalArgumentException. You can change these logic to a checkArgument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to have errors triggered by a checkArgument surfaced to users?
This is the only place where we check whether the provided key exists in the enum, so it will be very possible that users see this error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not OK, we should add a catch at the correct level and translate the IllegalArgumentException to a SemanticException.

@daniel-ohayon
Copy link
Contributor Author

@rongrong could you have another look at Add long and varchar enum types ?
As per our conversation, I implemented full type signatures for enums containing the enum definition (with serde logic).

@daniel-ohayon daniel-ohayon force-pushed the dohayon-enum-types branch 4 times, most recently from ba08218 to dfd4d95 Compare August 19, 2020 16:34
Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure looks good overall. Feedback on some details. Please also add more tests about valid and invalid enum creation / serde.

@daniel-ohayon
Copy link
Contributor Author

addressed latest feedback and added extra tests to TestTypeSignature

@daniel-ohayon daniel-ohayon force-pushed the dohayon-enum-types branch 2 times, most recently from 90e139c to 74e8a69 Compare August 23, 2020 23:33
Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First commit looks good. Only some nits.

Add an is-a constraint for type variables, so we can describe generic types of a certain kind.
Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support enum literals in queries looks good. Only one nit.

Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this looks so clean now!

Copy link
Contributor

@rongrong rongrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please fix the test failure and we are good to go!

Support common operators like `=` on enum types
@rongrong rongrong merged commit 99e67bc into prestodb:master Aug 31, 2020
@caithagoras caithagoras mentioned this pull request Sep 9, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants