-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Support for user-defined enums #14691
Conversation
@@ -69,7 +69,10 @@ private FixJsonDataUtils() {} | |||
} | |||
requireNonNull(columns, "columns is null"); | |||
List<TypeSignature> signatures = columns.stream() | |||
.map(column -> parseTypeSignature(column.getType())) | |||
.map(column -> { | |||
TypeSignature displayType = column.getTypeMetadata().getDisplayType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this file, we use the type signature client-side to determine what kind of data we're dealing with.
Since the client doesn't know about the enum types, if it encounters a String value, it will default to base64 encoding as per:
if (value instanceof String) {
return Base64.getDecoder().decode((String) value);
}
For string-valued enums, this results in gibberish in the CLI output.
This is why I'm passing this extra info about the display type
if (rowBuffer.size() >= MAX_BUFFERED_ROWS) { | ||
flush(false); | ||
} | ||
} | ||
|
||
private List<?> prettyPrintRow(List<?> row, List<Column> columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this doesn't work with complex types (row, map, array...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebuild switch-case logic here
@Override | ||
public boolean isComparable() | ||
{ | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum types are not comparable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be, but I haven't found a way to register the comparison functions dynamically (like we do for CAST) in a way that I'm happy with yet.
@@ -685,6 +685,16 @@ private boolean isMoreSpecificThan(ApplicableFunction left, ApplicableFunction r | |||
return boundVariables.isPresent(); | |||
} | |||
|
|||
// Called by TypeRegistry to look up types from FunctionNamespaceManagers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: move to TypeRegistry
import java.util.Map; | ||
import java.util.stream.Collectors; | ||
|
||
public class IntegerEnumType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we call it LongEnumType
so there's no confusion? Integer can also mean specifically the SQL INTEGER type which is 32-bit (though using 32-bit is probably enough for an enum type so that would be an option as well?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's go for LongEnumType!
public interface EnumType<T> | ||
extends Type | ||
{ | ||
Map<String, T> getEntries(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getEntries
sounds like it should return a list. Maybe explicitly call it getEnumNameToValueMap
or just getEnumMap
or getMap
.
@@ -23,12 +23,12 @@ | |||
|
|||
import static java.util.Collections.singletonList; | |||
|
|||
public final class VarcharType | |||
public class VarcharType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to refactor VarcharType
into AbstractVarcharType
and VarcharType
(similar to AbstractLongType
and BigintType
) and keep this as final. That way you also avoid the weird need to provide a type signature for the constructor.
@@ -901,6 +904,14 @@ private SpecializedFunctionKey doGetSpecializedFunctionKey(Signature signature) | |||
Iterable<SqlFunction> candidates = getFunctions(null, signature.getName()); | |||
// search for exact match | |||
Type returnType = typeManager.getType(signature.getReturnType()); | |||
|
|||
if (returnType instanceof EnumType && CastType.CAST.getCastName().equals(signature.getName())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this logic in BuiltInFunctionNamespaceManager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reorganized it a bit in #14728 to have the enum-specific logic only live in EnumOperators (but we still have a call from BuiltinFunctionNamespaceManager - I think it makes sense to have logic there to handle those UDT functions on-the-fly since it's a "core" feature which is independent of any specific UDT).
name, | ||
type.getTypeSignature().toString(), | ||
new ClientTypeSignature(type.getTypeSignature()), | ||
type.getTypeMetadata()); | ||
} | ||
|
||
public Column(String name, TypeSignature signature) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what you can do: remove this constructor (Column(String name, TypeSignature signature)
). There's no real use case for this one. Create a Type
class in client (I think call it Type
is fine, but if it feels too confusing we can call it ClientType
), which wraps the string type, client type signature and enum maps.
if (rowBuffer.size() >= MAX_BUFFERED_ROWS) { | ||
flush(false); | ||
} | ||
} | ||
|
||
private List<?> prettyPrintRow(List<?> row, List<Column> columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might just call this function processRow
or transform
.
The goal of this PR is to enable basic support for enum user-defined types (UDTs) in Presto. Enums already exist in Postgres and MySQL, and this would be a first step toward generic UDT support.
Requirements
Mood 'HAPPY'
my_enum_column = 4
is OK, butmy_enum_column = my_int_column
is not. This would be in line with enum behavior in languages like Hack, and it would facilitate backwards-compatibility of existing queries that do not use enum literalsImplementation
TODO
[ ] Disambiguate key lookup from value lookup with string enums
[ ] Persist enum info to schema (separate PR)
[ ] Support =, != for enums
[ ] Support literal comparison
[ ] Move getTypeDynamically to TypeRegistry