Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Mixed Case Pinot Tables #7630

Merged
merged 1 commit into from
Apr 15, 2022
Merged

Conversation

elonazoulay
Copy link
Member

Extracted from the pinot insert pr.

Fixes #6789

@cla-bot cla-bot bot added the cla-signed label Apr 18, 2021
@elonazoulay elonazoulay changed the title [WIP] Mixed case [WIP] Support mixed case pinot tables Apr 18, 2021
@elonazoulay
Copy link
Member Author

TODO: adding tests.

@elonazoulay elonazoulay requested review from findepi and kokosing April 18, 2021 17:43
@elonazoulay elonazoulay changed the title [WIP] Support mixed case pinot tables [WIP] Support Mixed Case Pinot Tables Apr 18, 2021
@elonazoulay elonazoulay force-pushed the mixed_case branch 3 times, most recently from 7bd7758 to 393735f Compare August 16, 2021 06:04
@elonazoulay elonazoulay changed the title [WIP] Support Mixed Case Pinot Tables Support Mixed Case Pinot Tables Aug 22, 2021
@elonazoulay elonazoulay force-pushed the mixed_case branch 7 times, most recently from 3e65426 to dd1789b Compare August 30, 2021 06:46
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

But I have concerns about the implementation so I didn't review the entire PR. The current implementation ties the remote name cache to the metadata cache and doesn't provide a way to disable the case-insensitive matching which will lead to failures when a Pinot catalog contains colliding table names without any solution other than renaming the offending tables (which may not always be possible).

I'd recommend something similar to how it's implemented in trino-base-jdbc - look at BaseJdbcClient#toRemoteTableName usages.
There's a separate cache that maps Trino table names (i.e. always lowercase) to the actual name in the remote system. This cache is used or not depending on a config so that the feature can be disabled.

// Pinot does not support array literals
if (((PinotColumnHandle) entry.getKey()).getDataType() instanceof ArrayType) {
// Pinot does not fully support varbinary null literals (empty byte array): https://github.com/apache/pinot/issues/4230
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test detect this when this gets fixed on Pinot side? e.g. asserting that incorrect results are returned for filter on a varbinary columns with an actually empty array and a null value?

}
catch (TableNotFoundException e) {
log.debug(e, "Table not found: %s", tableName);
String pinotTableName = pinotClient.getPinotTableNameFromTrinoTableNameIfExists(tableName.getTableName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It might be helpful to use a name like toRemoteTableName and toRemoteSchemaName since many other connectors use those method names and it'll make it easier for readers to figure out what this is trying to do.

I don't have a strong opinion on this though.

@@ -187,11 +177,11 @@ public ConnectorTableMetadata getTableMetadata(ConnectorSession session, Connect
@Override
public List<SchemaTableName> listTables(ConnectorSession session, Optional<String> schemaNameOrNull)
{
ImmutableList.Builder<SchemaTableName> builder = ImmutableList.builder();
for (String table : getPinotTableNames()) {
ImmutableSet.Builder<SchemaTableName> builder = ImmutableSet.builder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this silently swallow colliding names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will push an update shortly that returns colliding table names. It is based off of #9098.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed. This is based off of #9098

@elonazoulay
Copy link
Member Author

Rebased off of #9781 .

@elonazoulay elonazoulay force-pushed the mixed_case branch 5 times, most recently from 4edf88a to 6180aa9 Compare March 19, 2022 18:22
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at last commit since others were merged in #9781.

Is it possible to make this controlled via a toggle? That way once quoted-identifiers are supported it'll be much easier to migrate to that instead of having to hunt the code for places which would need changing.

Also, some named abstraction for Pinot name would be helpful to indicate which places in code expect Trino names vs Pinot names. This became very confusing in Jdbc based connectors in the past.

@@ -239,9 +256,14 @@ public GetTables(@JsonProperty("tables") List<String> tables)
}
}

public List<String> getAllTables()
protected ListMultimap<String, String> getAllTables()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declare as Multimap even if impl is ListMultimap.

@ebyhr ebyhr removed their request for review March 25, 2022 01:43
@elonazoulay elonazoulay requested a review from hashhar April 14, 2022 19:07
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll merge once CI is done.

@hashhar hashhar merged commit 87515b0 into trinodb:master Apr 15, 2022
@hashhar hashhar mentioned this pull request Apr 15, 2022
@github-actions github-actions bot added this to the 378 milestone Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Trino + Pinot Connector - Lowercasing table name in query
2 participants