Skip to content

MongoDB: Support DBRef pushdown #22027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,12 @@
import java.util.List;
import java.util.Optional;

import static io.trino.plugin.mongodb.MongoSession.COLLECTION_NAME;
import static io.trino.plugin.mongodb.MongoSession.COLLECTION_NAME_NATIVE;
import static io.trino.plugin.mongodb.MongoSession.DATABASE_NAME;
import static io.trino.plugin.mongodb.MongoSession.DATABASE_NAME_NATIVE;
import static io.trino.plugin.mongodb.MongoSession.ID;
import static io.trino.plugin.mongodb.MongoSession.ID_NATIVE;
import static java.util.Objects.requireNonNull;

/**
Expand All @@ -35,9 +41,26 @@ public record MongoColumnHandle(String baseName, List<String> dereferenceNames,
public MongoColumnHandle
{
requireNonNull(baseName, "baseName is null");
dereferenceNames = ImmutableList.copyOf(requireNonNull(dereferenceNames, "dereferenceNames is null"));
requireNonNull(dereferenceNames, "dereferenceNames is null");
requireNonNull(type, "type is null");
requireNonNull(comment, "comment is null");

if (dbRefField) {
String leafColumnName = dereferenceNames.getLast();
String leafDBRefNativeName = switch (leafColumnName) {
case DATABASE_NAME -> DATABASE_NAME_NATIVE;
case COLLECTION_NAME -> COLLECTION_NAME_NATIVE;
case ID -> ID_NATIVE;
Comment on lines +51 to +53
Copy link
Member

@ebyhr ebyhr May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I remember why we disallowed DBRef. We can't distinguish DBRef from object with same field names.
This PR causes silent correctness issue if the document has an object with databaseName, collectionName and id fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr I don't think that's the case - I added a test to check this scenario.

The only time this would be a problem is if the same collection has documents that hold both actual DBRefs and DBRef-like documents in the same field.

Could we just document this possible but extremely unlikely edge case? IMO the importance of DBRef pushdown is way higher than this concern since without it you're loading the entire collection into memory when filtering or joining by a DBRef field (which makes Trino fairly useless for bigger databases in such cases).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added test is wrong. Please reorder creator fields as databaseName, collectionName, id.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I put them in the order that mongo expects them (docs), not the order trino expects them. If I order them as you said the test does indeed fail.

Is there a way we could signal that a field is a DBRef in its type signature so that we didn't have to rely on field names & order to detect a DBRef down the line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr in order to consider DBRef pushdown do you think we should
a) find a way to support it cleanly (how? a custom type?)
b) require it to be explicitly turned on with a disclaimer that it can break on specifically formed documents?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @ebyhr - would appreciate some guidance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr What if we could allow the pushdown - if the Collection doesn't have any columns on those name ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Praveen2112 I think the problem is we can't really differentiate whether those are columns on the collection or these "system" DBRef columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see @ebyhr 's first comment in this thread.

default -> leafColumnName;
};
dereferenceNames = ImmutableList.<String>builder()
.addAll(dereferenceNames.subList(0, dereferenceNames.size() - 1))
.add(leafDBRefNativeName)
.build();
}
else {
dereferenceNames = ImmutableList.copyOf(dereferenceNames);
}
}

public ColumnMetadata toColumnMetadata()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -723,9 +723,6 @@ private static boolean isSupportedForPushdown(ConnectorExpression connectorExpre
}
if (connectorExpression instanceof FieldDereference fieldDereference) {
RowType rowType = (RowType) fieldDereference.getTarget().getType();
if (isDBRefField(rowType)) {
return false;
}
Field field = rowType.getFields().get(fieldDereference.getField());
if (field.getName().isEmpty()) {
return false;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,11 @@
import static io.airlift.slice.Slices.wrappedBuffer;
import static io.trino.plugin.base.util.JsonTypeUtil.jsonParse;
import static io.trino.plugin.mongodb.MongoSession.COLLECTION_NAME;
import static io.trino.plugin.mongodb.MongoSession.COLLECTION_NAME_NATIVE;
import static io.trino.plugin.mongodb.MongoSession.DATABASE_NAME;
import static io.trino.plugin.mongodb.MongoSession.DATABASE_NAME_NATIVE;
import static io.trino.plugin.mongodb.MongoSession.ID;
import static io.trino.plugin.mongodb.MongoSession.ID_NATIVE;
import static io.trino.plugin.mongodb.ObjectIdType.OBJECT_ID;
import static io.trino.plugin.mongodb.TypeUtils.isJsonType;
import static io.trino.spi.StandardErrorCode.GENERIC_INTERNAL_ERROR;
Expand Down Expand Up @@ -416,9 +419,9 @@ private static Object getDbRefValue(DBRef dbRefValue, MongoColumnHandle columnHa
checkState(!dereferenceNames.isEmpty(), "dereferenceNames is empty");
String leafColumnName = dereferenceNames.getLast();
return switch (leafColumnName) {
case DATABASE_NAME -> dbRefValue.getDatabaseName();
case COLLECTION_NAME -> dbRefValue.getCollectionName();
case ID -> dbRefValue.getId();
case DATABASE_NAME_NATIVE -> dbRefValue.getDatabaseName();
case COLLECTION_NAME_NATIVE -> dbRefValue.getCollectionName();
case ID_NATIVE -> dbRefValue.getId();
default -> throw new IllegalStateException("Unsupported DBRef column name: " + leafColumnName);
};
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,9 @@ public class MongoSession
public static final String DATABASE_NAME = "databaseName";
public static final String COLLECTION_NAME = "collectionName";
public static final String ID = "id";
public static final String DATABASE_NAME_NATIVE = "$db";
public static final String COLLECTION_NAME_NATIVE = "$ref";
public static final String ID_NATIVE = "$id";

// The 'simple' locale is the default collection in MongoDB. The locale doesn't allow specifying other fields (e.g. numericOrdering)
// https://www.mongodb.com/docs/manual/reference/collation/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1536,7 +1536,7 @@ private void testProjectionPushdownWithDBRef(Object objectId, String expectedVal

assertThat(query("SELECT parent.child, creator.databaseName, creator.collectionName, creator.id FROM test." + tableName))
.matches("SELECT " + expectedValue + ", varchar 'test', varchar 'creators', " + expectedValue)
.isNotFullyPushedDown(ProjectNode.class);
.isFullyPushedDown();
assertQuery(
"SELECT typeof(creator) FROM test." + tableName,
"SELECT 'row(databaseName varchar, collectionName varchar, id " + expectedType + ")'");
Expand Down Expand Up @@ -1573,7 +1573,7 @@ private void testProjectionPushdownWithNestedDBRef(Object objectId, String expec

assertThat(query("SELECT parent.child, parent.creator.databaseName, parent.creator.collectionName, parent.creator.id FROM test." + tableName))
.matches("SELECT " + expectedValue + ", varchar 'test', varchar 'creators', " + expectedValue)
.isNotFullyPushedDown(ProjectNode.class);
.isFullyPushedDown();
assertQuery(
"SELECT typeof(parent.creator) FROM test." + tableName,
"SELECT 'row(databaseName varchar, collectionName varchar, id " + expectedType + ")'");
Expand Down Expand Up @@ -1609,7 +1609,7 @@ private void testProjectionPushdownWithPredefinedDBRefKeyword(Object objectId, S
assertThat(query("SELECT parent.id, parent.id.id FROM test." + tableName))
.skippingTypesCheck()
.matches("SELECT row('test', 'creators', %1$s), %1$s".formatted(expectedValue))
.isNotFullyPushedDown(ProjectNode.class);
.isFullyPushedDown();
assertQuery(
"SELECT typeof(parent.id), typeof(parent.id.id) FROM test." + tableName,
"SELECT 'row(databaseName varchar, collectionName varchar, id %1$s)', '%1$s'".formatted(expectedType));
Expand Down Expand Up @@ -1679,12 +1679,12 @@ private void testDBRefLikeDocument(Document document1, Document document2, Strin
assertThat(query("SELECT creator.id FROM test." + tableName))
.skippingTypesCheck()
.matches("VALUES (%1$s), (%1$s)".formatted(expectedValue))
.isNotFullyPushedDown(ProjectNode.class);
.isFullyPushedDown();

assertThat(query("SELECT creator.databasename, creator.collectionname, creator.id FROM test." + tableName))
.skippingTypesCheck()
.matches("VALUES ('doc_test', 'doc_creators', %1$s), ('dbref_test', 'dbref_creators', %1$s)".formatted(expectedValue))
.isNotFullyPushedDown(ProjectNode.class);
.isFullyPushedDown();

assertUpdate("DROP TABLE test." + tableName);
}
Expand All @@ -1693,14 +1693,14 @@ private static Document getDocumentWithDifferentDbRefFieldOrder(Object objectId)
{
return new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document().append("collectionName", "doc_creators").append("id", objectId).append("databaseName", "doc_test"));
.append("creator", new Document().append("$ref", "doc_creators").append("$id", objectId).append("$db", "doc_test"));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're constructing the BSON document by hand here I changed these to actually correspond to correct MongoDB names for collectionName, databaseName etc. Otherwise these documents are in no way related to DBRefs.

}

private static Document documentWithSameDbRefFieldOrder(Object objectId)
{
return new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document().append("databaseName", "doc_test").append("collectionName", "doc_creators").append("id", objectId));
.append("creator", new Document().append("$db", "doc_test").append("$ref", "doc_creators").append("$id", objectId));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

}

private static Document dbRefDocument(Object objectId)
Expand All @@ -1717,9 +1717,9 @@ private void testDBRefLikeDocument(Object objectId, String expectedValue)
Document documentWithDifferentDbRefFieldOrder = new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document()
.append("databaseName", "doc_test")
.append("collectionName", "doc_creators")
.append("id", objectId));
.append("$db", "doc_test")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

.append("$ref", "doc_creators")
.append("$id", objectId));
Document dbRefDocument = new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab772"))
.append("creator", new DBRef("dbref_test", "dbref_creators", objectId));
Expand All @@ -1743,6 +1743,38 @@ private void testDBRefLikeDocument(Object objectId, String expectedValue)
assertUpdate("DROP TABLE test." + tableName);
}

@Test
public void testDBRefLikeDocumentWithTrinoLikeFieldNames()
{
String objectId = "test_id";
String expectedValue = "varchar 'test_id'";
String tableName = "test_dbref_like_document_" + randomNameSuffix();

// DbRef-like document but with field names as they're used in Trino
Document dbRefLikeDocument = new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document()
.append("collectionName", "doc_creators")
.append("id", objectId)
.append("databaseName", "doc_test"));
client.getDatabase("test").getCollection(tableName).insertOne(dbRefLikeDocument);

assertThat(query("SELECT * FROM test." + tableName))
.skippingTypesCheck()
.matches("VALUES "
+ " row(row('doc_creators', " + expectedValue + ", 'doc_test'))");

assertThat(query("SELECT creator.id FROM test." + tableName))
.skippingTypesCheck()
.matches("VALUES " + "(%1$s)".formatted(expectedValue));

assertThat(query("SELECT creator.collectionname, creator.id, creator.databasename FROM test." + tableName))
.skippingTypesCheck()
.matches("VALUES " + "('doc_creators', %1$s, 'doc_test')".formatted(expectedValue));

assertUpdate("DROP TABLE test." + tableName);
}

@Test
public void testPredicateOnDBRefField()
{
Expand All @@ -1766,12 +1798,12 @@ private void testPredicateOnDBRefField(Object objectId, String expectedValue)
assertThat(query("SELECT * FROM test." + tableName + " WHERE creator.id = " + expectedValue))
.skippingTypesCheck()
.matches("SELECT ROW(varchar 'test', varchar 'creators', " + expectedValue + ")")
.isNotFullyPushedDown(FilterNode.class);
.isFullyPushedDown();

assertThat(query("SELECT creator.id FROM test." + tableName + " WHERE creator.id = " + expectedValue))
.skippingTypesCheck()
.matches("SELECT " + expectedValue)
.isNotFullyPushedDown(FilterNode.class);
.isFullyPushedDown();

assertUpdate("DROP TABLE test." + tableName);
}
Expand All @@ -1793,21 +1825,21 @@ private void testPredicateOnDBRefLikeDocument(Object objectId, String expectedVa
Document document = new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document()
.append("databaseName", "test")
.append("collectionName", "creators")
.append("id", objectId));
.append("$db", "test")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

.append("$ref", "creators")
.append("$id", objectId));

client.getDatabase("test").getCollection(tableName).insertOne(document);

assertThat(query("SELECT * FROM test." + tableName + " WHERE creator.id = " + expectedValue))
.skippingTypesCheck()
.matches("SELECT ROW(varchar 'test', varchar 'creators', " + expectedValue + ")")
.isNotFullyPushedDown(FilterNode.class);
.isFullyPushedDown();

assertThat(query("SELECT creator.id FROM test." + tableName + " WHERE creator.id = " + expectedValue))
.skippingTypesCheck()
.matches("SELECT " + expectedValue)
.isNotFullyPushedDown(FilterNode.class);
.isFullyPushedDown();

assertUpdate("DROP TABLE test." + tableName);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ public void testRoundTripWithProjectedColumns()
false,
false,
Optional.empty()),
new MongoColumnHandle("creator", ImmutableList.of("databasename"), VARCHAR, false, true, Optional.empty()));
new MongoColumnHandle("creator", ImmutableList.of("databaseName"), VARCHAR, false, true, Optional.empty()));

MongoTableHandle expected = new MongoTableHandle(
schemaTableName,
Expand Down
Loading