Skip to content

MongoDB: Support DBRef pushdown #22027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

nsaje
Copy link

@nsaje nsaje commented May 20, 2024

Description

This PR adds support for pushing down DBRef fields. DBRefs in MongoDB are special documents that reference another document. They are often used as "foreign keys" so not pushing them down carries a significant performance penalty (you basically load the whole table into Trino every time).

What this PR adds is translation of DBRef columns into their corresponding Mongo names:

  • databaseName -> $db
  • collectionName -> $ref
  • id -> $id

So a filter like WHERE creators.id = 'abcd' becomes something like {'creators.$id': 'abcd'} in Mongo.

Resolves #21739

Additional context and related issues

This is my first PR in Trino so I might be missing some conventions or context - happy to adjust if necessary.

I did have to change certain tests and I didn't have full context with some so leaving inline questions and hoping to discuss on this PR if what I've done is correct or not.

Release notes

(X) Release notes are required, with the following suggested text:

# MongoDB
* Improve read and filter performance when selecting or filtering on DBRef fields. ({issue}`issuenumber`)

Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added the mongodb MongoDB connector label May 20, 2024
@@ -1693,14 +1693,14 @@ private static Document getDocumentWithDifferentDbRefFieldOrder(Object objectId)
{
return new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document().append("collectionName", "doc_creators").append("id", objectId).append("databaseName", "doc_test"));
.append("creator", new Document().append("$ref", "doc_creators").append("$id", objectId).append("$db", "doc_test"));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're constructing the BSON document by hand here I changed these to actually correspond to correct MongoDB names for collectionName, databaseName etc. Otherwise these documents are in no way related to DBRefs.

}

private static Document documentWithSameDbRefFieldOrder(Object objectId)
{
return new Document()
.append("_id", new ObjectId("5126bbf64aed4daf9e2ab771"))
.append("creator", new Document().append("databaseName", "doc_test").append("collectionName", "doc_creators").append("id", objectId));
.append("creator", new Document().append("$db", "doc_test").append("$ref", "doc_creators").append("$id", objectId));
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

.append("databaseName", "doc_test")
.append("collectionName", "doc_creators")
.append("id", objectId));
.append("$db", "doc_test")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

.append("databaseName", "test")
.append("collectionName", "creators")
.append("id", objectId));
.append("$db", "test")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

1 similar comment
Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@ebyhr ebyhr requested review from ebyhr, Praveen2112 and krvikash May 20, 2024 10:21
Copy link

cla-bot bot commented May 20, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Comment on lines +51 to +53
case DATABASE_NAME -> DATABASE_NAME_NATIVE;
case COLLECTION_NAME -> COLLECTION_NAME_NATIVE;
case ID -> ID_NATIVE;
Copy link
Member

@ebyhr ebyhr May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I remember why we disallowed DBRef. We can't distinguish DBRef from object with same field names.
This PR causes silent correctness issue if the document has an object with databaseName, collectionName and id fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr I don't think that's the case - I added a test to check this scenario.

The only time this would be a problem is if the same collection has documents that hold both actual DBRefs and DBRef-like documents in the same field.

Could we just document this possible but extremely unlikely edge case? IMO the importance of DBRef pushdown is way higher than this concern since without it you're loading the entire collection into memory when filtering or joining by a DBRef field (which makes Trino fairly useless for bigger databases in such cases).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added test is wrong. Please reorder creator fields as databaseName, collectionName, id.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I put them in the order that mongo expects them (docs), not the order trino expects them. If I order them as you said the test does indeed fail.

Is there a way we could signal that a field is a DBRef in its type signature so that we didn't have to rely on field names & order to detect a DBRef down the line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr in order to consider DBRef pushdown do you think we should
a) find a way to support it cleanly (how? a custom type?)
b) require it to be explicitly turned on with a disclaimer that it can break on specifically formed documents?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @ebyhr - would appreciate some guidance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr What if we could allow the pushdown - if the Collection doesn't have any columns on those name ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Praveen2112 I think the problem is we can't really differentiate whether those are columns on the collection or these "system" DBRef columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see @ebyhr 's first comment in this thread.

Copy link

cla-bot bot commented May 22, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jun 28, 2024
@nsaje
Copy link
Author

nsaje commented Jul 11, 2024

Hey guys, can we get some additional guidance here? I think it's a shame to not support DBRef pushdown which is really limiting for a lot of MongoDB use-cases just because of an edge-case scenario. At least we should give the users a choice, whether to have pushdown or to support proper behavior on dbref-like-but-not-dbref documents.

@Praveen2112
Copy link
Member

I'll take a look at this by this week

@github-actions github-actions bot removed the stale label Jul 11, 2024
Copy link

github-actions bot commented Aug 5, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Aug 5, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Aug 27, 2024
@mosabua
Copy link
Member

mosabua commented Aug 27, 2024

@nsaje @Praveen2112 @ebyhr - do you want to reopen and continue this work?

@nsaje
Copy link
Author

nsaje commented Aug 28, 2024

I'd love to - just need more input from the team which way we should go.

@mosabua mosabua reopened this Aug 30, 2024
@mosabua
Copy link
Member

mosabua commented Aug 30, 2024

Can you help out with some input here @Praveen2112

@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Aug 30, 2024
@nsaje
Copy link
Author

nsaje commented Sep 11, 2024

ping @Praveen2112

@ebyhr
Copy link
Member

ebyhr commented Oct 2, 2024

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Oct 2, 2024
Copy link

cla-bot bot commented Oct 2, 2024

The cla-bot has been summoned, and re-checked this pull request!

@nsaje
Copy link
Author

nsaje commented Dec 2, 2024

@ebyhr @Praveen2112 any ideas for the direction of this?

@nsaje nsaje requested a review from ebyhr December 2, 2024 05:49
@nsaje
Copy link
Author

nsaje commented May 27, 2025

@ebyhr would we be ok with adding this behind a "mongodb.db-ref-pushdown-enabled" config option that defaults to false?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed mongodb MongoDB connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

MongoDB DBRef pushdown
4 participants