[mongo] parallel snapshot for ObjectID key #3293

heavycrystal · 2025-07-30T19:45:34Z

WatermarkColumn must be given as input; otherwise the snapshot falls back to full table partition.
Only ObjectID partitioning is supported, so this is intended for use with _id, but the logic should be general.

Also added "adjusted partitions" logic to MySQL

TODO testing

Copilot

Pull Request Overview

This PR implements parallel snapshot support for MongoDB ObjectID keys by adding partitioning functionality to the MongoDB connector. The implementation allows for parallel data extraction when a WatermarkColumn is specified, falling back to full table partition mode otherwise.

Adds ObjectID partition range comparison support in the utils package
Implements MongoDB-specific partitioning using MongoDB's $bucketAuto aggregation
Updates logging consistency across PostgreSQL and MySQL connectors
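
For readers unfamiliar with `$bucketAuto`, here is a minimal sketch of what such a stage looks like. It is not the connector's code: `bucketAutoPipelineJSON` is a hypothetical helper, and the pipeline is built from plain maps and printed as JSON so the example runs without the MongoDB driver. With the Go driver, an equivalent stage would typically be passed to `Collection.Aggregate`. Each resulting bucket carries an `_id` document with `min` and `max` bounds plus a `count`, which is what makes the stage usable for deriving partition ranges.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bucketAutoPipelineJSON is a hypothetical helper for illustration only.
// It builds a one-stage aggregation pipeline that asks MongoDB to split
// the values of `field` into `buckets` roughly equal-sized buckets.
func bucketAutoPipelineJSON(field string, buckets int) (string, error) {
	if buckets < 1 {
		return "", fmt.Errorf("buckets must be positive, got %d", buckets)
	}
	pipeline := []map[string]any{
		{"$bucketAuto": map[string]any{
			"groupBy": "$" + field, // field path expression, e.g. "$_id"
			"buckets": buckets,     // requested number of buckets
		}},
	}
	out, err := json.Marshal(pipeline)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := bucketAutoPipelineJSON("_id", 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // [{"$bucketAuto":{"buckets":4,"groupBy":"$_id"}}]
}
```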

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
flow/connectors/utils/partition.go	Adds ObjectID comparison logic for partition ranges
flow/connectors/postgres/qrep.go	Updates log message to include "[postgres]" prefix
flow/connectors/mysql/qrep.go	Standardizes partition calculation and fixes typo
flow/connectors/mongo/qrep.go	Implements parallel partitioning with ObjectID support

Copilot · 2025-07-30T19:46:23Z

flow/connectors/utils/partition.go

@@ -98,6 +99,30 @@ func comparePartitionRanges(
 			return c
 		}
 		return cmp.Compare(prevTuple.OffsetNumber, currTuple.OffsetNumber)
+		// we can compare ObjectIDs, but not sure if doing this is correct


This comment expresses uncertainty about the correctness of the implementation. Either remove the comment if the implementation is correct, or address the uncertainty with proper validation.

Suggested change

-		// we can compare ObjectIDs, but not sure if doing this is correct
+		// Comparing ObjectIDs using bytes.Compare is valid because ObjectIDs are 12-byte values
+		// that can be compared lexicographically to determine their order.
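
The claim in the suggested comment can be checked with a small sketch. An ObjectID starts with a 4-byte big-endian timestamp, so a lexicographic byte comparison orders IDs by creation second first. The `objectID` type below is a local stand-in for the driver's `bson.ObjectID` (itself a `[12]byte` array), and `newObjectID` and `compareObjectIDs` are hypothetical helpers, not functions from this PR.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// objectID is a stand-in for bson.ObjectID, used so the sketch needs no driver.
type objectID [12]byte

// compareObjectIDs orders two IDs lexicographically by their raw bytes.
func compareObjectIDs(a, b objectID) int {
	return bytes.Compare(a[:], b[:])
}

// newObjectID builds a test ID from a Unix timestamp and a final byte.
// Real ObjectIDs also carry a random value and a counter in bytes 4..11.
func newObjectID(unixSeconds uint32, lastByte byte) objectID {
	var id objectID
	binary.BigEndian.PutUint32(id[:4], unixSeconds)
	id[11] = lastByte
	return id
}

func main() {
	older := newObjectID(1_700_000_000, 0xff)
	newer := newObjectID(1_700_000_001, 0x00)

	fmt.Println(compareObjectIDs(older, newer)) // -1: the earlier timestamp sorts first
	fmt.Println(compareObjectIDs(newer, older)) // 1
	fmt.Println(compareObjectIDs(older, older)) // 0
}
```

The timestamp dominates the comparison even though `older` has the larger trailing byte, which is the property range partitioning on `_id` relies on.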

flow/connectors/mongo/qrep.go

jgao54 · 2025-08-05T18:11:37Z

flow/connectors/mysql/qrep.go

-	if totalRows%numRowsPerPartition != 0 {
-		numPartitions++
-	}
+	adjustedPartitions := shared.AdjustNumPartitions(totalRows, numRowsPerPartition)


ooc, is the main difference here limiting maxPartitions to 1000?

yes, and adjusting numRowsPerPartition accordingly
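
A rough sketch of the behaviour described in this exchange, for illustration only. The real `shared.AdjustNumPartitions` may differ in signature and details; `adjustNumPartitionsSketch`, its result type, and the handling of non-positive inputs are assumptions. Only the cap of 1000 and the adjustment of `numRowsPerPartition` come from the thread.

```go
package main

import "fmt"

// maxPartitions is the cap mentioned in the review thread.
const maxPartitions = 1000

// adjustedPartitionsSketch is a hypothetical result type for this example.
type adjustedPartitionsSketch struct {
	NumPartitions       int64
	NumRowsPerPartition int64
}

// adjustNumPartitionsSketch rounds the partition count up, then caps it at
// maxPartitions and grows the per-partition row count to still cover all rows.
func adjustNumPartitionsSketch(totalRows, numRowsPerPartition int64) adjustedPartitionsSketch {
	if totalRows <= 0 || numRowsPerPartition <= 0 {
		// Assumption: degenerate inputs collapse to a single partition.
		return adjustedPartitionsSketch{NumPartitions: 1, NumRowsPerPartition: numRowsPerPartition}
	}
	// Ceiling division replaces the old "if remainder != 0, add one" step.
	numPartitions := (totalRows + numRowsPerPartition - 1) / numRowsPerPartition
	if numPartitions > maxPartitions {
		numPartitions = maxPartitions
		numRowsPerPartition = (totalRows + maxPartitions - 1) / maxPartitions
	}
	return adjustedPartitionsSketch{
		NumPartitions:       numPartitions,
		NumRowsPerPartition: numRowsPerPartition,
	}
}

func main() {
	fmt.Println(adjustNumPartitionsSketch(2_500, 1_000))     // {3 1000}
	fmt.Println(adjustNumPartitionsSketch(5_000_000, 1_000)) // {1000 5000}
}
```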

jgao54 · 2025-08-05T18:14:29Z

flow/connectors/utils/partition.go

@@ -356,8 +341,6 @@ func (p *PartitionHelper) getPartitionForStartAndEnd(start any, end any) (*proto
 		return createTimePartition(v, end.(time.Time)), nil
 	case pgtype.TID:
 		return createTIDPartition(v, end.(pgtype.TID)), nil
-	case bson.ObjectID:


do we not want to use partition helper for Mongo for consistency across sources? also thinking about whether we'll ever want to partition on fields other than _id (i.e. with flattened mode in the future), but not a big deal, we can add that later too.

partitionHelper can be used; it's just a bit kludgy to use it here (bson.ObjectID is not the type stored in protobufs, and the helper tends to use types to figure out the type of range, which is string)

It doesn't give us any advantage here since we don't coalesce partitions together based on ObjectID range (could be done down the line)

jgao54

Looks good overall, few nits

heavycrystal requested review from serprex, jgao54 and Copilot July 30, 2025 19:45

Copilot AI reviewed Jul 30, 2025

heavycrystal force-pushed the mongo-parallel-snapshot branch from c7c9c60 to 1916bc8 Compare July 31, 2025 15:59

heavycrystal marked this pull request as ready for review August 1, 2025 17:16

jgao54 reviewed Aug 5, 2025

jgao54 approved these changes Aug 5, 2025

heavycrystal added 5 commits August 6, 2025 00:37

[mongo] parallel snapshot for ObjectID key

0b7b4ad

oopsies pt.1

da723fc

move from partitionHelper, e2e

f8f9bf7

fix lint, copilot feedback

0e732ca

fix test

33a9a53

heavycrystal force-pushed the mongo-parallel-snapshot branch from b62157a to 33a9a53 Compare August 5, 2025 19:11

heavycrystal added 2 commits August 6, 2025 00:48

oops

28784c6

switch to estimatedDocumentCount

d423a2f

heavycrystal merged commit 2158aaa into main Aug 6, 2025
13 of 17 checks passed

heavycrystal deleted the mongo-parallel-snapshot branch August 6, 2025 19:02

heavycrystal mentioned this pull request Aug 7, 2025

mongo: parallel Initial load #3136

Open
