Oplog Progress File
The oplog progress file (called "oplog.timestamp" by default, or "config.txt" in versions prior to 2.0) keeps track of the latest oplog entry seen for each replica set to which Mongo Connector is connected. On startup, Mongo Connector uses this file to decide where to begin reading the oplog of each of those replica sets. Note that Mongo Connector will continue normal operation even if the file is deleted or becomes corrupted while the connector is running.
The format for oplog progress files was changed between versions 1.3 and 1.3.1 for connections to sharded clusters. This was to fix a bug where Mongo Connector was unable to parse the progress file, and thus would raise an Exception instead of beginning to tail the oplog at the proper place. This change does not affect users who have only run the connector against replica sets. This change does not impact any replicated data. Users who run against sharded clusters will need to allow Mongo Connector to create a new oplog progress file by following the steps in the Creating an Oplog Progress File section.
When the oplog progress file cannot be found, or if it is empty, Mongo Connector will begin pulling data from all MongoDB collections (or the ones given in --namespace-set) in the "collection dump" phase. The oplog progress file is then updated with the most recent timestamp from before the dump happened. Mongo Connector then applies all oplog operations from that timestamp onward, so that the copied documents are up to date with what is in MongoDB.
A collection dump can therefore be forced by pointing the --oplog-ts option at an empty or non-existent file. You may want to re-sync if Mongo Connector falls behind the last record in the oplog. This may happen under a very high write load or after Mongo Connector has been stopped for a long time.
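The startup decision described above can be illustrated with a minimal sketch. The function name `needs_collection_dump` is hypothetical and is not part of Mongo Connector; it only mirrors the rule that a missing or empty progress file leads to a collection dump.

```python
import os


def needs_collection_dump(progress_path):
    """Return True when the progress file is missing or empty.

    Hypothetical helper: it mirrors the documented rule, not
    Mongo Connector's actual implementation.
    """
    if not os.path.exists(progress_path):
        return True
    with open(progress_path) as f:
        # A file containing only whitespace is treated as empty here
        # (an assumption of this sketch).
        return f.read().strip() == ""
```

A check like this can be useful in a wrapper script, to warn before a restart that would trigger a full re-sync.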
The exact format of this file depends on MongoDB's topology. For a single replica set, the format is:
["oplog name", timestamp]
For a sharded cluster, there is one such entry for each replica set shard:
[["oplog 1 name", timestamp 1], ["oplog 2 name", timestamp 2], ...]
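As a rough illustration, both layouts can be normalized into a single mapping. This sketch assumes that the file contents are valid JSON and that each timestamp is stored as a plain number; neither is stated above, and `parse_progress` is a hypothetical name.

```python
import json


def parse_progress(text):
    """Return {oplog_name: timestamp} for either layout, or {} if empty.

    Assumes JSON contents with numeric timestamps (an assumption of
    this sketch, not a documented guarantee).
    """
    text = text.strip()
    if not text:
        return {}
    data = json.loads(text)
    # Single replica set: ["oplog name", timestamp]
    if len(data) == 2 and isinstance(data[0], str):
        return {data[0]: data[1]}
    # Sharded cluster: [["oplog 1 name", ts1], ["oplog 2 name", ts2], ...]
    return {name: ts for name, ts in data}
```

The first element's type is what distinguishes the two layouts: a string means a single replica set, a nested list means one entry per shard.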
The oplog progress file is created as the final step of Mongo Connector's initialization, whether or not a collection dump takes place. The main thread of the connector monitors the progress of each oplog-tailing thread, updating the progress file once per second. Oplog-tailing threads publish their progress at the following times:
- After every --batch-size oplog records processed
- After processing all available oplog records
- When an oplog-tailing thread's connection to MongoDB is interrupted
- Immediately upon startup (progress is reported as most recent oplog record)
- Immediately after a rollback
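The first two publishing points in the list above can be sketched as a simple loop. This is only an illustration under simplifying assumptions: `tail_oplog` and `publish` are hypothetical names, oplog records are reduced to bare timestamps, and interruptions, startup, and rollbacks are not modeled.

```python
def tail_oplog(timestamps, batch_size, publish):
    """Publish progress every `batch_size` records and once at the end."""
    last_seen = None
    last_published = None
    for count, ts in enumerate(timestamps, start=1):
        last_seen = ts
        if count % batch_size == 0:
            # Checkpoint after every batch_size records processed.
            publish(ts)
            last_published = ts
    # All available records are processed: publish the final position
    # unless the last batch checkpoint already covered it.
    if last_seen is not None and last_seen != last_published:
        publish(last_seen)


published = []
tail_oplog([101, 102, 103, 104, 105], batch_size=2, publish=published.append)
print(published)  # [102, 104, 105]
```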
Note: Before each write to the progress file, the main thread creates a backup copy of the file under the same name with ".backup" appended.
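The backup-then-write behavior in the note above might look roughly like the following. This is a sketch, not Mongo Connector's code: `write_progress` is a hypothetical name, and JSON serialization is an assumption.

```python
import json
import os
import shutil


def write_progress(path, entries):
    """Back up the existing progress file, then write the new entries."""
    if os.path.exists(path):
        # Keep the previous contents next to the file as "<name>.backup".
        shutil.copyfile(path, path + ".backup")
    with open(path, "w") as f:
        json.dump(entries, f)  # serialization format is assumed
```

With this ordering, the ".backup" file always holds the contents from before the most recent write, so it can be inspected if the main file is damaged.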
Creating an Oplog Progress File
Creating an oplog progress file that starts at the most recent oplog record can be useful if your previous progress file is accidentally deleted or becomes corrupted, or if you are affected by the change in oplog progress file format when connected to a sharded cluster. You should only do this if you're confident that Mongo Connector has successfully replicated all operations up to that point. Otherwise, you should re-sync the connector by deleting the file and restarting mongo-connector. You can force Mongo Connector to create an oplog progress file containing the most recent oplog record using the following method:
- Stop Mongo Connector, if it is running.
- Start Mongo Connector again with:
  - --oplog-ts pointing to an empty or non-existent file
  - --no-dump so that Mongo Connector will not attempt to copy data.
- Stop Mongo Connector.
- Restart Mongo Connector with your usual options, and make sure to point --oplog-ts at the new progress file.
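The two invocations in the steps above can be sketched as argument lists. The helper name `progress_reset_commands` is hypothetical, and the connection options in the example are placeholders for whatever your deployment normally passes. The sketch also assumes that the usual connection options are needed in the bootstrap run as well, which the steps do not state. It only builds the commands and does not run them.

```python
def progress_reset_commands(progress_file, usual_options):
    """Build the bootstrap and normal mongo-connector command lines."""
    base = ["mongo-connector"] + list(usual_options)
    # Bootstrap run: fresh progress file, no collection dump.
    bootstrap = base + ["--oplog-ts", progress_file, "--no-dump"]
    # Normal run: usual options, pointed at the new progress file.
    normal = base + ["--oplog-ts", progress_file]
    return bootstrap, normal


# Placeholder options; substitute your own.
bootstrap_cmd, normal_cmd = progress_reset_commands(
    "oplog.timestamp", ["-m", "localhost:27017"]
)
print(" ".join(bootstrap_cmd))
print(" ".join(normal_cmd))
```

Stop the connector between the two runs, as the steps describe, and confirm that the progress file exists before starting the normal run.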