Add new 'simple' oplog tailer method #301


Merged

merged 4 commits into Percona-Lab:master on Feb 15, 2019

Conversation

dschneller
Contributor

Adds a new "simple" method for collecting oplogs needed to construct
a consistent backup. It implements an algorithm similar to what mongodump
--oplog already does, albeit for multiple shards.

It does not begin tailing the oplogs for all shards at the beginning
of the backup. Instead, it runs mongodump for all shards and waits until
they have all finished.

Then it collects the delta between when each shard's dump ended and the time
when the last one finished.

The following stages, especially the Resolver, which brings all shards'
oplogs forward to a consistent point in time, are unchanged.
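
To illustrate the delta collection step, here is a minimal sketch, assuming pymongo and direct access to each shard's `local` database; the function and variable names are illustrative, not the project's actual implementation:

```python
from pymongo import MongoClient
from bson.timestamp import Timestamp

def fetch_oplog_delta(shard_uri, dump_end_ts, last_dump_end_ts):
    """Return the oplog entries a shard recorded after its own mongodump
    finished, up to the time the slowest shard's dump finished."""
    client = MongoClient(shard_uri)
    oplog = client["local"]["oplog.rs"]
    # everything strictly after this shard's dump, up to the last dump's end
    query = {"ts": {"$gt": dump_end_ts, "$lte": last_dump_end_ts}}
    return list(oplog.find(query).sort("ts", 1))

# Example: collect the delta between two BSON timestamps (seconds, increment)
delta = fetch_oplog_delta("mongodb://shard0.example.net:27018",
                          Timestamp(1550000000, 1),
                          Timestamp(1550000042, 7))
```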

Rationale for this addition:

With the existing "tailer" approach, our backups very often failed with the
error message "Tailer host changed". This appears to be a common problem
with oplog tailing in general, judging from what you can find on the internet.
It appears that for some reason the oplog tailing cursors get aborted by
mongod with an error stating `"operation exceeded time limit", code: 50`.

With this new simpler oplog fetching method, that apparently does not happen.

The most important difference (and drawback) compared to the current tailer is that
the simple approach fails if a shard's oplog is so busy that it has already rolled
over by the time the deltas are to be collected, so that the needed operations are
no longer available. This, however, will only happen on very busy systems, where
one might argue the oplog size should be increased anyway.

In general the simple method should be a little less resource intensive, because
there is no additional I/O while the mongodumps are running.

This change is backwards compatible for callers. To use the new method, a new
configuration parameter needs to be specified: `--oplog.tailer.method simple`.
The default value for this option is `tailer`, which can also be set explicitly
to select the classic implementation.
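
For example, a hypothetical invocation might look like the following; the entry point name and the other arguments are placeholders, only the `--oplog.tailer.method` flag is the one introduced here:

```
mongodb-consistent-backup -H mongos.example.net -P 27017 \
    -n nightly -l /var/backups/mongo \
    --oplog.tailer.method simple
```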

Implementation Notes:

  • Common functionality between the original Tailer and the new simple
    implementation was extracted into a new common base class `OplogTask`.
  • In a few places some variables were extracted or renamed to (hopefully)
    make the code a little more readable, despite the additions.
  • In the Resolver class the thread pool's join() method is now called to fix
    spurious (harmless) error messages like the following when finishing (a short
    sketch of the pattern follows the traceback below):
```
Process PoolWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 380, in get
    rrelease()
```
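
As context for the Resolver note above, here is a minimal, self-contained sketch of the `multiprocessing` pattern involved (not the Resolver's actual code): `close()` stops new task submission and `join()` waits for the worker processes to exit before the parent tears down its queues, which avoids the spurious shutdown tracebacks.

```python
from multiprocessing import Pool

def resolve_chunk(chunk):
    # placeholder for the per-shard oplog resolution work
    return len(chunk)

if __name__ == "__main__":
    pool = Pool(processes=4)
    results = [pool.apply_async(resolve_chunk, (c,)) for c in (range(10), range(20))]
    pool.close()  # no further tasks will be submitted
    pool.join()   # wait for all PoolWorker processes to finish cleanly
    print([r.get() for r in results])
```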

Additional fixes from the follow-up commits:

  • Restores a line that was commented out for debugging and accidentally forgotten.
  • Fixes two issues: if there were no oplog changes to be resolved, an exception
    would be thrown; in addition, the logging was broken, because it tried to access
    the `uri` variable before it was initialized (a minimal illustration follows).
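
Purely as an illustration of that logging issue (this is not the project's code): if the loop that assigns `uri` never runs, the log call after it refers to a variable that was never initialized.

```python
import logging

def resolve(changes):
    for uri, oplog in changes.items():
        logging.info("Resolving oplog for %s (%d entries)", uri, len(oplog))
    # Bug pattern: if `changes` is empty, `uri` was never assigned here
    logging.info("Finished resolving %s", uri)

resolve({})  # raises UnboundLocalError (a NameError subclass)
```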
@dschneller
Contributor Author

Just adding this as a datapoint: The feature has been in production use for several weeks without showing any problems.

Anything else I should do to facilitate merging?

Contributor

@timvaillancourt left a comment


LGTM thanks @dschneller!

@timvaillancourt merged commit e630e3a into Percona-Lab:master on Feb 15, 2019