Add new 'simple' oplog tailer method #301


Merged

merged 4 commits into Percona-Lab:master on Feb 15, 2019

Conversation

dschneller
Contributor

Adds a new "simple" method for collecting oplogs needed to construct
a consistent backup. It implements an algorithm similar to what mongodump
--oplog already does, albeit for multiple shards.

It does not begin tailing the oplogs for all shards at the beginning
of the backup. Instead, it runs mongodump for all shards and waits until
they have all finished.

Then it collects the delta between when each shard's dump ended and the time
when the last one finished.

The following stages, especially the Resolver, which brings all shards'
oplogs forward to a consistent point in time, are unchanged.
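
To illustrate the delta collection step, here is a minimal sketch, assuming pymongo and direct access to each shard's `local` database; the function and variable names are illustrative, not the project's actual implementation:

```python
from pymongo import MongoClient
from bson.timestamp import Timestamp

def fetch_oplog_delta(shard_uri, dump_end_ts, last_dump_end_ts):
    """Return the oplog entries a shard recorded after its own mongodump
    finished, up to the time the slowest shard's dump finished."""
    client = MongoClient(shard_uri)
    oplog = client["local"]["oplog.rs"]
    # everything strictly after this shard's dump, up to the last dump's end
    query = {"ts": {"$gt": dump_end_ts, "$lte": last_dump_end_ts}}
    return list(oplog.find(query).sort("ts", 1))

# Example: collect the delta between two BSON timestamps (seconds, increment)
delta = fetch_oplog_delta("mongodb://shard0.example.net:27018",
                          Timestamp(1550000000, 1),
                          Timestamp(1550000042, 7))
```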

Rationale for this addition:

With the existing "tailer" approach, our backups very often failed with the
error message "Tailer host changed". This appears to be a common problem
with oplog tailing in general, judging from what you can find on the internet.
It appears that for some reason the oplog tailing cursors get aborted by
mongod with an error stating `"operation exceeded time limit", code: 50`.

With this new simpler oplog fetching method, that apparently does not happen.

The most important difference (and drawback) compared to the current tailer is that
the simple approach fails if a shard's oplog is so busy that it has already rolled
over by the time the deltas are to be collected, so that the needed operations are
no longer available. This, however, will only happen on very busy systems, where
one might argue the oplog size should be increased anyway.

In general the simple method should be a little less resource intensive, because
there is no additional I/O while the mongodumps are running.

This change is backwards compatible for callers. To use the new method, a new
configuration parameter needs to be specified: `--oplog.tailer.method simple`.
The default value for this option is `tailer`, which can also be set explicitly
to select the classic implementation.
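
For example, a hypothetical invocation might look like the following; the entry point name and the other arguments are placeholders, only the `--oplog.tailer.method` flag is the one introduced here:

```
mongodb-consistent-backup -H mongos.example.net -P 27017 \
    -n nightly -l /var/backups/mongo \
    --oplog.tailer.method simple
```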

Implementation Notes:

  • Common functionality between the original Tailer and the new simple
    implementation was extracted into a new common base class `OplogTask`.
  • In a few places some variables were extracted or renamed to (hopefully)
    make the code a little more readable, despite the additions.
  • In the Resolver class the thread pool's join() method is now called to fix
    spurious (harmless) error messages like the following when finishing (a short
    sketch of the pattern follows the traceback below):
```
Process PoolWorker-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 102, in worker
    task = get()
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 380, in get
    rrelease()
```
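
As context for the Resolver note above, here is a minimal, self-contained sketch of the `multiprocessing` pattern involved (not the Resolver's actual code): `close()` stops new task submission and `join()` waits for the worker processes to exit before the parent tears down its queues, which avoids the spurious shutdown tracebacks.

```python
from multiprocessing import Pool

def resolve_chunk(chunk):
    # placeholder for the per-shard oplog resolution work
    return len(chunk)

if __name__ == "__main__":
    pool = Pool(processes=4)
    results = [pool.apply_async(resolve_chunk, (c,)) for c in (range(10), range(20))]
    pool.close()  # no further tasks will be submitted
    pool.join()   # wait for all PoolWorker processes to finish cleanly
    print([r.get() for r in results])
```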

Additional fixes from the follow-up commits:

  • Restores a line that was commented out for debugging and accidentally forgotten.
  • Fixes two issues: if there were no oplog changes to be resolved, an exception
    would be thrown; in addition, the logging was broken, because it tried to access
    the `uri` variable before it was initialized (a minimal illustration follows).
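
Purely as an illustration of that logging issue (this is not the project's code): if the loop that assigns `uri` never runs, the log call after it refers to a variable that was never initialized.

```python
import logging

def resolve(changes):
    for uri, oplog in changes.items():
        logging.info("Resolving oplog for %s (%d entries)", uri, len(oplog))
    # Bug pattern: if `changes` is empty, `uri` was never assigned here
    logging.info("Finished resolving %s", uri)

resolve({})  # raises UnboundLocalError (a NameError subclass)
```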
@dschneller
Contributor Author

Just adding this as a datapoint: The feature has been in production use for several weeks without showing any problems.

Anything else I should do to facilitate merging?

Contributor

@timvaillancourt left a comment


LGTM thanks @dschneller!

@timvaillancourt merged commit e630e3a into Percona-Lab:master on Feb 15, 2019