Skip to content

PoolСluster auto-restore node #906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,6 @@ npm-debug.log
/test/config.js

*.sublime-*
.idea
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this; really, editor-specific entries belong in your global ignore list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this line, and sublime-specific things too 😃 OK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went ahead and cleaned up the file on master :)


.DS_Store
2 changes: 2 additions & 0 deletions Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -406,6 +406,8 @@ poolCluster.end();
* `canRetry`: If `true`, `PoolCluster` will attempt to reconnect when connection fails. (Default: `true`)
* `removeNodeErrorCount`: If connection fails, node's `errorCount` increases.
When `errorCount` is greater than `removeNodeErrorCount`, remove a node in the `PoolCluster`. (Default: `5`)
* `restoreNodeTimeout`: If value is greater than `0`, node will not be actually removed.
After `restoreNodeTimeout` (ms) node will be automatically restored in `PoolCluster` with `errorCount = 0`. (Default: `0`)
* `defaultSelector`: The default selector. (Default: `RR`)
* `RR`: Select one alternately. (Round-Robin)
* `RANDOM`: Select the node by random function.
Expand Down
34 changes: 28 additions & 6 deletions lib/PoolCluster.js
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ function PoolCluster(config) {
config = config || {};
this._canRetry = typeof config.canRetry === 'undefined' ? true : config.canRetry;
this._removeNodeErrorCount = config.removeNodeErrorCount || 5;
this._restoreNodeTimeout = config.restoreNodeTimeout || 0;
this._defaultSelector = config.defaultSelector || 'RR';

this._closed = false;
Expand Down Expand Up @@ -54,6 +55,7 @@ PoolCluster.prototype.add = function(id, config) {
this._nodes[id] = {
id: id,
errorCount: 0,
online: true,
pool: new Pool({config: new PoolConfig(config)})
};

Expand Down Expand Up @@ -88,7 +90,9 @@ PoolCluster.prototype.end = function() {
this._closed = true;

for (var id in this._nodes) {
this._nodes[id].pool.end();
var node = this._nodes[id];
if (node.restoreTimeoutID) clearTimeout(node.restoreTimeoutID);
node.pool.end();
}
};

Expand Down Expand Up @@ -124,17 +128,35 @@ PoolCluster.prototype._getNode = function(id) {
};

PoolCluster.prototype._increaseErrorCount = function(node) {
var self = this;

if (++node.errorCount >= this._removeNodeErrorCount) {
var index = this._serviceableNodeIds.indexOf(node.id);
if (index !== -1) {
this._serviceableNodeIds.splice(index, 1);
delete this._nodes[node.id];

this._clearFindCaches();

node.pool.end();
if ((this._restoreNodeTimeout > 0) && !node.restoreTimeoutID) {
node.online = false;
this.emit('offline', node.id);

node.restoreTimeoutID = setTimeout(function() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need to do this using a timer? Why can we not calculate the restoration time and in functions that get nodes, simply query that? (and if online is important, it can be a getter). We may also want to use process.hrtime instead of dates and timers such that we are using a monotonically-increasing source that is not subject to system date/time adjustments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi-res time is good idea.
Then, we should create another queue _offlineNodeIds (like _serviceableNodeIds). Every time we call _getClusterNode, it will check first record of _offlineNodeIds (id+hrtime) and restore offline node if it is needed. Then restored node should be moved from _offlineNodeIds to _serviceableNodeIds.

Do you approve this idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

delete node.restoreTimeoutID;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set back to 0 or something rather than deleting and changing the hidden class of the node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I'll fix it

node.online = true;
node.errorCount = 0;

self._serviceableNodeIds.push(node.id);
self._clearFindCaches();

self.emit('online', node.id);
}, this._restoreNodeTimeout);
} else {
delete this._nodes[node.id];

node.pool.end();

this.emit('remove', node.id);
this.emit('remove', node.id);
}
}
}
};
Expand Down Expand Up @@ -182,7 +204,7 @@ PoolNamespace.prototype.getConnection = function(cb) {
var namespace = this;

if (clusterNode === null) {
var err = new Error('Pool does not exist.')
var err = new Error('Pool does not exist.');
err.code = 'POOL_NOEXIST';
return cb(err);
}
Expand Down