Fix database purging of stale activewatchers #2519
base: master
Conversation
Force-pushed from acd309a to addbbc9.
…ck2db)

When clustering sharing_tags were added to presence, they were added to the fallback2db "on" case only. There are a couple of dimensions with differing behaviours:

```
+-------------+------------+------------+
|             | fallback2- | fallback2- |
|             |  -db = on  | -db = off  |
+-clustering:-+------------+------------+
| - no        | OK         | OK         |
| - tagless   | PR-2519    | PR-2519    |
| - active    | OK         | this       |
+-------------+------------+------------+
```

The non-OK behaviour above refers to the activewatcher table getting filled up with stale/expired items.

fallback2db on or off:
```
modparam("presence", "fallback2db", 0)  # or 1=on
```
The no-clustering case:
```
handle_subscribe();
```
The tagless case:
```
modparam("presence", "cluster_id", 1)
modparam("clusterer", "my_node_id", 2)
handle_subscribe();
```
The active case:
```
modparam("presence", "cluster_id", 1)
modparam("clusterer", "my_node_id", 2)
modparam("clusterer", "sharing_tag", "node2/1=active")
handle_subscribe("0", "node2");
```

Where PR OpenSIPS#2519 fixes the tagless case, this PR fixes the fallback2db=0 case by writing the sharing_tag to the database so the records can be found and cleaned up.

(Sidenote: subscriptions which ended with a timeout or 481 *would* get cleaned up. This makes sense in all cases: if they hit an error before their expiry, it makes sense to purge them from the DB immediately. And if the periodic cleanup had already removed those records, that would not be a problem.)
(This one might be for you @bogdan-iancu 😉 . See also PR #2520.)
Any updates here?

No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
If you try to (MySQL) prepare a statement with "... column IS ?", you get an error. For MySQL, you could fix this by using the "... column <=> ?" special operator ('IS NOT DISTINCT FROM'), but that's not standard SQL. Instead, you can now do:

```
update_cols[0] = &some_column;
update_ops[0] = OP_IS_NULL;
update_vals[0].nul = 1;
```

You still need to set .nul=1 because prepared statements have to consume a single value.
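As a minimal illustration of why a dedicated IS NULL operator is needed at all (this is SQL three-valued logic in general, shown here with Python/SQLite rather than the OpenSIPS DB API; the table and column names are made up for the demo):

```python
import sqlite3

# Binding NULL through a regular "=" placeholder never matches NULL rows,
# because NULL = NULL evaluates to UNKNOWN, not TRUE. Only IS NULL finds them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watchers (id INTEGER, sharing_tag TEXT)")
conn.executemany("INSERT INTO watchers VALUES (?, ?)",
                 [(1, "node2"), (2, None), (3, None)])

# "= ?" with a NULL parameter matches nothing.
eq = conn.execute(
    "SELECT COUNT(*) FROM watchers WHERE sharing_tag = ?", (None,)
).fetchone()[0]

# "IS NULL" matches the two tagless rows.
is_null = conn.execute(
    "SELECT COUNT(*) FROM watchers WHERE sharing_tag IS NULL"
).fetchone()[0]

print(eq, is_null)  # 0 2
```

This is why simply binding a NULL value into an equality comparison cannot replace OP_IS_NULL.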
If you're using clustering without tags (or with tags, but without fallback2db), you would end up with a lot of active watchers in the database that never got cleaned up.

Before 929ab4d:
- all stale activewatcher records in the db got purged.

After:
- if you're not clustering, all stale active watchers got purged;
- if you are, only those with the sharing_tag set.

However, the sharing tag is not necessarily set:
- it is an optional argument to handle_subscribe;
- setting it in handle_subscribe does not write it to the database, because of missing code in the fallback2db==0 branch;
- and even if it were set, adding the argument to handle_subscribe does not "activate" the sharing tag.

(Also interesting to know: a 408 or 481 after the this-subscription-is-expired NOTIFY *would* cause the individual record to get deleted. But any other response, including 200, would leave the record to be sorted out by the periodic purge.)

This changeset reverts parts of the aforementioned commit by always purging stale records whose sharing_tag is NULL, restoring the behaviour to pre-3.1.0 and pre-2.4.8.
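The purge behaviour described above can be sketched as follows. This is a stand-in in Python/SQLite, not the actual OpenSIPS code (which goes through the OpenSIPS DB API); the table and column names are assumptions loosely modelled on the active_watchers schema:

```python
import sqlite3

# Sketch of the fixed periodic purge: delete expired rows whose sharing_tag
# IS NULL as well as those owned by this node's tag. Pre-fix, under
# clustering, only tagged rows were considered, so tagless stale rows
# (row 1 below) lingered forever.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_watchers "
             "(id INTEGER, expires INTEGER, sharing_tag TEXT)")
now = 1000
conn.executemany(
    "INSERT INTO active_watchers VALUES (?, ?, ?)",
    [(1, 900, None),     # stale, tagless -> must be purged (the fix)
     (2, 900, "node2"),  # stale, tagged  -> purged by the tag owner
     (3, 1100, None)])   # not yet expired -> kept

conn.execute(
    "DELETE FROM active_watchers "
    "WHERE expires < ? AND (sharing_tag IS NULL OR sharing_tag = ?)",
    (now, "node2"))

remaining = [r[0] for r in conn.execute(
    "SELECT id FROM active_watchers ORDER BY id")]
print(remaining)  # [3]
```

The `sharing_tag IS NULL` arm of the WHERE clause is what this changeset restores (and what the IS NULL prepared-statement support makes possible).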
Force-pushed from addbbc9 to a6a22c9.
Rebased so the conflicts with 87f2416 are gone 🎉
Any reason not to merge this request? I'm asking because our cluster is experiencing this issue, resulting in a lot of database stress and maintenance to prevent further problems.
OK, as a start, we need to evaluate this "is NULL" support... or to see if there is a way around it... Maybe changing the DB schema, rather than complicating the code...
This changeset contains two commits.
(1) To enable IS NULL support (it's ugly).
(2) Fix so activewatcher.sharing_tag IS NULL records get pruned, which was broken since 929ab4d (or 6022968 for 2.4.x).
Without this changeset, people using clustering but no sharing tags will get stale records in the database.
See the commit messages for details.