Fix database purging of stale activewatchers #2519
base: master
Conversation
Force-pushed from acd309a to addbbc9.
…ck2db)

When clustering sharing_tags were added to presence, they were added to the fallback2db "on" case only. There are a couple of dimensions with differing behaviours:

```
+-------------+------------+------------+
|             | fallback2- | fallback2- |
|             |  -db = on  | -db = off  |
+-clustering:-+------------+------------+
| - no        | OK         | OK         |
| - tagless   | PR-2519    | PR-2519    |
| - active    | OK         | this       |
+-------------+------------+------------+
```

The non-OK behaviour above refers to the activewatcher table getting filled up with stale/expired items.

fallback2db on or off:
```
modparam("presence", "fallback2db", 0)  # or 1=on
```
The no-clustering case:
```
handle_subscribe();
```
The tagless case:
```
modparam("presence", "cluster_id", 1)
modparam("clusterer", "my_node_id", 2)
handle_subscribe();
```
The active case:
```
modparam("presence", "cluster_id", 1)
modparam("clusterer", "my_node_id", 2)
modparam("clusterer", "sharing_tag", "node2/1=active")
handle_subscribe("0", "node2");
```

Where PR OpenSIPS#2519 fixes the tagless case, this PR fixes the fallback2db=0 case by writing the sharing_tag to the database so the records can be found and cleaned up.

(Sidenote: subscriptions which ended with a timeout or 481 *would* get cleaned up. This makes sense in all cases: if they hit an error before their expiry, it makes sense to purge them from the DB immediately. And if the periodic cleanup had already removed those records, that would not be a problem.)
(This one might be for you @bogdan-iancu 😉 . See also PR #2520.)
Any updates here?

No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
If you try to (MySQL) prepare a statement with "... column IS ?", you get an error. For MySQL, you could fix this by using the "... column <=> ?" special operator ('IS NOT DISTINCT FROM'), but that's not standard SQL. Instead, you can now do:

```
update_cols[0] = &some_column;
update_ops[0] = OP_IS_NULL;
update_vals[0].nul = 1;
```

You still need to set .nul=1 because prepared statements have to consume a single value.
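As a minimal illustration of why a dedicated IS NULL operator is needed at all (this is SQL three-valued logic in general, shown here with Python/SQLite rather than the OpenSIPS DB API; the table and column names are made up for the demo):

```python
import sqlite3

# Binding NULL through a regular "=" placeholder never matches NULL rows,
# because NULL = NULL evaluates to UNKNOWN, not TRUE. Only IS NULL finds them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watchers (id INTEGER, sharing_tag TEXT)")
conn.executemany("INSERT INTO watchers VALUES (?, ?)",
                 [(1, "node2"), (2, None), (3, None)])

# "= ?" with a NULL parameter matches nothing.
eq = conn.execute(
    "SELECT COUNT(*) FROM watchers WHERE sharing_tag = ?", (None,)
).fetchone()[0]

# "IS NULL" matches the two tagless rows.
is_null = conn.execute(
    "SELECT COUNT(*) FROM watchers WHERE sharing_tag IS NULL"
).fetchone()[0]

print(eq, is_null)  # 0 2
```

This is why simply binding a NULL value into an equality comparison cannot replace OP_IS_NULL.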
If you're using clustering without tags (or with tags, but without fallback2db), you would end up with a lot of active watchers in the database that never got cleaned up.

Before 929ab4d:
- all stale activewatcher records in the db got purged.

After:
- if you're not clustering, all stale active watchers got purged;
- if you are, only those with the sharing_tag set.

However, the sharing tag is not necessarily set:
- it is an optional argument to handle_subscribe;
- setting it in handle_subscribe does not write it to the database, because of missing code in the fallback2db==0 branch;
- and even if it were set, adding the argument to handle_subscribe does not "activate" the sharing tag.

(Also interesting to know: a 408 or 481 after the this-subscription-is-expired NOTIFY *would* cause the individual record to get deleted. But any other response, including 200, would leave the record to be sorted out by the periodic purge.)

This changeset reverts parts of the aforementioned commit by always purging stale records whose sharing_tag is NULL, restoring the behaviour to pre-3.1.0 and pre-2.4.8.
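The purge behaviour described above can be sketched as follows. This is a stand-in in Python/SQLite, not the actual OpenSIPS code (which goes through the OpenSIPS DB API); the table and column names are assumptions loosely modelled on the active_watchers schema:

```python
import sqlite3

# Sketch of the fixed periodic purge: delete expired rows whose sharing_tag
# IS NULL as well as those owned by this node's tag. Pre-fix, under
# clustering, only tagged rows were considered, so tagless stale rows
# (row 1 below) lingered forever.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active_watchers "
             "(id INTEGER, expires INTEGER, sharing_tag TEXT)")
now = 1000
conn.executemany(
    "INSERT INTO active_watchers VALUES (?, ?, ?)",
    [(1, 900, None),     # stale, tagless -> must be purged (the fix)
     (2, 900, "node2"),  # stale, tagged  -> purged by the tag owner
     (3, 1100, None)])   # not yet expired -> kept

conn.execute(
    "DELETE FROM active_watchers "
    "WHERE expires < ? AND (sharing_tag IS NULL OR sharing_tag = ?)",
    (now, "node2"))

remaining = [r[0] for r in conn.execute(
    "SELECT id FROM active_watchers ORDER BY id")]
print(remaining)  # [3]
```

The `sharing_tag IS NULL` arm of the WHERE clause is what this changeset restores (and what the IS NULL prepared-statement support makes possible).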
Force-pushed from addbbc9 to a6a22c9.
Rebased so the conflicts with 87f2416 are gone 🎉
Any reason not to merge this request? I'm asking because our cluster is experiencing this issue, resulting in a lot of database stress and maintenance to prevent further problems.
OK, as a start, we need to evaluate this "is NULL" support... or to see if there is a way around it... Maybe changing the DB schema, rather than complicating the code...
This changeset contains two commits.
(1) To enable IS NULL support (it's ugly).
(2) Fix so activewatcher.sharing_tag IS NULL records get pruned, which was broken since 929ab4d (or 6022968 for 2.4.x).
Without this changeset, people using clustering but no sharing tags will get stale records in the database.
See the commit messages for details.