Messaging: Avoid deadlocks related to 0 receiver behavior #10132

Merged 16 commits into vitessio:main from the messaging_deadlock branch on Apr 28, 2022

Conversation

@mattlord (Contributor) commented Apr 24, 2022

Description

The message manager has conditional behavior and code paths that depend on whether the number of receivers is 0. The main mutex and the stream mutex are used to ensure overall consistency. But:

  1. The order in which those locks are taken was neither enforced nor consistent
  2. There was a gap in the poller between the 0-receiver fast-path check — specifically, confirming that the count was NOT 0 — and the point where the main mutex was taken to ensure that invariant held for the rest of the polling run, shown here:
    // Fast-path. Skip all the work.
    if mm.receiverCount() == 0 {
        return
    }
    mm.streamMu.Lock()
    defer mm.streamMu.Unlock()
    ctx, cancel := context.WithTimeout(tabletenv.LocalContext(), mm.pollerTicks.Interval())
    defer func() {
        mm.tsv.LogError()
        cancel()
    }()
    size := mm.cache.Size()
    bindVars := map[string]*querypb.BindVariable{
        "time_next": sqltypes.Int64BindVariable(time.Now().UnixNano()),
        "max":       sqltypes.Int64BindVariable(int64(size)),
    }
    qr, err := mm.readPending(ctx, bindVars)
    if err != nil {
        return
    }
    // Obtain mu lock to verify and preserve that len(receivers) != 0.
    mm.mu.Lock()
    defer mm.mu.Unlock()
    mm.messagesPending = false
    if len(qr.Rows) >= size {
        // There are probably more messages to be sent.
        mm.messagesPending = true
    }
    if len(mm.receivers) == 0 {
        // Almost never reachable because we just checked this.
        return
    }

These two factors can lead to deadlocks when the receiver count goes to/from 0 while the poller runs:

In unsubscribe() we hold the main mutex, and while holding it we call stopVStream() if the subscriber/receiver count has dropped to 0. stopVStream() then tries to take the stream mutex.

Concurrently, runPoller() — which runs on a 30 second ticker by default, or more precisely when using the documented message table structure with the vt_poller_interval comment directive — takes the stream mutex first and only later tries to take the main mutex.

Now we are deadlocked. They're both waiting on each other.

This in turn causes things like PlannedReparentShard to hang: during the transition away from primary serving we stop/close the message service, which waits trying to take the main mutex.
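
To make the lock-order inversion concrete, here is a minimal standalone sketch — not the actual Vitess code; the mutex names simply mirror the ones above. One goroutine takes the main mutex and then wants the stream mutex (like unsubscribe() calling stopVStream()), while the other takes the stream mutex and then wants the main mutex (like runPoller()). Once both hold their first lock, neither can make progress and the Go runtime aborts with a fatal deadlock error:

    package main

    import "sync"

    func main() {
        var mu, streamMu sync.Mutex

        // Barrier: make sure each goroutine holds its first lock before it
        // tries to take the second one, forcing the bad interleaving.
        var bothHoldFirstLock sync.WaitGroup
        bothHoldFirstLock.Add(2)

        var done sync.WaitGroup
        done.Add(2)

        // Models unsubscribe() -> stopVStream(): main mutex, then stream mutex.
        go func() {
            defer done.Done()
            mu.Lock()
            defer mu.Unlock()
            bothHoldFirstLock.Done()
            bothHoldFirstLock.Wait() // the "poller" now holds streamMu
            streamMu.Lock()          // blocks forever waiting on the "poller"
            defer streamMu.Unlock()
        }()

        // Models runPoller(): stream mutex, then main mutex.
        go func() {
            defer done.Done()
            streamMu.Lock()
            defer streamMu.Unlock()
            bothHoldFirstLock.Done()
            bothHoldFirstLock.Wait() // the "unsubscriber" now holds mu
            mu.Lock()                // blocks forever waiting on the "unsubscriber"
            defer mu.Unlock()
        }()

        // All goroutines end up blocked on sync primitives, so the runtime
        // aborts with: fatal error: all goroutines are asleep - deadlock!
        done.Wait()
    }

The changes below avoid this by not requiring a second mutex while the first is held on these paths.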

Changes

  1. We clean up the mutex usage. To ensure that the cache stays in sync with the database and that we move forward linearly through the logical GTID stream — which is updated by the poller and its results streamer, and by the binlog streamer — we now use the cacheManagementMu lock. The explicit usage of this new mutex lets us avoid the deadlocks seen before because it is taken immediately when the poller or the binlog streamer processes data coming from the database (which is used to update the cache), and we no longer need to take it when starting or stopping the binlog streamer itself.
    • After the initial changes in the PR I could see that we were still prone to deadlocks in various places (visible in the race test stack traces; see Notes below) because we checked the receiver count all the time — which required the main mutex — while holding the stream mutex. Anywhere we held the stream mutex and checked the receiver count — which we did A LOT, for example when processing every row event from the binlog stream via processRowEvent()->Add() — we were at risk of a deadlock. The main mutex also became VERY hot because we checked the receiver count constantly to enable the fast path (do nothing when there are no receivers). A sketch of the resulting lock discipline follows this list.
  2. We add an important predicate to the poller query — time_acked is null — to limit the size of the scans done on the underlying message table and ensure that the poller runs quickly. This matters because binlog event processing is blocked while the poller runs. It also prevents unnecessary work, as InnoDB does most of it — all query (WHERE) predicates become index condition pushdowns, and the new poller_idx index can be used for the ordering and to apply the LIMIT as well — and only the minimum amount of potentially useful data is passed up the stack: InnoDB->MySQL->MessageManager->Cache.
  3. We update the tests so that we're using the new recommended/documented (see docs ticket below) message table structure and thus we're testing behavior with the structure we tell users to use.
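
As referenced in the list above, here is a simplified sketch of the direction items 1 and 2 take — not the real message manager; every name here (messageManagerSketch, runPollerOnce, readPendingSketch, and the column list and ordering in the query) is illustrative only. The "any receivers?" fast path reads an atomic counter so it needs no lock at all, the whole poll run is serialized by a single cache-management mutex with no second lock taken underneath it, and the poller read is narrowed by the time_acked is null predicate:

    package sketch

    import (
        "sync"
        "sync/atomic"
    )

    // Illustrative shape of the narrowed poller read from item 2: the added
    // "time_acked is null" predicate keeps the scan to unacked rows. This is
    // NOT the exact query the message manager issues.
    const readPendingSketch = `select id, priority, time_next, epoch from my_message
        where time_acked is null and time_next < :time_next
        order by priority, time_next desc limit :max`

    type messageManagerSketch struct {
        receiverCount     atomic.Int64 // bumped by Subscribe, dropped by Unsubscribe
        messagesPending   atomic.Bool
        cacheManagementMu sync.Mutex // serializes poller and binlog-streamer cache updates
    }

    // runPollerOnce shows the lock discipline: a lock-free fast path, then one
    // mutex held for the whole poll run, with no second lock taken beneath it.
    func (mm *messageManagerSketch) runPollerOnce(readPending func(query string) (rows int, err error)) {
        // Fast path: no receivers, nothing to do, and no lock needed to know that.
        if mm.receiverCount.Load() == 0 {
            return
        }
        mm.cacheManagementMu.Lock()
        defer mm.cacheManagementMu.Unlock()

        rows, err := readPending(readPendingSketch)
        if err != nil {
            return
        }
        mm.messagesPending.Store(rows > 0)
    }

The actual PR splits things further (for example, streamMu into streamProcessingMu and lastPollPositionMu, per the commit list later on this page), but the core idea is the same: a lock-free hot path plus a single, consistently ordered lock around cache maintenance.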

Notes

This stress loop is able to complete successfully on vitessio/main:

$ make build; failed=0; for i in {1..100}; do if ! go test -race -v -failfast -timeout 1m vitess.io/vitess/go/vt/vttablet/tabletserver/messager; then echo -e '\nUh oh! We have a problem...\n'; failed=1; break; fi; go clean -testcache; done; if [[ ${failed} -eq 0 ]]; then echo -e '\nWe made it! All good...\n'; fi

Whatever changes are made here should continue to pass that test (they do). It's even more reliable now — you can run the loop 1,000 times with no flakiness.

Related Issue(s)

Checklist

@mattlord mattlord marked this pull request as draft April 25, 2022 00:57
@mattlord mattlord marked this pull request as ready for review April 25, 2022 18:03
@mattlord mattlord changed the title Prevent deadlocks related to 0 receiver behavior Avoid deadlocks related to 0 receiver behavior Apr 25, 2022
@derekperkins (Member) commented:

Looks like the majority of tests are failing on

--- FAIL: TestMessageStreamingPlan (0.00s)
    plan_test.go:239: 
        	Error Trace:	plan_test.go:239
        	Error:      	Received unexpected error:
        	            	Code: FAILED_PRECONDITION
        	            	'msg' is not a message table
        	Test:       	TestMessageStreamingPlan

@derekperkins (Member) commented:

@mattlord you really rolled through quite a few iterations, thanks again for digging in so deep. I don't think I have enough context around vstream to provide much feedback on the latest design. Once @rohit-nayak-ps approves it, I'm ready to roll it out in prod to test it out.

@mattlord (Contributor, Author) commented Apr 27, 2022:

Note that the Cluster (20) -> backup_xtrabackup test failure is due to: #10146

@derekperkins (Member) commented:

@mattlord is this still waiting on @rohit-nayak-ps, or was pairing with @sougou sufficient review?

@mattlord mattlord merged commit 870f13e into vitessio:main Apr 28, 2022
@mattlord mattlord deleted the messaging_deadlock branch April 28, 2022 18:25
@derekperkins (Member) commented:

🎉 Thanks so much for tracking this down!! I'll be rolling it out later today

@mattlord (Contributor, Author) commented:

> 🎉 Thanks so much for tracking this down!! I'll be rolling it out later today

Excellent! Please let me know how things go.

tanjinx added a commit to slackhq/vitess that referenced this pull request Aug 30, 2022
@tanjinx tanjinx mentioned this pull request Aug 30, 2022
@mattlord mattlord changed the title Avoid deadlocks related to 0 receiver behavior Messaging: Avoid deadlocks related to 0 receiver behavior Aug 31, 2022
tanjinx added a commit to slackhq/vitess that referenced this pull request Aug 31, 2022
rsajwani pushed a commit to planetscale/vitess that referenced this pull request Nov 8, 2022
* Prevent deadlocks related to 0 receiver behavior
* Update test tables to use poller_idx
* Minor changes after mutex usage review in message manager + cache
* Use atomics for receiver count and messages pending
* Don't take exclusive lock when in fast path
* Update tests to use the new recommended message table structure (see vitessio/website#1015)
* Correct tests
* Update e2e test to use new recommended table structure
* Fix TestMessageStreamingPlan test
* Fix godriver/TestStreamMessaging test
* Split streamMu into streamProcessingMu and lastPollPositionMu
* Poller cannot take main lock w/o having X stream processing lock
* Improve the comments a bit
* Hold the main mutex during Add (for safe concurrency with the last receiver unsubscribing)
* Changes after pair reviewing with Sugu
* Use my GitHub handle for the self reference

Signed-off-by: Matt Lord <mattalord@gmail.com>
Signed-off-by: Rameez Sajwani <rameezwazirali@hotmail.com>
harshit-gangal pushed a commit that referenced this pull request Nov 22, 2022
Development

Successfully merging this pull request may close these issues.

Bug Report: vttablet hanging when running PlannedReparentShard
4 participants