STORAGE-4262: Automatically Recover From Failed PRS Operation #5
Closed
Conversation
It fails after rebootstrapping docker image
Signed-off-by: Morgan Tocker <tocker@gmail.com>
Disable prepared statements test
Signed-off-by: Morgan Tocker <tocker@gmail.com>
Signed-off-by: Harshit Gangal <harshit.gangal@gmail.com>
[JAVA] Vitess JDBC release 4.0
Signed-off-by: Morgan Tocker <tocker@gmail.com>
…t-protection Back port stronger root protection
Signed-off-by: Adam Saponara <as@php.net>
…dpoint (#3): the `/debug/liveness` endpoint now returns a 503 when `/etc/etsy/depool` is present on the filesystem. Note: the included "unit" test currently relies on `/etc/etsy` existing and being writable; run it with `go test go/vt/servenv/*.go`. Signed-off-by: Mackenzie Starr <mstarr@etsy.com>
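For reference, a minimal sketch of the behavior that commit describes, assuming a plain `net/http` handler; only the `/debug/liveness` path and the `/etc/etsy/depool` file come from the commit message, the handler name and port below are illustrative:

```go
package main

import (
	"net/http"
	"os"
)

// depoolFile is the path named in the commit message; its presence signals
// that this host should be taken out of the serving pool.
const depoolFile = "/etc/etsy/depool"

// livenessHandler is a hypothetical handler illustrating the described
// behavior: return 503 when the depool file exists, 200 otherwise.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	if _, err := os.Stat(depoolFile); err == nil {
		http.Error(w, "depooled", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/debug/liveness", livenessHandler)
	http.ListenAndServe(":8080", nil) // port is an assumption for the sketch
}
```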
… pool timeout to Vitess 4.x (#4): this should prevent downstream clients from queueing indefinitely to acquire a connection from the stream pool, which we have seen exhaust downstream httpd workers in production. When clients hit the stream pool timeout, the error message is:
> stream pool wait time exceeded: resource pool timed out
This is a 4.x-only patch and can be removed when we upgrade to a Vitess version >= 6.x. Signed-off-by: Mackenzie Starr <mstarr@etsy.com>
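As a rough illustration of why a bounded wait helps, here is a toy pool sketch (not the Vitess resource pool; the type and the error text are stand-ins echoing the message quoted above) in which callers give up after a timeout instead of queueing indefinitely:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// toyPool is a hypothetical stand-in for the stream connection pool.
type toyPool struct {
	conns chan struct{} // one token per free connection
}

// get waits for a free connection but gives up once the timeout elapses,
// which is what keeps callers from queueing without bound.
func (p *toyPool) get(ctx context.Context, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	select {
	case <-p.conns:
		return nil
	case <-ctx.Done():
		return errors.New("stream pool wait time exceeded: resource pool timed out")
	}
}

func main() {
	p := &toyPool{conns: make(chan struct{})} // empty pool: every get times out
	err := p.get(context.Background(), 100*time.Millisecond)
	fmt.Println(err)
}
```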
Repository owner added the backport label on Aug 17, 2020
Repository owner self-assigned this on Aug 17, 2020
Gah, crap. I will close this and create another :-|. Have to see how best to create a PR against a non-master branch.
Repository owner closed this on Aug 17, 2020
Repository owner deleted the STORAGE-4262 branch on August 17, 2020 at 20:01
Repository owner restored the STORAGE-4262 branch on August 17, 2020 at 20:01
Repository owner deleted the STORAGE-4262 branch on August 18, 2020 at 19:22
jmchen28 pushed a commit that referenced this pull request on Jun 13, 2023
* decouple olap tx timeout from oltp tx timeout

  Since workload=olap bypasses the query timeouts (--queryserver-config-query-timeout) and also row limits, the natural assumption is that it also bypasses the transaction timeout. This is not the case, e.g. for a tablet where the --queryserver-config-transaction-timeout is 10.

  This commit:
  * Adds a new CLI flag and YAML field to independently configure TX timeouts for OLAP workloads (--queryserver-config-olap-transaction-timeout).
  * Decouples the TX kill interval from the OLTP TX timeout via a new CLI flag and YAML field (--queryserver-config-transaction-killer-interval).
* decouple ol{a,t}p tx timeouts: pr comments #1
* decouple ol{a,t}p tx timeouts: pr comments #2 (consolidate timeout logic in sc)
* decouple ol{a,t}p tx timeouts: remove unused tx killer flag
* decouple ol{a,t}p tx timeouts: update 15_0_0_summary.md
* decouple ol{a,t}p tx timeouts: fix race cond
* decouple ol{a,t}p tx timeouts: pr comments #3 (-txProps.timeout, +sc.expiryTime)
* decouple ol{a,t}p tx timeouts: pr comments #4 (-atomic.Value for expiryTime)
* decouple ol{a,t}p tx timeouts: fix race cond (without atomic.Value)
* decouple ol{a,t}p tx timeouts: pr comments #5 (-unused funcs, fix comments, set ticks interval once)
* decouple ol{a,t}p tx timeouts: pr comments #5 (+txkill tests)
* revert fmt changes
* implement pr review suggestion

Signed-off-by: Max Englander <max@planetscale.com>
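To make the decoupling concrete, here is a simplified sketch of the idea; the flag names in the comments are the ones from the commit message above, but the Go struct and fields are my own illustrative assumptions, not Vitess's tabletserver config types:

```go
package main

import (
	"fmt"
	"time"
)

// txTimeouts mirrors the two settings named in the commit message.
// These names are hypothetical, not the actual Vitess types.
type txTimeouts struct {
	oltp time.Duration // --queryserver-config-transaction-timeout
	olap time.Duration // --queryserver-config-olap-transaction-timeout
}

// timeoutFor picks the transaction timeout by workload, so an OLAP
// transaction is no longer bounded by the OLTP setting.
func (t txTimeouts) timeoutFor(workload string) time.Duration {
	if workload == "olap" {
		return t.olap
	}
	return t.oltp
}

func main() {
	cfg := txTimeouts{oltp: 10 * time.Second, olap: 30 * time.Minute}
	fmt.Println(cfg.timeoutFor("oltp"), cfg.timeoutFor("olap"))
}
```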
In Vitess v4, a PlannedReparentShard operation could fail for a variety of reasons -- a common one for us being long-running statements (which are single-statement transactions) causing the operation to time out. For example (from here):
This left the shard in a broken state: both sides were in the RO state, and the specific tablet that failed would not even serve RO traffic (it returned an error about being in state NOT_SERVING).
I was able to repeat this behavior using an Etsy Vitess Sandbox container (see details here).
I then backported some related fixes that were pushed after v4 (see vitessio/vitess#5376) -- specifically this code block to undo a failed DemoteMaster operation.
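The shape of that recovery, as a minimal hypothetical sketch rather than the actual backported code (the function and parameter names here are invented for illustration):

```go
package main

import (
	"errors"
	"fmt"
)

// reparentWithRecovery sketches the pattern the backported fix introduces:
// if a later step of the planned reparent fails after the old master has
// been demoted, undo the demotion so the shard keeps a writable master
// instead of ending up read-only on both sides.
func reparentWithRecovery(demote, undoDemote, promote func() error) error {
	if err := demote(); err != nil {
		return fmt.Errorf("demote failed, nothing to undo: %w", err)
	}
	if err := promote(); err != nil {
		// The key behavior: roll back the demotion rather than leaving
		// the old master read-only / NOT_SERVING.
		if undoErr := undoDemote(); undoErr != nil {
			return fmt.Errorf("promote failed (%v) and undo failed: %w", err, undoErr)
		}
		return fmt.Errorf("promote failed, old master restored: %w", err)
	}
	return nil
}

func main() {
	// Simulate a promotion failure to show the undo path being taken.
	err := reparentWithRecovery(
		func() error { fmt.Println("demoting old master"); return nil },
		func() error { fmt.Println("undoing demotion"); return nil },
		func() error { return errors.New("timed out waiting for new master") },
	)
	fmt.Println(err)
}
```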
I could then no longer repeat the problem where a failed DemoteMaster left things in a broken state (see here for details).
This improved behavior will allow us to safely use vtctl for failovers. The command-line client we create and use to orchestrate the larger operation can add additional safety mechanisms, such as examining the current replica lag and any long-running statements/transactions on the master, and skipping the failover attempt for that host unless an additional flag is passed (e.g. --force), thus avoiding the window where there is no RW instance in the A/B host pair. But with this new vtctl behavior we are protected from broken states persisting when any edge case does occur (e.g. a new long-running statement comes in between our check and the PlannedReparentShard).
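As a sketch of what such a pre-flight check in our orchestration client might look like (entirely hypothetical names and thresholds, not an existing API):

```go
package main

import (
	"fmt"
	"time"
)

// preflightCheck illustrates the extra safety checks described above:
// unless --force was passed, skip the failover when the replica is lagging
// or long-running statements are active on the master.
func preflightCheck(replicaLag time.Duration, longRunningStatements int, force bool) error {
	const maxLag = 10 * time.Second // illustrative threshold
	if force {
		return nil // --force bypasses the checks
	}
	if replicaLag > maxLag {
		return fmt.Errorf("replica lag %s exceeds %s; rerun with --force to override", replicaLag, maxLag)
	}
	if longRunningStatements > 0 {
		return fmt.Errorf("%d long-running statements on master; rerun with --force to override", longRunningStatements)
	}
	return nil
}

func main() {
	if err := preflightCheck(30*time.Second, 0, false); err != nil {
		fmt.Println("skipping failover:", err)
		return
	}
	fmt.Println("safe to run PlannedReparentShard")
}
```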