*: global runaway watch by system table and impl exector for query watch
#45465
Signed-off-by: Cabinfever_B <cabinfeveroier@gmail.com>
query watch
domain/resourcegroup/runaway.go
rm.watchList.Set(key, record, ttl)
rm.queryLock.Unlock()
} else {
	if rm.watchList.Get(key) == nil {
Should this check be done under the lock?
This is the first check. We generally believe that in most cases we will not add a watch for the same key repeatedly.
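The pattern being discussed (an unlocked first check followed by a second check under the lock) can be sketched as below. This is only an illustration under stated assumptions: `watchList`, `get`, and `addIfAbsent` are hypothetical names, and a plain map guarded by a `sync.RWMutex` stands in for the real cache behind `rm.watchList`.

```go
package main

import (
	"fmt"
	"sync"
)

// watchList is a hypothetical stand-in for the cache behind rm.watchList.
type watchList struct {
	mu    sync.RWMutex
	items map[string]string
}

func newWatchList() *watchList {
	return &watchList{items: make(map[string]string)}
}

// get is the cheap first check; it only takes the read lock.
func (w *watchList) get(key string) (string, bool) {
	w.mu.RLock()
	defer w.mu.RUnlock()
	v, ok := w.items[key]
	return v, ok
}

// addIfAbsent inserts the record only if the key is not present yet
// and reports whether it inserted.
func (w *watchList) addIfAbsent(key, record string) bool {
	// First check, without the write lock: skip the lock when the key exists.
	if _, ok := w.get(key); ok {
		return false
	}
	w.mu.Lock()
	defer w.mu.Unlock()
	// Second check, under the write lock: another goroutine may have
	// inserted the key between the first check and Lock().
	if _, ok := w.items[key]; ok {
		return false
	}
	w.items[key] = record
	return true
}

func main() {
	w := newWatchList()
	fmt.Println(w.addIfAbsent("rg1/digest", "kill"))     // first insert
	fmt.Println(w.addIfAbsent("rg1/digest", "cooldown")) // rejected, key exists
}
```

The first check alone is not sufficient for correctness; it only avoids taking the write lock in the common case, which is why the check is repeated once the lock is held.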
LGTM
domain/resourcegroup/runaway.go
force := false
// The manual record replaces the old record.
force = record.Source == ManualSource || record.Source == rm.serverID
What is the priority between manually added records and watched records?
Manually added records have higher priority. I will change it.
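One way to express "manually added records have higher priority" is a small replacement rule like the sketch below. It is not the code from this PR: `record`, `manualSource`, and `shouldReplace` are hypothetical names, and the rule that one automatic record may replace another is an assumption made only for the example.

```go
package main

import "fmt"

// manualSource is a hypothetical marker for records added by hand.
const manualSource = "manual"

// record is a simplified, hypothetical stand-in for a watch record.
type record struct {
	source string
	action string
}

// shouldReplace reports whether incoming may overwrite existing.
// A manual record always wins; an automatic record never overwrites
// a manual one. Automatic-over-automatic is allowed here by assumption.
func shouldReplace(existing *record, incoming record) bool {
	if existing == nil {
		return true
	}
	if incoming.source == manualSource {
		return true
	}
	return existing.source != manualSource
}

func main() {
	manual := record{source: manualSource, action: "kill"}
	auto := record{source: "tidb-0", action: "cooldown"}
	fmt.Println(shouldReplace(&manual, auto)) // automatic must not replace manual
	fmt.Println(shouldReplace(&auto, manual)) // manual replaces automatic
}
```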
domain/resourcegroup/runaway.go
	rm.addWatchList(record, ttl, force)
}

// RemoveWatch is used to remove watch items from system table.
Misleading comment: this function only removes records from the in-memory cache.
Well, it is ambiguous.
domain/resourcegroup/runaway.go
func (rm *RunawayManager) markRunaway(resourceGroupName, originalSQL, planDigest string, action string, matchType RunawayMatchType, now *time.Time) {
	source := rm.serverID
	if len(source) > 128 {
Why do we need to truncate here?
I think the server IP in k8s will be a domain name, so it can be very long. The length of the field is 128; do we need to change it to blob?
You should set a field length that is long enough to store the data
OK, let's set it to 512, the same as the analyze job.
domain/runaway.go
	if r.ID > 0 {
		return r.ID, nil
	}
case err := <-do.runawaySyncer.doneChan:
What if multiple add-watch operations happen at the same time? How can you ensure you get the right message from this chan?
It can't be ensured. If the keys of the records are different and a caller gets the wrong message, it won't find its watch and will try to get the watch again later. If the keys are the same, refer to https://github.com/pingcap/tidb/pull/45465/files#r1277100060
Since you send a message to notifyChan only once, if one routine receives another routine's return message and drops it, the other routine is blocked. That is a more severe situation.
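A common way to avoid this kind of cross-delivery is to give every request its own reply channel, so a result can only reach the caller that asked for it. The sketch below shows the idea in isolation; it is not the fix adopted in this PR, and `addWatchRequest`, `addWatchResult`, `runSyncer`, and `addWatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// addWatchResult is what the syncer sends back for one request.
type addWatchResult struct {
	key string
	id  int64
}

// addWatchRequest carries its own reply channel, so the result cannot
// be received by a different caller.
type addWatchRequest struct {
	key   string
	reply chan addWatchResult
}

// runSyncer handles requests one at a time and answers each request
// on that request's private channel.
func runSyncer(requests <-chan addWatchRequest) {
	var nextID int64
	for req := range requests {
		nextID++
		req.reply <- addWatchResult{key: req.key, id: nextID}
	}
}

// addWatch sends one request and waits for its own reply.
func addWatch(requests chan<- addWatchRequest, key string) addWatchResult {
	// Buffer of 1 so the syncer never blocks on a slow caller.
	req := addWatchRequest{key: key, reply: make(chan addWatchResult, 1)}
	requests <- req
	return <-req.reply
}

func main() {
	requests := make(chan addWatchRequest)
	go runSyncer(requests)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			res := addWatch(requests, key)
			fmt.Println(key, "got the reply for", res.key)
		}(fmt.Sprintf("key-%d", i))
	}
	wg.Wait()
}
```

With a shared done channel, any waiting goroutine can consume a message meant for another; with a per-request channel there is exactly one receiver for each result, so no caller is left blocked because its message was taken.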
domain/runaway.go
}

func (do *Domain) AddRunawayWatch(record *resourcegroup.QuarantineRecord) (int64, error) {
	if err := do.handleRunawayWatch(record); err != nil {
I don't get what you want to explain, but it's still clear that the current logic cannot ensure the procedure is atomic, so it can deliver a wrong result.
	return infoschema.ErrResourceGroupNotExists.GenWithStackByArgs(record.ResourceGroupName)
}
if record.Action == rmpb.RunawayAction_NoneAction {
	if rg.RunawaySettings == nil {
What happens if RunawaySettings is not nil when the record is added and is later changed to none? What is the action of the record then?
This watch will be useless but will not be removed from the watch list. You can see BeforeExecutor: if there is no matching action, it does nothing.
start_time TIMESTAMP NOT NULL,
end_time TIMESTAMP NOT NULL,
watch varchar(12) NOT NULL,
start_time datetime(6) NOT NULL,
So what is the benefit of this change? I didn't see any other table use datetime to represent time.
if item == nil {
	return false, 0
}
return true, item.Action
For a manually added item without an action, the action is none; in this case you should use the setting's action instead. BTW, if we let manual records reflect changes to the query limit's action, I would expect watch records to reflect this change too.
You can see L461-463.
Makes sense, I will update it.
done
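The fallback requested in this thread (use the group's configured action when the watch record itself carries none) can be sketched as follows. The types here are simplified, hypothetical stand-ins rather than the `rmpb` definitions, and `effectiveAction` is an illustrative name, not a function from this PR.

```go
package main

import "fmt"

// action is a hypothetical, simplified stand-in for the runaway action enum.
type action int

const (
	noneAction action = iota
	coolDown
	kill
)

// runawaySettings is a hypothetical stand-in for a resource group's settings.
type runawaySettings struct {
	defaultAction action
}

// effectiveAction resolves the action at match time: the watch's own
// action if it has one, otherwise the group's current setting, otherwise
// none (in which case the caller does nothing).
func effectiveAction(watch action, settings *runawaySettings) action {
	if watch != noneAction {
		return watch
	}
	if settings == nil {
		return noneAction
	}
	return settings.defaultAction
}

func main() {
	group := &runawaySettings{defaultAction: kill}
	fmt.Println(effectiveAction(coolDown, group) == coolDown)     // explicit action wins
	fmt.Println(effectiveAction(noneAction, group) == kill)       // falls back to the setting
	fmt.Println(effectiveAction(noneAction, nil) == noneAction)   // nothing to apply
}
```

Resolving the action when the query is checked, rather than when the record is stored, means a record without its own action follows later changes to the group's setting.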
@@ -2773,10 +2777,16 @@ func upgradeToVer171(s Session, ver int64) {
	if ver >= version171 {
		return
	}
	mustExecute(s, "ALTER TABLE mysql.tidb_runaway_queries CHANGE COLUMN `tidb_server` `tidb_server` varchar(512)")
Because of TestUpgradeVersionForResumeJob, I tried to make it pass by adding two upgrade functions. Maybe it's not a good method; I just want to pass the test first.
failpoint.Inject("FastRunawayGC", func() {
	expiredDuration = time.Second * 1
})
expiredTime := time.Now().Add(-expiredDuration)
Does the timezone offset also impact this?
Currently the GC cleanup loop only affects tidb_runaway_queries, so it does not affect correctness.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Connor1996, glorv, qw4990
/retest-required
What problem does this PR solve?
Issue Number: ref #43691
Problem Summary:
What is changed and how it works?
Check List
Tests
About runaway watch sync:
Execute the query on TiDB1, and then execute the query on TiDB2.
TiDB1 returns error 8253 and TiDB2 returns error 8254, and mysql.tidb_runaway_watch contains the record.
Side effects
Documentation
Release note