-
Notifications
You must be signed in to change notification settings - Fork 688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tools: update descriptions of handling sharding ddl locks manually #876
tools: update descriptions of handling sharding ddl locks manually #876
Conversation
- `remove-id`: flag; string; `--remove-id`; optional; if being specified, it should be the ID of some DDL lock; if not being specified, remove the corresponding DDL lock information only when the breaking operation succeeds; if being specified, compulsorily remove the DDL lock information | ||
- `exec`: flag; boolean; `--exec`; optional; cannot be specified simultaneously with the `--skip` parameter; if being specified, ask the DM-worker to execute the corresponding DDL of the lock | ||
- `skip`: flag; boolean; `--skip`; optional; cannot be specified simultaneously with the `--exec` parameter; if being specified, ask the DM-worker to skip the corresponding DDL of the lock | ||
- `task-name`: non-flag; string; not optional; specify the name of the task containing the lock that is going to execute the breaking operation (you can check if a task contains the lock via [query-status](../task-handling/query-status.md)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the link seems is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll doublecheck it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the links should work well after shard-merge.md
is merged. There are two broken links for now.
» show-ddl-locks test | ||
{ | ||
"result": true, # show if the locking process succeeds | ||
"msg": "", # show the reason for the locking process failure or other descriptive information (for example, the locking task does not exist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason 🤔 is additional message
better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fair, I'll change it.
"locks": [ # the lock information list on the DM-master | ||
{ | ||
"ID": "test-`shard_db`.`shard_table`", # the ID of the lock, which is composed of the current task name and the corresponding schema/table information of the DDL | ||
"task": "test", # the task name of the lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lock belongs to task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, then it should be:
- the lock ID, which is made up of the current task name and the schema/table information corresponding to the DDL
- the name of the task to which the lock belongs
Sounds right to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right.
{ | ||
"ID": "test-`shard_db`.`shard_table`", # the ID of the lock, which is composed of the current task name and the corresponding schema/table information of the DDL | ||
"task": "test", # the task name of the lock | ||
"owner": "127.0.0.1:8262", # the owner of the lock (the first DM-worker which receives the DDL event) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove (the first DM-worker which receives the DDL event)
, it is not completely accurate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.
|
||
##### Variables description | ||
|
||
- `worker`: flag; string; `--worker`; optional; can be specified multiple times; if not being specified, send requests for all DM-workers that are waiting for the lock to skip the DDL; if being specified, send requests for the specified DM-worker to skip the DDL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let all dm-worker to skip the ddl, except for owner?
```bash | ||
» unlock-ddl-lock test-`shard_db`.`shard_table` | ||
{ | ||
"result": true, # show if the unlocking process succeeds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it right? @csuzhangxc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inaccuracy. true
means successful, but false
can also be successful. some descriptions provided in the Some DM-workers go offline scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So can I just say "the result of the unlocking process"? Should we specify what "true" and "false" means here? @GregoryIan @csuzhangxc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's unnecessary. @yikeke
|
||
Before `DM-master` tries to automatically unlock the sharding DDL lock, all the DM-workers need to receive the sharding DDL event (for details, see [shard merge principles](./shard-merge.md#Principles)). If the sharding DDL event is already in the synchronization process, and some DM-workers have gone offline and are not to be restarted (these DM-workers have been removed according to the application demand), then the sharding DDL lock cannot be automatically synchronized and unlocked because not all the DM-workers can receive the DDL event. | ||
|
||
> If you do not need to make some DM-workers offline in the process of synchronizing sharding DDL events, a better solution is to use `stop-task` to stop the running tasks first, make the DM-workers go offline, remove the corresponding configuration information from the task configuration file, and finally use `start-task` and the new task configuration to restart the synchronization task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not need to
=> need to
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be: "If you need to make some DM-workers offline when not in the process of synchronizing sharding DDL events".
Since the file in this PR has links that point to the file added in #887, this PR can only be merged after #887 is merged. @GregoryIan @csuzhangxc |
"workers": [ # the result list of the DDL execution/skipping operation of each DM-worker | ||
{ | ||
"result": true, # the result of the DDL execution/skipping operation | ||
"worker": "127.0.0.1:8262", # the address of the DM-worker (the DM-worker ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think # the DM-worker ID
is enough, below is same
#### Variables description | ||
|
||
- `worker`: flag; string; `--worker`; required; specify the DM-worker which needs to execute the breaking operation | ||
- `remove-id`: flag; string; `--remove-id`; optional; if being specified, it should be the ID of some DDL lock; if not being specified, remove the corresponding DDL lock information only when the breaking operation succeeds; if being specified, compulsorily remove the DDL lock information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove-id
is deprecated, we should indicate is deprecated
6. Use `unlock-dll-lock` to ask `DM-master` to actively unlock the DDL lock. | ||
- If the owner of the DDL lock has gone offline, you can use the parameter `--owner` to specify another DM-worker as the new owner to execute the DDL. | ||
- If any DM-worker reports an error, `result` will be set to `false`, and at this point you should check carefully if the errors of each DM-worker is acceptable and within expectations. | ||
- DM-workers that have gone offline will return the error `rpc error: code = Unavailable`, which is within expectations and can be neglected; but if other online DM-workers return errors, then you should deal with them based on the scenario. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it’s example of If any DM-worker reports an error, `result` will be set to `false`, and at this point you should check carefully if the errors of each DM-worker is acceptable and within expectations.
should we put it under the statement at L208?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GregoryIan Yes, updated.
|
||
#### The reason for the abnormal lock | ||
|
||
It has the similar reason for the abnormal lock in [Some DM-workers restart during the DDL unlocking process](#scenario-2-some-dm-workers-restart-during-the-ddl-unlocking-process). If the DM-worker is temporarily unreachable when you ask the DM-worker to skip the DDL, this DM-worker might fail to skip the DDL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with Scenario 3
, we can also use solution of Scenario 3
to solve problem of Scenario 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct it, the difference is dm-master doesn't have a lock in Scenario 3, but dm-master has a new lock lock in Scenario 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
Via: pingcap/tidb-tools#161
Comment addressed. PTAL. @lilin90 @csuzhangxc @GregoryIan
Please help fix the broken link in this file, thanks. @lilin90