Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools: update descriptions of handling sharding ddl locks manually #876

Merged
merged 14 commits into from
Feb 11, 2019

Conversation

yikeke
Copy link
Contributor

@yikeke yikeke commented Jan 28, 2019

Via: pingcap/tidb-tools#161
Comment addressed. PTAL. @lilin90 @csuzhangxc @GregoryIan

Please help fix the broken link in this file, thanks. @lilin90

@IANTHEREAL
Copy link
Contributor

we had add a dm directory #873, can you put this file in it @yikeke

@yikeke
Copy link
Contributor Author

yikeke commented Jan 30, 2019

we had add a dm directory #873, can you put this file in it @yikeke

Sure.

tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
- `remove-id`: flag; string; `--remove-id`; optional; if being specified, it should be the ID of some DDL lock; if not being specified, remove the corresponding DDL lock information only when the breaking operation succeeds; if being specified, compulsorily remove the DDL lock information
- `exec`: flag; boolean; `--exec`; optional; cannot be specified simultaneously with the `--skip` parameter; if being specified, ask the DM-worker to execute the corresponding DDL of the lock
- `skip`: flag; boolean; `--skip`; optional; cannot be specified simultaneously with the `--exec` parameter; if being specified, ask the DM-worker to skip the corresponding DDL of the lock
- `task-name`: non-flag; string; not optional; specify the name of the task containing the lock that is going to execute the breaking operation (you can check if a task contains the lock via [query-status](../task-handling/query-status.md))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the link seems is broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll doublecheck it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you checked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the links should work well after shard-merge.md is merged. There are two broken links for now.

tools/manually-handling-sharding-ddl-locks.md Outdated Show resolved Hide resolved
» show-ddl-locks test
{
"result": true, # show if the locking process succeeds
"msg": "", # show the reason for the locking process failure or other descriptive information (for example, the locking task does not exist)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason 🤔 is additional message better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fair, I'll change it.

"locks": [ # the lock information list on the DM-master
{
"ID": "test-`shard_db`.`shard_table`", # the ID of the lock, which is composed of the current task name and the corresponding schema/table information of the DDL
"task": "test", # the task name of the lock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock belongs to task

Copy link
Contributor Author

@yikeke yikeke Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, then it should be:

  • the lock ID, which is made up of the current task name and the schema/table information corresponding to the DDL
  • the name of the task to which the lock belongs

Sounds right to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right.

{
"ID": "test-`shard_db`.`shard_table`", # the ID of the lock, which is composed of the current task name and the corresponding schema/table information of the DDL
"task": "test", # the task name of the lock
"owner": "127.0.0.1:8262", # the owner of the lock (the first DM-worker which receives the DDL event)
Copy link
Contributor

@IANTHEREAL IANTHEREAL Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove (the first DM-worker which receives the DDL event), it is not completely accurate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay.


##### Variables description

- `worker`: flag; string; `--worker`; optional; can be specified multiple times; if not being specified, send requests for all DM-workers that are waiting for the lock to skip the DDL; if being specified, send requests for the specified DM-worker to skip the DDL
Copy link
Contributor

@IANTHEREAL IANTHEREAL Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let all dm-worker to skip the ddl, except for owner?

```bash
» unlock-ddl-lock test-`shard_db`.`shard_table`
{
"result": true, # show if the unlocking process succeeds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it right? @csuzhangxc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inaccuracy. true means successful, but false can also be successful. some descriptions provided in the Some DM-workers go offline scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can I just say "the result of the unlocking process"? Should we specify what "true" and "false" means here? @GregoryIan @csuzhangxc

Copy link
Member

@lilin90 lilin90 Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's unnecessary. @yikeke


Before `DM-master` tries to automatically unlock the sharding DDL lock, all the DM-workers need to receive the sharding DDL event (for details, see [shard merge principles](./shard-merge.md#Principles)). If the sharding DDL event is already in the synchronization process, and some DM-workers have gone offline and are not to be restarted (these DM-workers have been removed according to the application demand), then the sharding DDL lock cannot be automatically synchronized and unlocked because not all the DM-workers can receive the DDL event.

> If you do not need to make some DM-workers offline in the process of synchronizing sharding DDL events, a better solution is to use `stop-task` to stop the running tasks first, make the DM-workers go offline, remove the corresponding configuration information from the task configuration file, and finally use `start-task` and the new task configuration to restart the synchronization task.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not need to => need to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be: "If you need to make some DM-workers offline when not in the process of synchronizing sharding DDL events".

@lilin90
Copy link
Member

lilin90 commented Feb 3, 2019

Since the file in this PR has links that point to the file added in #887, this PR can only be merged after #887 is merged. @GregoryIan @csuzhangxc

@IANTHEREAL IANTHEREAL mentioned this pull request Feb 3, 2019
8 tasks
"workers": [ # the result list of the DDL execution/skipping operation of each DM-worker
{
"result": true, # the result of the DDL execution/skipping operation
"worker": "127.0.0.1:8262", # the address of the DM-worker (the DM-worker ID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think # the DM-worker ID is enough, below is same

#### Variables description

- `worker`: flag; string; `--worker`; required; specify the DM-worker which needs to execute the breaking operation
- `remove-id`: flag; string; `--remove-id`; optional; if being specified, it should be the ID of some DDL lock; if not being specified, remove the corresponding DDL lock information only when the breaking operation succeeds; if being specified, compulsorily remove the DDL lock information
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove-id is deprecated, we should indicate is deprecated

6. Use `unlock-dll-lock` to ask `DM-master` to actively unlock the DDL lock.
- If the owner of the DDL lock has gone offline, you can use the parameter `--owner` to specify another DM-worker as the new owner to execute the DDL.
- If any DM-worker reports an error, `result` will be set to `false`, and at this point you should check carefully if the errors of each DM-worker is acceptable and within expectations.
- DM-workers that have gone offline will return the error `rpc error: code = Unavailable`, which is within expectations and can be neglected; but if other online DM-workers return errors, then you should deal with them based on the scenario.
Copy link
Contributor

@IANTHEREAL IANTHEREAL Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it’s example of If any DM-worker reports an error, `result` will be set to `false`, and at this point you should check carefully if the errors of each DM-worker is acceptable and within expectations. should we put it under the statement at L208?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GregoryIan Yes, updated.


#### The reason for the abnormal lock

It has the similar reason for the abnormal lock in [Some DM-workers restart during the DDL unlocking process](#scenario-2-some-dm-workers-restart-during-the-ddl-unlocking-process). If the DM-worker is temporarily unreachable when you ask the DM-worker to skip the DDL, this DM-worker might fail to skip the DDL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with Scenario 3, we can also use solution of Scenario 3 to solve problem of Scenario 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct it, the difference is dm-master doesn't have a lock in Scenario 3, but dm-master has a new lock lock in Scenario 2

Copy link
Contributor

@IANTHEREAL IANTHEREAL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@csuzhangxc
Copy link
Member

LGTM

@lilin90 lilin90 merged commit 8f1bdb5 into pingcap:master Feb 11, 2019
@yikeke yikeke deleted the manually-handle-sharding-DDL-lock branch June 27, 2019 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants