Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

Support converting unsupported collation to TiDB supported collation or skip the schema creation #1354

Open
namco1992 opened this issue Dec 22, 2020 · 13 comments
Labels
type/feature-request This issue is a feature request

Comments

@namco1992
Copy link

namco1992 commented Dec 22, 2020

Feature Request

Is your feature request related to a problem? Please describe:

TiDB doesn't support _unicode_ci, however, this is a common collation that is widely used. Currently, DM can't handle the schemas with _unicode_ci as the schema creation will fail. The schema creation can't be bypassed, too.

Describe the feature you'd like:

  1. DM should convert other unsupported collations to the supported collations (_bin and _general_ci if new_collations_enabled_on_first_bootstrap is enabled);
  2. If the auto conversion approach is not ideal, then at least allow the user to create schemas manually and then config the DM to skip the schema restore.

Teachability, Documentation, Adoption, Migration Strategy:

The auto-conversion or skip schema creation should be configured in the task configuration.

P.S. I would like to contribute this feature if it's accepted, Thanks!

@lance6716
Copy link
Collaborator

Sounds great. currently DM will stop at that error so user can use handle-error to replace that SQL, but an auto-conversion feature is definitely better.

Could you give an exmaple about how this configuration looks like in task configuration?

@namco1992
Copy link
Author

Sounds great. currently DM will stop at that error so user can use handle-error to replace that SQL, but an auto-conversion feature is definitely better.

Hello @lance6716, from what I saw in the code and what I observed from my tests, handle-error can't be used during the dump and load, which means the error can't be handled when task-mode: all:

dm/dm/worker/subtask.go

Lines 679 to 683 in 1e9a094

func (st *SubTask) HandleError(ctx context.Context, req *pb.HandleWorkerErrorRequest) error {
syncUnit, ok := st.currUnit.(*syncer.Syncer)
if !ok {
return terror.ErrWorkerOperSyncUnitOnly.Generate(st.currUnit.Type())
}

Could you give an exmaple about how this configuration looks like in task configuration?

I think the configuration could be under Global setting section:

# ----------- Global setting -----------
## ********* Basic configuration *********
name: test                       # The name of the task. Should be globally unique.
task-mode: all                   # The task mode. Can be set to `full`/`incremental`/`all`.

# Other configurations here...

force-utf8-collation: general_ci # The collation to force convert to, only applicable to `utf8`/`utf8mb4` character sets. Available collations are `bin`/`general_ci`(only if `new_collations_enabled_on_first_bootstrap` is enabled)
skip-schema-restoration: true    # Weather to skip the schema restoration. It only takes effect in `all` task mode

I feel it's good to have the option to allow users to skip the schema restoration, which enables users to have full control of the schema creation if needed.
WDYT?

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Dec 23, 2020

If the auto conversion approach is not ideal, then at least allow the user to create schemas manually and then config the DM to skip the schema restore.

You can manually create schemas in TiDB now, DM will skip the CREATE TABLE statement if table already exist in full mode.
Convert other unsupported collations also sounds great.

@namco1992
Copy link
Author

Hello @GMHDBJD, from what I saw it doesn't actually skip it, here are some related logs:

[2020/12/22 03:16:01.021 +00:00] [ERROR] [baseconn.go:178] ["execute statement failed"] [task=dm-replica-task] [unit=load] [query="/*details omitted*/"] [argument="[]"] [error="Error 1050: Table 'xxx' already exists"]
[2020/12/22 03:16:01.021 +00:00] [ERROR] [db.go:176] ["execute statements failed after retry"] [task=dm-replica-task] [unit=load] [queries=""] [arguments="[]"] [error="[code=10006:class=database:scope=not-set:level=high] execute statement failed: Error 1050: Table 'xxx' already exists"]
[2020/12/22 03:16:01.021 +00:00] [INFO] [loader.go:986] ["table already exists, skip it"] [task=dm-replica-task] [unit=load] ["table schema file"=./dumped_data.dm-replica-task/xxx-schema.sql]

It still executes the statements even though the table exists. The error is skippable if the collation used in the table is supported. However, if the collation is not supported, the execution will fail with another error Error 1273: Unsupported collation when new collation is enabled: 'utf8mb4_unicode_ci', which is not skippable since it's a different error.

It seems the CREATE TABLE IF NOT EXISTS doesn't check the table existence first, therefore the collation unsupported error will be thrown instead of the table exists error.

If DM can check the table existence at first and skip the creation when it exists, then I believe the skip-schema-restoration is not needed.

@namco1992
Copy link
Author

Hello @GMHDBJD @lance6716, any other concern regarding this issue? Do we have a consensus here about what should be implemented?

@lance6716
Copy link
Collaborator

It still executes the statements even though the table exists. The error is skippable if the collation used in the table is supported. However, if the collation is not supported, the execution will fail with another error Error 1273: Unsupported collation when new collation is enabled: 'utf8mb4_unicode_ci', which is not skippable since it's a different error.

sorry for the inconvenience. This is may because we didn't rewrite CREATE TABLE statement to CREATE TABLE IF NOT EXISTS (we use a filter to skip "table already exists" errors but didn't work well). Could you help check that such a rewriting could provide a workaround in dump & load phase? and welcome to help us implement it or let TiDB report "table already exists" not "Unsupported collation"

we're planning some character set related feature in this sprint, still not decide yet. We'll request your kindly help to implement this collation feature some days later 😄

@namco1992
Copy link
Author

namco1992 commented Dec 28, 2020

@lance6716 Actually the CREATE TABLE IF NOT EXISTS won't solve the problem. It seems MySQL/TiDB will vet the schema statements first and stop when it finds out the collation is not supported, therefore Error 1273: Unsupported collation will be thrown even when using CREATE TABLE IF NOT EXISTS.

I've bypassed this issue with lots of hacks anyway 😂 . Lemme know when you decide how to tackle this and I'm ready to help 😃

@GMHDBJD
Copy link
Collaborator

GMHDBJD commented Dec 30, 2020

Hi, @namco1992 . The following advice may help you.

  1. TiDB will support collation utf8mb4_unicode_ci and utf8_unicode_ci in 5.0-rc. #17596
  2. User can modify the file dbname.tbname-schema.sql in worker's dump dir, change the collation for create table statement and then resume the task.
  3. Dumpling support no-schema flag, which will dump table data except table schema. We can support it in loader too, then we only need to create table manually in TiDB.
  4. Handle Unsupported collation error specially. If table already exist, ignore the error.
  5. Auto convert the collation.

I prefer the 4th solution, because it is simple, effective and less configuration. What's your opinion? Any contribution will be appreciated.

@namco1992
Copy link
Author

Hello @GMHDBJD, I think 4th solution is good for me, to summarize:

  1. Execute the CREATE TABLE statement;
  2. If error && error == table already exists then ignore the error;
  3. If error && error == unspported collation then check the table existence. If table exists, ignore the error;

If we all agree on this flow then I can start to work on it. Thanks!

@lance6716
Copy link
Collaborator

@namco1992 LGTM, and for incremental phase, function handleSpecialDDLError could be used to detect unsupported collation error and rewrite SQL

@lance6716 lance6716 added the type/feature-request This issue is a feature request label Apr 7, 2021
@lance6716
Copy link
Collaborator

Hi, do you need some help to implement this @namco1992

@namco1992
Copy link
Author

namco1992 commented Apr 7, 2021

Hello @lance6716, sorry for dragging too long about this issue. I was busy with work stuff. However, I noticed that TiDB supported unicode_ci starts 4.0.11, so it seems not that urgent to have this feature anymore.

But I will still work on it as there are still many other collations that can cause this error. I will try to create a PR by this week if it's not too late, thanks!

@lance6716
Copy link
Collaborator

Take your time, and if you need some help you could comment here or join sig-migrate in slack

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type/feature-request This issue is a feature request
Projects
None yet
Development

No branches or pull requests

3 participants