executor: add an variable to compatible with MySQL insert for OGG #7863

Merged
merged 10 commits into from
Oct 17, 2018

jackysp
Member

@jackysp jackysp commented Oct 10, 2018

What problem does this PR solve?

TiDB optimizes its insert flow by checking unique keys when the transaction commits, which makes it incompatible with MySQL. For OGG, this causes some unexpected errors.
OGG has an option called "handlecollisions". If it is ON and an insert inside a transaction hits a duplicate-key error, OGG handles the collision by converting the insert statement into an update statement and then commits the transaction. This has the same effect as insert on duplicate key update in MySQL or upsert in PG, but OGG does not seem to want to use these non-standard SQL statements. It expects to hit the insert error and handle it. TiDB, however, checks the unique constraint for an insert statement only when the transaction commits. See the following example:

create table t (i int key);
insert into t values (1);
begin;
insert into t values (1);            -- OGG expects to meet the error here!
update t set i = 1 where i = 1;
commit;                              -- TiDB returns the error here!

Btw: checking the unique constraint at insert time really hurts performance.

What is changed and how it works?

Add a variable to make insert behavior compatible with MySQL.
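
The difference between the two behaviors can be sketched with a toy model. This is only an illustration under stated assumptions, not TiDB's executor: `Store`, `Txn`, `ErrDupKey` and the `checkInPlace` flag are hypothetical names invented here, and the committed data is just an in-memory set of integer keys.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDupKey is a stand-in for a duplicate-key error; the message is illustrative.
var ErrDupKey = errors.New("duplicate entry for key 'PRIMARY'")

// Store is a toy set of committed keys. It is not TiDB's storage layer.
type Store struct{ keys map[int]bool }

// Txn buffers inserted keys until Commit.
type Txn struct {
	store        *Store
	pending      []int
	checkInPlace bool // hypothetical switch standing in for the new variable
}

func NewStore() *Store { return &Store{keys: map[int]bool{}} }

func (s *Store) Begin(checkInPlace bool) *Txn {
	return &Txn{store: s, checkInPlace: checkInPlace}
}

// conflicts reports whether k is already committed or buffered in this txn.
func (t *Txn) conflicts(k int) bool {
	if t.store.keys[k] {
		return true
	}
	for _, p := range t.pending {
		if p == k {
			return true
		}
	}
	return false
}

// Insert reports a duplicate immediately only when checkInPlace is set.
// Otherwise the key is buffered and the check is left to Commit.
func (t *Txn) Insert(k int) error {
	if t.checkInPlace && t.conflicts(k) {
		return ErrDupKey
	}
	t.pending = append(t.pending, k)
	return nil
}

// Commit validates every buffered key, then applies all of them or none.
func (t *Txn) Commit() error {
	seen := map[int]bool{}
	for _, k := range t.pending {
		if t.store.keys[k] || seen[k] {
			return ErrDupKey
		}
		seen[k] = true
	}
	for _, k := range t.pending {
		t.store.keys[k] = true
	}
	t.pending = nil
	return nil
}

func main() {
	s := NewStore()
	setup := s.Begin(true)
	_ = setup.Insert(1)
	_ = setup.Commit()

	deferred := s.Begin(false)
	fmt.Println("deferred insert:", deferred.Insert(1))
	fmt.Println("deferred commit:", deferred.Commit())

	inPlace := s.Begin(true)
	fmt.Println("in-place insert:", inPlace.Insert(1))
}
```

With the flag off, the duplicate only surfaces at `Commit`, which is the case OGG cannot handle; with the flag on, `Insert` fails right away, at the cost of a lookup per inserted row.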

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression
    If "ON", bulk inserts will be at least 10x slower than with "OFF".

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to be included in the release note

@shenli
Member

shenli commented Oct 10, 2018

Could you please provide more detailed info?

For OGG it will cause some unexpected errors.

@jackysp
Member Author

jackysp commented Oct 10, 2018

@shenli , we have just confirmed it is the root cause of OGG's error. I'll update the description and remove the WIP tag.

@jackysp jackysp changed the title [WIP] executor: add an variable to compatible with MySQL insert for OGG executor: add an variable to compatible with MySQL insert for OGG Oct 10, 2018
Contributor

@winkyao winkyao left a comment

LGTM

Contributor

@eurekaka eurekaka left a comment

LGTM

@eurekaka eurekaka added type/enhancement The issue or PR belongs to an enhancement. type/compatibility status/LGT2 Indicates that a PR has LGTM 2. and removed type/enhancement The issue or PR belongs to an enhancement. labels Oct 10, 2018
@eurekaka
Contributor

/run-all-tests

@morgo
Contributor

morgo commented Oct 10, 2018

May I suggest a name like tidb_defer_constraint_check = bool (inverted value)?

It is not immediately clear to me what tidb_compatible_insert means. If I had to guess, I would say that it is a syntax compatibility feature. I also wouldn't think by the name it affects performance >10x.

@shenli
Member

shenli commented Oct 11, 2018

When using this feature to synchronize data through OGG, will you set the global scope value for tidb_compatible_insert? Or session scope?

@jackysp
Member Author

jackysp commented Oct 11, 2018

@shenli global scope.

@shenli
Member

shenli commented Oct 11, 2018

That could hurt the performance of the whole cluster. It would be better to make it a server-scope (or even user-scope) variable.

@jackysp
Member Author

jackysp commented Oct 11, 2018

  1. It affects the performance of ordinary insert statements only.
  2. Many users send requests through a proxy, so server scope is not useful in this common scenario.
  3. User scope is a good idea, but maybe we need to support it first. Even if we support user scope, I think we still need to keep the global scope for this variable because of some security issues.

@@ -228,6 +234,7 @@ const (
DefTiDBGeneralLog = 0
DefTiDBRetryLimit = 10
DefTiDBDisableTxnAutoRetry = false
DefTiDBDeferConstraintCheck = true
Contributor

I prefer to use false as default value of a variable.
Did you forget to set this value in sessionVars ? @jackysp

Contributor

If false is the preferred default, the name would need to change. Perhaps something like synchronous_constraint_check? I do prefer the word defer or delay though. delay is used in MySQL for non-safe use cases though (myisam's delay-key-write), which is why I suggested defer.

Contributor

The name 'Deferrable constraints' is also common in other DBs. For example: https://www.postgresql.org/docs/9.1/static/sql-set-constraints.html

Contributor

Yeah, we can change a variable name. @morgo @jackysp
It's a golang idiom to use false as the default value: when you forget to initialize a variable, it is set to false automatically in golang, so using false is less likely to trap you.
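
The zero-value trap described here can be shown in a few lines. This is a minimal sketch: `SessionVars` is a hypothetical, cut-down stand-in for the real struct, and the two constructors are invented for illustration; only `DefTiDBDeferConstraintCheck = true` comes from the diff above.

```go
package main

import "fmt"

// DefTiDBDeferConstraintCheck mirrors the default shown in the diff.
const DefTiDBDeferConstraintCheck = true

// SessionVars is a hypothetical stand-in with a single field.
type SessionVars struct {
	DeferConstraintCheck bool
}

// NewSessionVarsForgetful omits the field, so it takes Go's zero value (false).
func NewSessionVarsForgetful() *SessionVars {
	return &SessionVars{}
}

// NewSessionVars sets the field explicitly from the declared default.
func NewSessionVars() *SessionVars {
	return &SessionVars{DeferConstraintCheck: DefTiDBDeferConstraintCheck}
}

func main() {
	fmt.Println(NewSessionVarsForgetful().DeferConstraintCheck) // false
	fmt.Println(NewSessionVars().DeferConstraintCheck)          // true
}
```

A default of true only holds if every construction path remembers to set it, which is the risk being raised.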

Contributor

tidb_disable_deferred_constraints ?

Contributor

tidb_disable_deferred_constraints sounds good

Member Author

@lilin90 we need some help with the naming issue.

Member

How about tidb_constraint_check_in_place? @jackysp @gregwebs @tiancaiamao

Contributor

I don't think we can let golang zero values dictate how variable names end up facing the user. Also, consider that we can decide to change the default for a variable in the future. You can use a pointer or an enumeration in go if that helps, but we probably need a better abstraction for our variables.
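
One way to read the "pointer or enumeration" suggestion is to give "not set" its own representation, so the default no longer depends on Go's zero value. The sketch below is an assumption about what that could look like; `TriState`, `Resolve` and `ResolvePtr` are hypothetical names, not TiDB code.

```go
package main

import "fmt"

// TriState is a hypothetical enumeration whose zero value means "not set".
type TriState int

const (
	Unset TriState = iota
	Off
	On
)

// Resolve returns the explicit setting, or def when nothing was set.
func (t TriState) Resolve(def bool) bool {
	switch t {
	case On:
		return true
	case Off:
		return false
	default:
		return def
	}
}

// ResolvePtr does the same for the pointer form: nil means "not set".
func ResolvePtr(v *bool, def bool) bool {
	if v == nil {
		return def
	}
	return *v
}

func main() {
	var t TriState // zero value is Unset, so the default applies
	fmt.Println(t.Resolve(true))

	off := false
	fmt.Println(ResolvePtr(nil, true), ResolvePtr(&off, true))
}
```

Either form lets the user-facing name and default be chosen freely, and the default can be changed later without touching call sites.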

Member Author

I agree with @lilin90 's suggestion.

@@ -1275,6 +1275,7 @@ const loadCommonGlobalVarsSQL = "select HIGH_PRIORITY * from mysql.global_variab
variable.TiDBHashAggPartialConcurrency + quoteCommaQuote +
variable.TiDBHashAggFinalConcurrency + quoteCommaQuote +
variable.TiDBBackoffLockFast + quoteCommaQuote +
variable.TiDBDeferConstraintCheck + quoteCommaQuote +
Contributor

I tested the behavior: if this global variable does not exist in the underlying TiKV (sadly, that is the case), sessionVar.DeferConstraintCheck will not be initialized to 'true'.

mysql> select HIGH_PRIORITY * from mysql.global_variables where variable_name in ('a_new_add_global', 'tidb_index_join_batch_size', 'tidb_index_lookup_size');
+----------------------------+----------------+
| VARIABLE_NAME              | VARIABLE_VALUE |
+----------------------------+----------------+
| tidb_index_lookup_size     | 20000          |
| tidb_index_join_batch_size | 25000          |
+----------------------------+----------------+
2 rows in set (0.01 sec)

I insist that 'false' is a much better default value.
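
The missing-row case can be handled by merging stored rows over compiled-in defaults. This is only a sketch of that idea, not how TiDB loads its variables: the `defaults` map, the `loadGlobals` helper, the name `tidb_defer_constraint_check` and the values used are all assumptions made for illustration.

```go
package main

import "fmt"

// defaults is a hypothetical registry of compiled-in defaults.
// The names and values are illustrative only.
var defaults = map[string]string{
	"tidb_defer_constraint_check": "1",
	"tidb_index_lookup_size":      "20000",
}

// loadGlobals merges rows read from storage over the defaults, so a
// variable with no stored row keeps its default instead of a zero value.
func loadGlobals(rows map[string]string) map[string]string {
	out := make(map[string]string, len(defaults))
	for name, def := range defaults {
		out[name] = def
	}
	for name, val := range rows {
		out[name] = val
	}
	return out
}

func main() {
	// An older cluster has no stored row for the newly added variable.
	rows := map[string]string{"tidb_index_lookup_size": "30000"}
	vars := loadGlobals(rows)
	fmt.Println(vars["tidb_defer_constraint_check"], vars["tidb_index_lookup_size"])
}
```

With a fallback like this, a default of true and a default of false behave the same on clusters that predate the variable.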

Contributor

We have other defaults that are defaulted to non-zero values (integers). Are those also a problem?

Member Author

We need an issue to track it, @tiancaiamao .

Contributor

I don't know. @gregwebs
I always conform to the golang idiom, and hope that keeps me from getting trapped.

Well, maybe we need more tests to cover the potential bugs? @jackysp

Member Author

Yes, I will add a test case in the Schrödinger's upgrade test.

@jackysp
Member Author

jackysp commented Oct 16, 2018

PTAL @tiancaiamao

@tiancaiamao
Contributor

LGTM @winkyao @eurekaka

@tiancaiamao
Contributor

/run-all-tests

@zz-jason zz-jason merged commit f3148da into pingcap:master Oct 17, 2018
@jackysp jackysp deleted the compatible_insert branch October 23, 2018 04:52
iamzhoug37 pushed a commit to iamzhoug37/tidb that referenced this pull request Oct 25, 2018
bugfix fixed pingcap#7518

expression: MySQL compatible current_user function (pingcap#7801)

plan: propagate constant over outer join (pingcap#7794)

- extract `outerCol = const` from join conditions and filter conditions,
  substitute `outerCol` in join conditions with `const`;
- extract `outerCol = innerCol` from join conditions, derive new join
  conditions based on this column equal condition and `outerCol` related
  expressions in join conditions and filter conditions;

util/timeutil: fix data race caused by forgetting set stats lease to 0 (pingcap#7901)

stats: handle ddl event for partition table (pingcap#7903)

plan: implement Operand and Pattern of cascades planner. (pingcap#7910)

planner: not convert to TableDual if empty range is derived from deferred constants (pingcap#7808)

plan: move projEliminate behind aggEliminate (pingcap#7909)

admin: fix admin check table bug of byte compare (pingcap#7887)

* admin: remove reflect deepEqual

stats: fix panic caused by empty histogram (pingcap#7912)

plan: fix panic caused by empty schema of LogicalTableDual (pingcap#7906)

* fix drop view if exist error (pingcap#7833)

executor: refine `explain analyze` (pingcap#7888)

executor: add an variable to compatible with MySQL insert for OGG (pingcap#7863)

expression: maintain `DeferredExpr` in aggressive constant folding. (pingcap#7915)

stats: fix histogram boundaries overflow error (pingcap#7883)

ddl:support the definition of `null` change to `not null` using `alter table` (pingcap#7771)

* ddl:support the definition of null change to not null using alter table

ddl: add check when create table with foreign key. (pingcap#7885)

* ddl: add check when create table with foreign key

planner: eliminate if null on non null column (pingcap#7924)

executor: fix a bug in point get (pingcap#7934)

planner, executor: refine ColumnPrune for LogicalUnionAll (pingcap#7930)

executor: fix panic when limit is too large (pingcap#7936)

ddl: add TiDB version to metrics (pingcap#7902)

stats: limit the length of sample values (pingcap#7931)

vendor: update tipb (pingcap#7893)

planner: support the Group and GroupExpr for the cascades planner (pingcap#7917)

store/tikv: log more information when other err occurs (pingcap#7948)

types: fix date time parse (pingcap#7933)

ddl: just print error message when ddl job is normal to calcel, to eliminate noisy log (pingcap#7875)

stats: update delta info for partition table (pingcap#7947)

explaintest: add explain test for partition pruning (pingcap#7505)

util: move disjoint set to util package (pingcap#7950)

util: add PreAlloc4Row and Insert for Chunk and List (pingcap#7916)

executor: add the slow log for commit (pingcap#7951)

expression: add builtin json_keys (pingcap#7776)

privilege: add USAGE in `show grants` for mysql compatibility (pingcap#7955)

ddl: fix invailid ddl job panic (pingcap#7940)

*: move ast.NewValueExpr to standalone parser_driver package (pingcap#7952)

Make the ast package get rid of the dependency of types.Datum

server: allow cors http request (pingcap#7939)

*: move `Statement` and `RecordSet` from ast to sqlexec package (pingcap#7970)

pr suggestion update

executor/aggfuncs: split unit tests to corresponding file (pingcap#7993)

store/tikv: fix typo (pingcap#7990)

executor, planner: clone proj schema for different children in buildProj4Union (pingcap#7999)

executor: let information_schema be the first database in ShowDatabases (pingcap#7938)

stats: use local feedback for partition table (pingcap#7963)

executor: add unit test for aggfuncs (pingcap#7966)

server: add log for binary execute statement (pingcap#7987)

admin: refine admin check decoder (pingcap#7862)

executor: improve wide table insert & update performance (pingcap#7935)

ddl: fix reassigned partition id in `truncate table` does not take effect (pingcap#7919)

fix reassigned partition id in truncate table does not take effect

add changelog for 2.1.0 rc4 (pingcap#8020)

*: make parser package dependency as small as possible (pingcap#7989)

parser: support `:=` in the `set` syntax (pingcap#8018)

According to MySQL document, `set` use the = assignment operator,
but the := assignment operator is also permitted

stats: garbage collect stats for partition table (pingcap#7962)

docs: add the proposal for the column pool (pingcap#7988)

expression: refine built-in func truncate to support uint arg (pingcap#8000)

stats: support show stats for partition table (pingcap#8023)

stats: update error rate for partition table (pingcap#8022)

stats: fix estimation for out of range point queries (pingcap#8015)

*: move parser to a separate repository (pingcap#8036)

executor: fix wrong result when index join on union scan. (pingcap#8031)

Do not modify Plan of dataReaderBuilder directly, because it would
impact next batch of outer rows, as well as other concurrent inner
workers. Instead, build a local child builder to store the child plan.

planner: fix a panic of a cached prepared statement with IndexScan (pingcap#8017)

*: fix the issue of executing DDL after executing SQL failure in txn (pingcap#8044)

* ddl, executor: fix the issue of executing DDL after executing SQL failure in txn

add unit test

remove debug info

add like evaluator case sensitive test

ddl, domain: make schema correct after canceling jobs (pingcap#7997)

unit test fix

code format

proposal: maintaining histograms in plan. (pingcap#7605)

support _tidb_rowid for table scan range (pingcap#8047)

var rename fix
Labels
sig/execution SIG execution status/LGT2 Indicates that a PR has LGTM 2. type/compatibility
9 participants