Support incremental commit of INSERT...SELECT #18038
"commit the running transaction periodically" would break the atomicity of the original transaction. Is that acceptable to the user? This seems to be the same as #18028. Although that issue's summary is about "performance", the original problem that #18028 wants to solve is similar to this one. Is #18028 (comment) a workaround for users?
For now:

As an alternative, may I suggest adding the … Edit: See #18748 for …
We need to investigate the user scenario in depth, and then make a decision after the research is complete.
Note that the setting that makes it an incremental commit is called …
I hit this today while trying to create examples for partitioning on moderate-sized tables. Because TiDB does not support adding partitioning to, or removing partitioning from, existing tables, it requires an export and then a reload:

```sql
mysql> ALTER TABLE t1 PARTITION BY RANGE ( YEAR(`start_date`) ) (
    ->   PARTITION `p2010` VALUES LESS THAN (2011),
    ->   PARTITION `p2011` VALUES LESS THAN (2012),
    ->   PARTITION `p2012` VALUES LESS THAN (2013),
    ->   PARTITION `p2013` VALUES LESS THAN (2014),
    ->   PARTITION `p2014` VALUES LESS THAN (2015),
    ->   PARTITION `p2015` VALUES LESS THAN (2016),
    ->   PARTITION `p2016` VALUES LESS THAN (2017),
    ->   PARTITION `p2017` VALUES LESS THAN (2018),
    ->   PARTITION `p2018` VALUES LESS THAN (2019),
    ->   PARTITION `p2019` VALUES LESS THAN (2020),
    ->   PARTITION `pmax` VALUES LESS THAN (MAXVALUE)
    -> );
ERROR 1105 (HY000): alter table partition is unsupported

mysql> ALTER TABLE trips_partitioned REMOVE PARTITIONING;
ERROR 8200 (HY000): Unsupported remove partitioning
```

It is not user-friendly to show examples that require a local dumper as an in-between step. It is (obviously) optimal if TiDB supports all DDL variants, but anticipating that this could take some time, incremental commit is important because it provides a viable workaround.
Here is another use case: this feature combines very well with a time-travel query (#18672), so you can effectively … Since there is already a PR for the …
@jackysp Thanks! I can confirm it works (kind of):

```sql
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  pk VARBINARY(36) NOT NULL PRIMARY KEY,
  b BIGINT NOT NULL,
  c BIGINT NOT NULL,
  pad VARBINARY(2048),
  INDEX (b),
  INDEX (c)
);
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM dual;
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b JOIN t1 c;
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b JOIN t1 c;
-- This should insert 1000 rows (10^3)
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b JOIN t1 c;
-- 100K rows (fails by default)
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b LIMIT 100000;
SET tidb_dml_batch_size = 20000;
SET tidb_batch_insert = 1;
-- 100K rows (now works)
INSERT INTO t1 SELECT uuid(), FLOOR(RAND()*5), FLOOR(RAND()*1000000), HEX(RANDOM_BYTES(1000)) FROM t1 a JOIN t1 b LIMIT 100000;
```

I don't think … I also noted that if I try to simulate a very large table (inserting 1010^3 rows) I could trigger out-of-memory controls.
(Note that the hint should be placed after the INSERT keyword, not at the end of the statement.)

```sql
INSERT /*+ SET_VAR(...) */ INTO tbl1 SELECT * FROM tbl2;
```
I have a proposal for how to address this issue. I'm missing some of the context about why certain feature flags exist, so I'd appreciate input:

Items 1 and 2 make …
Has this been resolved by non-transactional DML?
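For context, TiDB's non-transactional DML (`BATCH` statements, available in recent versions; `BATCH` support for INSERT ... SELECT is newer than for DELETE) splits one statement into many smaller transactions committed independently, which is exactly the incremental-commit behavior requested here. A hedged sketch, where the table names, shard column, and batch size are illustrative:

```sql
-- Split the copy into batches of 10000 rows keyed on tbl2.id;
-- each batch commits on its own, so the operation is NOT atomic.
BATCH ON tbl2.id LIMIT 10000
INSERT INTO tbl1 SELECT * FROM tbl2;

-- Preview how the statement would be divided without executing it:
BATCH ON tbl2.id LIMIT 10000 DRY RUN
INSERT INTO tbl1 SELECT * FROM tbl2;
```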
Feature Request
Is your feature request related to a problem? Please describe:
TiDB does not handle large transactions well.
INSERT...SELECT is a common and generally inexpensive mechanism used by MySQL applications to insert a large amount of data from one table (or query result) into another table. This does not work well in TiDB if there are very many rows in the transaction.
Describe the feature you'd like:
A solution to support large INSERT...SELECT transactions in TiDB could be to automatically commit the running transaction periodically so that the transaction does not grow to be too large.
I'd like to see support for this in 2 forms:
A per-statement flag that could be added to the SQL itself, like this:
INSERT INTO tbl1 SELECT * from tbl2 /*+ TIDB_INCREMENTAL_COMMIT(10000) */
A system variable that can be set either as a SESSION or GLOBAL variable (with a corresponding configuration variable to set the GLOBAL value on startup):
SET GLOBAL tidb_incremental_commit=10000;
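Until something like the variable above exists, the common workaround is to chunk the copy on the application side. A minimal sketch, assuming a hypothetical source table with a monotonically increasing integer key `id` (names are illustrative, not from this issue):

```sql
-- Copy tbl2 into tbl1 in key-range chunks, each committed separately.
-- An application loop would advance the range until no rows remain.
INSERT INTO tbl1 SELECT * FROM tbl2 WHERE id > 0     AND id <= 10000;
INSERT INTO tbl1 SELECT * FROM tbl2 WHERE id > 10000 AND id <= 20000;
-- ...and so on, tracking the last key copied.
```

Keyset ranges like this avoid the growing scan cost that LIMIT/OFFSET chunking would incur, at the price of requiring a suitable key column.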
Describe alternatives you've considered:
Teachability, Documentation, Adoption, Migration Strategy:
Users migrating from MySQL might run into big problems in TiDB because they're using this idiom. Adding support for this could make it easier for those users to migrate their applications to TiDB.
If this feature is implemented, it should be considered whether it applies to all statements or only specific types of statements (LOAD DATA, INSERT...SELECT, etc.).