14.6 InnoDB On-Disk Structures

14.6 InnoDB在磁盘上的存储结构

[TOC]

This section describes InnoDB on-disk structures and related topics.

本文详细介绍InnoDB在磁盘上的存储结构及相关模型。

14.6.1 Tables

14.6.1 表(Table)

This section covers topics related to InnoDB tables.

下面先介绍一些InnoDB表的基础知识。

14.6.1.1 Creating InnoDB Tables

14.6.1.1 创建 InnoDB 表

To create an InnoDB table, use the CREATE TABLE statement.

要创建 InnoDB 表, 请使用 CREATE TABLE 语句。

CREATE TABLE t1 (a INT, b CHAR (20), PRIMARY KEY (a)) ENGINE=InnoDB;

You do not need to specify the ENGINE=InnoDB clause if InnoDB is defined as the default storage engine, which it is by default. To check the default storage engine, issue the following statement:

如果 InnoDB 是默认存储引擎的话, 则无需指定 ENGINE=InnoDB 子句。想要查看默认存储引擎, 请执行以下语句:

mysql> SELECT @@default_storage_engine;
+--------------------------+
| @@default_storage_engine |
+--------------------------+
| InnoDB                   |
+--------------------------+

You might still use ENGINE=InnoDB clause if you plan to use mysqldump or replication to replay the CREATE TABLE statement on a server where the default storage engine is not InnoDB.

An InnoDB table and its indexes can be created in the system tablespace, in a file-per-table tablespace, or in a general tablespace. When innodb_file_per_table is enabled, which is the default, an InnoDB table is implicitly created in an individual file-per-table tablespace. Conversely, when innodb_file_per_table is disabled, an InnoDB table is implicitly created in the InnoDB system tablespace. To create a table in a general tablespace, use CREATE TABLE ... TABLESPACE syntax. For more information, see Section 14.6.3.3, General Tablespaces.

如果需要使用 mysqldump , 或者使用主从复制, 可能某些MySQL服务器的默认存储引擎并不是InnoDB, 那么在 CREATE TABLE 时就有必要指定 ENGINE=InnoDB 子句。

可以在下面这些表空间中创建InnoDB表以及对应的索引:

system tablespace, 系统表空间。
file-per-table tablespace, 每个表一个文件的表空间。如果启用了 innodb_file_per_table, 则会隐式地在这个表空间中创建 InnoDB 表(默认就是启用的)。如果禁用 innodb_file_per_table, 则会隐式地在 InnoDB 的系统表空间中创建。
general tablespace, 通用表空间。要在通用表空间中创建表, 请使用 CREATE TABLE ... TABLESPACE 语法。更多信息, 请参见 Section 14.6.3.3, General Tablespaces。

When you create an InnoDB table, MySQL creates a .frm file in the database directory under the MySQL data directory. For more information about .frm files, see InnoDB Tables and .frm Files. For a table created in a file-per-table tablespace, MySQL also creates an .ibd tablespace file in the database directory, by default.

A table created in the InnoDB system tablespace is created in an existing ibdata file, which resides in the MySQL data directory. A table created in a general tablespace is created in an existing general tablespace .ibd file. General tablespace files can be created inside or outside of the MySQL data directory. For more information, see Section 14.6.3.3, General Tablespaces.

Internally, InnoDB adds an entry for each table to the InnoDB data dictionary. The entry includes the database name. For example, if table t1 is created in the test database, the data dictionary entry for the database name is 'test/t1'. This means you can create a table of the same name (t1) in a different database, and the table names do not collide inside InnoDB.

创建InnoDB表时, MySQL会在data目录下面的数据库目录中创建一个 .frm 文件。有关 .frm 文件的更多信息, 请参见InnoDB Tables and .frm Files。

对于在 file-per-table 表空间中创建的表, MySQL默认还会在数据库目录下创建一个 .ibd tablespace 文件。

在 InnoDB 系统表空间中创建的表, 则是放到 MySQL data 目录下已有的 ibdata文件中。

在 general tablespace 中创建的表, 位于通用表空间的已有 .ibd 文件中。

通用表空间文件, 可以在MySQL数据目录内部或外部创建。更多信息, 请参见 Section 14.6.3.3, General Tablespaces。

在内部, InnoDB为每个表添加相应的条目到InnoDB数据字典中。条目中包括数据库名称。例如, 如果在 test 数据库中创建了表t1, 则数据字典条目中的名称为 'test/t1'。这意味着我们可以在其他数据库中创建相同名称的表(t1), 并且表名不会在InnoDB内部冲突。

InnoDB Tables and `.frm` Files

InnoDB Tables 与 `.frm` 文件的关系

MySQL stores data dictionary information for tables in .frm files in database directories. Unlike other MySQL storage engines, InnoDB also encodes information about the table in its own internal data dictionary inside the system tablespace. When MySQL drops a table or a database, it deletes one or more .frm files as well as the corresponding entries inside the InnoDB data dictionary. You cannot move InnoDB tables between databases simply by moving the .frm files. For information about moving InnoDB tables, see Section 14.6.1.4, Moving or Copying InnoDB Tables.

MySQL将表的数据字典信息存放在数据库目录下的 .frm 文件中。与其他存储引擎不一样, InnoDB 还将table信息进行编码并存放到自己的内部数据字典中(位于系统表空间内)。当 MySQL 删除一个表或者一个数据库时, 它会删除一个或多个 .frm 文件, 以及 InnoDB 数据字典中的相应条目。所以我们不能直接在多个数据库中通过移动 .frm 文件的方式来移动 InnoDB 表。有关移动 InnoDB 表的信息, 请参阅第 14.6.1.4 节, 移动或复制 InnoDB 表.

InnoDB Tables and Row Formats

InnoDB Tables 与 Row Formats

The default row format for InnoDB tables is defined by the innodb_default_row_format configuration option, which has a default value of DYNAMIC. Dynamic and Compressed row format allow you to take advantage of InnoDB features such as table compression and efficient off-page storage of long column values. To use these row formats, innodb_file_per_table must be enabled (the default as of MySQL 5.6.6) and innodb_file_format must be set to Barracuda.

InnoDB表默认的行格式由 innodb_default_row_format 配置项来定义, 默认值为 DYNAMIC。 Dynamic 和 Compressed 行格式允许我们使用 InnoDB 特性, 例如压缩表和长列值的高效页外存储。

要使用这些行格式, 必须启用 innodb_file_per_table 配置(从MySQL 5.6.6 开始就是默认值), 并将 innodb_file_format 设置为 Barracuda。

SET GLOBAL innodb_file_per_table=1;
SET GLOBAL innodb_file_format=barracuda;
CREATE TABLE t3 (a INT, b CHAR (20), PRIMARY KEY (a)) ROW_FORMAT=DYNAMIC;
CREATE TABLE t4 (a INT, b CHAR (20), PRIMARY KEY (a)) ROW_FORMAT=COMPRESSED;

Alternatively, you can use CREATE TABLE ... TABLESPACE syntax to create an InnoDB table in a general tablespace. General tablespaces support all row formats. For more information, see Section 14.6.3.3, General Tablespaces.

也可以使用 CREATE TABLE ... TABLESPACE 语法, 将 InnoDB 表创建到通用表空间。通用表空间支持所有行格式。更多信息请参阅第 14.6.3.3 节, 通用表空间。

CREATE TABLE t1 (id INT PRIMARY KEY) TABLESPACE ts1 ROW_FORMAT=DYNAMIC;

CREATE TABLE ... TABLESPACE syntax can also be used to create InnoDB tables with a Dynamic row format in the system tablespace, alongside tables with a Compact or Redundant row format.

CREATE TABLE ... TABLESPACE 语法也可以在系统表空间中创建行格式=具有 Dynamic 的 InnoDB表, 以及具有 Compact 或 Redundant 行格式的表。

CREATE TABLE t1 (id INT PRIMARY KEY) TABLESPACE = innodb_system ROW_FORMAT=DYNAMIC;

For more information about InnoDB row formats, see Section 14.11, InnoDB Row Formats. For how to determine the row format of an InnoDB table and the physical characteristics of InnoDB row formats, see Section 14.11, InnoDB Row Formats.

有关 InnoDB 行格式的更多信息, 请参阅第 14.11 节, InnoDB 行格式。关于如何确定 InnoDB 表的行格式及其物理特性, 请参见第14.11节, InnoDB行格式.

InnoDB Tables and Primary Keys

InnoDB 以及主键

Always define a primary key for an InnoDB table, specifying the column or columns that:

Are referenced by the most important queries.
Are never left blank.
Never have duplicate values.
Rarely if ever change value once inserted.

For example, in a table containing information about people, you would not create a primary key on (firstname, lastname) because more than one person can have the same name, some people have blank last names, and sometimes people change their names. With so many constraints, often there is not an obvious set of columns to use as a primary key, so you create a new column with a numeric ID to serve as all or part of the primary key. You can declare an auto-increment column so that ascending values are filled in automatically as rows are inserted:

始终为 InnoDB 表定义一个主键, 主键可以指定为单列或者多列, 满足这些特征:

被最重要的查询所引用。
永远不会留空。
永远没有重复值。
一旦插入, 主键的值几乎不会改变。

例如, 在人员信息相关的表中, 我们不能使用 (firstname, lastname) 作为主键, 因为可以有多人具有相同的名字, 有些人的姓氏为空, 有时还会更改姓名. 由于有如此多的约束, 通常业务信息中没有什么值能作为主键, 因此我们可以创建一个带有数字 ID 的新列作为主键的全部或部分。可以声明一个 auto-increment 列, 以便在插入行时自动填充升序值:

-- ID的值, 就像是一个定位指针, 其他表可以和他建立某种关系.
CREATE TABLE t5 (id INT AUTO_INCREMENT, b CHAR (20), PRIMARY KEY (id));

-- 主键可以使用单列, 也可以组合使用多列. 但自增列必须位于主键的第一节.
CREATE TABLE t6 (id INT AUTO_INCREMENT, a INT, b CHAR (20), PRIMARY KEY (id,a));

Although the table works correctly without defining a primary key, the primary key is involved with many aspects of performance and is a crucial design aspect for any large or frequently used table. It is recommended that you always specify a primary key in the CREATE TABLE statement. If you create the table, load data, and then run ALTER TABLE to add a primary key later, that operation is much slower than defining the primary key when creating the table.

尽管该表在没有定义主键的情况下也能正常工作, 但主键涉及性能的许多方面, 并且对于任何大型或经常使用的表来说都是至关重要的设计方面。建议您始终在 CREATE TABLE 语句中指定主键。如果创建表, 加载数据, 然后运行 ALTER TABLE 添加主键, 该操作比创建表时定义主键要慢得多。

Viewing InnoDB Table Properties

查看 InnoDB 表的属性

To view the properties of an InnoDB table, issue a SHOW TABLE STATUS statement:

要查看 InnoDB 表的属性, 可以执行 SHOW TABLE STATUS 语句:

mysql> SHOW TABLE STATUS FROM test LIKE 't%' \G;
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2015-03-16 15:13:31
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

For information about SHOW TABLE STATUS output, see Section 13.7.5.36, SHOW TABLE STATUS Statement.

InnoDB table properties may also be queried using the InnoDB Information Schema system tables:

有关 SHOW TABLE STATUS 的输出信息, 请参阅 Section 13.7.5.36, SHOW TABLE STATUS 声明`。

也可以通过 INFORMATION_SCHEMA 中的 INNODB_SYS_TABLES 信息来查询 InnoDB 表属性:

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME='test/t1' \G
*************************** 1. row ***************************
     TABLE_ID: 45
         NAME: test/t1
         FLAG: 1
       N_COLS: 5
        SPACE: 35
  FILE_FORMAT: Antelope
   ROW_FORMAT: Compact
ZIP_PAGE_SIZE: 0
   SPACE_TYPE: Single
1 row in set (0.00 sec)

For more information, see Section 14.16.3, InnoDB INFORMATION_SCHEMA System Tables.

14.6.1.2 Creating Tables Externally

There are different reasons for creating InnoDB tables externally; that is, creating tables outside of the data directory. Those reasons might include space management, I/O optimization, or placing tables on a storage device with particular performance or capacity characteristics, for example.

InnoDB supports the following methods for creating tables externally:

14.6.1.2 创建外部表

在外部创建 InnoDB 表有不同的原因; 也就是说, 在数据目录之外创建表。例如, 可能包括这些原因:

空间管理
I/O 优化
将表放置在具有特定性能或容量特征的存储设备上。

InnoDB 支持这些创建外部表的方法:

Using the DATA DIRECTORY Clause
Using CREATE TABLE ... TABLESPACE Syntax
Creating a Table in an External General Tablespace

Using the DATA DIRECTORY Clause

You can create an InnoDB table in an external directory by specifying a DATA DIRECTORY clause in the CREATE TABLE statement.

使用DATA DIRECTORY子句

可以在 CREATE TABLE 时通过 DATA DIRECTORY 子句, 指定外部目录。

CREATE TABLE t1 (id INT PRIMARY KEY) DATA DIRECTORY = '/external/directory';

The DATA DIRECTORY clause is supported for tables created in file-per-table tablespaces. Tables are implicitly created in file-per-table tablespaces when the innodb_file_per_table variable is enabled, which it is by default.

在 file-per-table 表空间中创建的表, 支持 DATA DIRECTORY 子句。当 innodb_file_per_table 变量默认启用时, Tables是在 file-per-table 表空间中隐式创建的。

mysql> SELECT @@innodb_file_per_table;
+-------------------------+
| @@innodb_file_per_table |
+-------------------------+
|                       1 |
+-------------------------+

For more information about file-per-table tablespaces, see Section 14.6.3.2, File-Per-Table Tablespaces.

Be sure of the directory location you choose, as the DATA DIRECTORY clause cannot be used with ALTER TABLE to change the location later.

有关 file-per-table 表空间的更多信息, 请参阅 Section 14.6.3.2, File-Per-Table Tablespaces。

请确认所选择的目录位置, 因为 DATA DIRECTORY 子句指定的位置, 不能通过 ALTER TABLE 语句来修改。

When you specify a DATA DIRECTORY clause in a CREATE TABLE statement, the table's data file (<table_name>.ibd) is created in a schema directory under the specified directory, and an .isl file (*table_name*.isl) that contains the data file path is created in the schema directory under the MySQL data directory. An .isl file is similar in function to a symbolic link. (Actual symbolic links are not supported for use with InnoDB data files.)

The following example demonstrates creating a table in an external directory using the DATA DIRECTORY clause. It is assumed that the innodb_file_per_table variable is enabled.

当我们在 CREATE TABLE 语句中指定 DATA DIRECTORY 子句时, 表的数据文件(<table_name>.ibd) 在指定路径下的 schema 目录中创建; 同时也会在 MySQL 数据目录下的 <schema> 目录中, 创建一个 .isl 文件(<table_name>.isl), 其中包含数据文件的真实路径。 .isl 文件在功能上类似于符号链接。 (因为实际的符号链接不支持和InnoDB 数据文件一起使用。)

以下示例演示如何使用 DATA DIRECTORY 子句在外部目录中创建表。假设启用了 innodb_file_per_table 变量。

mysql> USE test;
Database changed

mysql> CREATE TABLE t1 (id INT PRIMARY KEY) DATA DIRECTORY = '/external/directory';

# MySQL在外部路径下, 对应的schema目录中创建table数据文件

shell> cd /external/directory/test
shell> ls
t1.ibd

# 在 MySQL data 路径下, 创建一个 .isl 文件, 其中包含数据文件的位置

shell> cd /path/to/mysql/data/test
shell> ls
db.opt  t1.frm  t1.isl

Usage Notes:

MySQL initially holds the tablespace data file open, preventing you from dismounting the device, but might eventually close the file if the server is busy. Be careful not to accidentally dismount an external device while MySQL is running, or start MySQL while the device is disconnected. Attempting to access a table when the associated data file is missing causes a serious error that requires a server restart.

A server restart might fail if the data file is not found at the expected path. In this case, manually remove the .isl file from the schema directory. After restarting, drop the table to remove the .frm file and the information about the table from the data dictionary.

使用说明:

MySQL初始化时会将表空间数据文件保持打开状态, 以防止用户不小心卸载设备, 但如果服务器繁忙, 则可能会关闭该文件。注意不要在 MySQL 运行时意外地卸载外部设备, 或在设备断开连接的时候启动 MySQL。在相关数据文件丢失时尝试访问表, 会导致需要重启服务器的严重错误。

如果在预期路径中找不到数据文件, 则服务器重启可能会失败。在这种情况下, 请手工从schema目录中删除 .isl 文件。重启完成后, 通过drop表来删除 .frm 文件和 data dictionary 中的表信息。
Before placing a table on an NFS-mounted volume, review potential issues outlined in Using NFS with MySQL.
If using an LVM snapshot, file copy, or other file-based mechanism to back up the table's data file, always use the FLUSH TABLES ... FOR EXPORT statement first to ensure that all changes buffered in memory are flushed to disk before the backup occurs.
Using the DATA DIRECTORY clause to create a table in an external directory is an alternative to using symbolic links, which InnoDB does not support.
The DATA DIRECTORY clause is not supported in a replication environment where the source and replica reside on the same host. The DATA DIRECTORY clause requires a full directory path. Replicating the path in this case would cause the source and replica to create the table in same location.
在挂载的 NFS 卷上放置表之前, 请参考 Using NFS with MySQL.。
如果使用 LVM 快照、文件复制或其他基于文件的机制来备份表的数据文件, 请先执行 FLUSH TABLES ... FOR EXPORT 语句, 以确保所有缓存在内存中的更改都被 [flushed](https://dev.mysql.com/doc/ refman/5.7/en/glossary.html#glos_flush) 到磁盘, 必须在备份发生之前执行。
使用 DATA DIRECTORY 子句在外部目录中创建表是使用 symbolic links 的替代方法, 因为 InnoDB 不支持符号链接。
在主从服务器都部署在同一宿主机的环境中, 不支持使用 DATA DIRECTORY 子句。 DATA DIRECTORY 子句需要完整的目录路径。在这种情况下会导致主从机器都在同一位置创建表。

Using `CREATE TABLE ... TABLESPACE` Syntax

CREATE TABLE ... TABLESPACE syntax can be used in combination with the DATA DIRECTORY clause to create a table in an external directory. To do so, specify innodb_file_per_table as the tablespace name.

使用 `CREATE TABLE ... TABLESPACE` 语法

CREATE TABLE ... TABLESPACE 语法可以结合 DATA DIRECTORY 子句来创建外部目录中的表。为此, 请使用 innodb_file_per_table 作为表空间名称。

mysql> CREATE TABLE t2 (id INT PRIMARY KEY) TABLESPACE = innodb_file_per_table
       DATA DIRECTORY = '/external/directory';

This method is supported only for tables created in file-per-table tablespaces, but does not require the innodb_file_per_table variable to be enabled. In all other respects, this method is equivalent to the CREATE TABLE ... DATA DIRECTORY method described above. The same usage notes apply.

这种方法仅支持在 file-per-table 表空间中创建的表, 但不要求 [innodb_file_per_table](https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html #sysvar_innodb_file_per_table) 变量被启用。在所有其他方面, 此方法等效于上述 CREATE TABLE ... DATA DIRECTORY 方法。适用相同的使用说明。

Creating a Table in an External General Tablespace

You can create a table in a general tablespace that resides in an external directory.

For information about creating a general tablespace in an external directory, see Creating a General Tablespace.
For information about creating a table in a general tablespace, see Adding Tables to a General Tablespace.

在外部通用表空间中创建表

还可以在放置到外部目录中的通用表空间里创建表。

有关在外部目录中创建通用表空间的信息, 请参阅 Creating a General Tablespace。
有关在通用表空间中创建表的信息, 请参阅Adding Tables to a General Tablespace。

14.6.1.3 Importing InnoDB Tables

This section describes how to import tables using the Transportable Tablespaces feature, which permits importing tables, partitioned tables, or individual table partitions that reside in file-per-table tablespaces. There are many reasons why you might want to import tables:

To run reports on a non-production MySQL server instance to avoid placing extra load on a production server.
To copy data to a new replica server.
To restore a table from a backed-up tablespace file.
As a faster way of moving data than importing a dump file, which requires reinserting data and rebuilding indexes.
To move a data to a server with storage media that is better suited to your storage requirements. For example, you might move busy tables to an SSD device, or move large tables to a high-capacity HDD device.

The Transportable Tablespaces feature is described under the following topics in this section:

14.6.1.3 导入 InnoDB 表

本节介绍如何使用 Transportable Tablespaces 功能导入表, 该功能允许导入表、分区表或驻留在 file-per-table 表空间中的单个表分区。想要导入表的原因有很多:

在非生产的 MySQL 实例上执行报表分析以避免对生产服务器产生影响。
将数据复制到新的副本服务器。
从备份的表空间文件中恢复表。
作为一种比导入dump文件更快的移动数据方式, dump文件导入时还需要重新插入数据和重建索引。
将数据移动到更适合存储要求的存储介质。例如, 您可以将繁忙的表移至 SSD 设备, 或将大表移至高容量 HDD 设备。

Transportable Tablespaces 功能分为这些部分进行描述:

Prerequisites
Importing Tables
Importing Partitioned Tables
Importing Table Partitions
Limitations
Usage Notes
Internals

Prerequisites

The innodb_file_per_table variable must be enabled, which it is by default.
The page size of the tablespace must match the page size of the destination MySQL server instance. InnoDB page size is defined by the innodb_page_size variable, which is configured when initializing a MySQL server instance.
If the table is in a foreign key relationship, foreign_key_checks must be disabled before executing DISCARD TABLESPACE. Also, you should export all foreign key related tables at the same logical point in time, as ALTER TABLE ... IMPORT TABLESPACE does not enforce foreign key constraints on imported data. To do so, stop updating the related tables, commit all transactions, acquire shared locks on the tables, and perform the export operations.
When importing a table from another MySQL server instance, both MySQL server instances must have General Availability (GA) status and must be the same version. Otherwise, the table must be created on the same MySQL server instance into which it is being imported.
If the table was created in an external directory by specifying the DATA DIRECTORY clause in the CREATE TABLE statement, the table that you replace on the destination instance must be defined with the same DATA DIRECTORY clause. A schema mismatch error is reported if the clauses do not match. To determine if the source table was defined with a DATA DIRECTORY clause, use SHOW CREATE TABLE to view the table definition. For information about using the DATA DIRECTORY clause, see Section 14.6.1.2, Creating Tables Externally.
If a ROW_FORMAT option is not defined explicitly in the table definition or ROW_FORMAT=DEFAULT is used, the innodb_default_row_format setting must be the same on the source and destination instances. Otherwise, a schema mismatch error is reported when you attempt the import operation. Use SHOW CREATE TABLE to check the table definition. Use SHOW VARIABLES to check the innodb_default_row_format setting. For related information, see Defining the Row Format of a Table.

先决条件

必须启用 innodb_file_per_table 参数, 当然默认情况下就是开启的。
表空间的页面大小(page size)必须与目标 MySQL实例的页面大小相匹配。 InnoDB 的页面大小由 innodb_page_size 参数指定, 该变量在MySQL实例初始化时进行配置。
如果表存在外键关系, 执行 DISCARD TABLESPACE 前必须先禁用 foreign_key_checks。此外, 在导出时需要保证所有相关的表在逻辑上都处于同一个时间点, 而 ALTER TABLE ... IMPORT TABLESPACE 不对导入的数据强制执行外键约束检查。为此, 需要业务上暂时停止更新相关表, 提交完所有事务, 获取表上的共享锁, 并执行导出操作。
从另一个 MySQL 实例导入表时, 两台 MySQL 服务器软件必须都是GA状态, 并且版本号相同。否则, 只能在同一台 MySQL 服务器实例上导入和创建。
如果是通过 CREATE TABLE 指定 DATA DIRECTORY子句, 在外部目录中创建的表, 那么在目标实例上替换的表也必须使用相同的 DATA DIRECTORY 子句定义。如果子句不匹配, 则会报schema不匹配错误。要确定源表是否使用了 DATA DIRECTORY 子句定义, 可以使用 SHOW CREATE TABLE 查看表定义信息。有关 DATA DIRECTORY 子句的信息, 请参阅第 14.6.1.2 节, 在外部创建表。
如果在表定义中没有明确定义 ROW_FORMAT 选项, 或者使用了 ROW_FORMAT=DEFAULT, 则源实例和目标实例上的 innodb_default_row_format 必须相同。否则, 尝试导入操作时会报 schema 不匹配错误。使用SHOW CREATE TABLE 检查表定义。使用 SHOW VARIABLES 检查 [innodb_default_row_format](https://dev.mysql.com /doc/refman/5.7/en/innodb-parameters.html#sysvar_innodb_default_row_format) 设置。相关信息参见 Defining the Row Format of a Table。

Importing Tables

This example demonstrates how to import a regular non-partitioned table that resides in a file-per-table tablespace.

On the destination instance, create a table with the same definition as the table you intend to import. (You can obtain the table definition using SHOW CREATE TABLE syntax.) If the table definition does not match, a schema mismatch error is reported when you attempt the import operation.

导入表(Importing Tables)

下面的示例演示如何导入驻留在 file-per-table 表空间中的常规非分区表。

1. 在目标实例上, 创建一个与要导入的表具有相同定义的表。

可以先使用 SHOW CREATE TABLE 语法获取表定义。如果表定义不匹配, 则执行导入操作时会报 schema 不匹配错误。

mysql> USE test;
mysql> CREATE TABLE t1 (id INT) ENGINE=INNODB;

On the destination instance, discard the tablespace of the table that you just created. (Before importing, you must discard the tablespace of the receiving table.)

1. 在目标实例上, 丢弃刚刚创建的表的表空间。

导入前必须丢弃接收数据的表自己的表空间。

mysql> ALTER TABLE t1 DISCARD TABLESPACE;

On the source instance, run FLUSH TABLES ... FOR EXPORT to quiesce the table you intend to import. When a table is quiesced, only read-only transactions are permitted on the table.

1. 在源实例上, 运行 [FLUSH TABLES ... FOR EXPORT](https://dev.mysql.com/doc/refman/5.7/en/flush.html#flush-tables-for-export- with-list) 以停止(quiesce)要导入的表。

当一个表被停顿时(quiesced), 只允许执行只读事务。

mysql> USE test;
mysql> FLUSH TABLES t1 FOR EXPORT;

FLUSH TABLES ... FOR EXPORT ensures that changes to the named table have been flushed to disk so that a binary table copy can be made while the server is running. When FLUSH TABLES ... FOR EXPORT is run, InnoDB generates a .cfg metadata file in the schema directory of the table. The .cfg file contains metadata that is used for schema verification during the import operation.

FLUSH TABLES ... FOR EXPORT 确保对该表的更新都已经刷新到磁盘, 以便在服务器运行的同时执行二进制拷贝。执行 FLUSH TABLES ... FOR EXPORT 时, InnoDB 在表的schema目录中生成一个 .cfg 元数据文件。 .cfg 文件包含了导入操作期间用于 schema 验证的元数据。

Copy the .ibd file and .cfg metadata file from the source instance to the destination instance. For example:

1. 拷贝文件

从源实例将.ibd 文件和.cfg 文件复制到目标实例。例如:

shell> scp /path/to/datadir/test/t1.{ibd,cfg} destination-server:/path/to/datadir/test

The .ibd file and .cfg file must be copied before releasing the shared locks, as described in the next step.

Note

If you are importing a table from an encrypted tablespace, InnoDB generates a .cfp file in addition to a .cfg metadata file. The .cfp file must be copied to the destination instance together with the .cfg file. The .cfp file contains a transfer key and an encrypted tablespace key. On import, InnoDB uses the transfer key to decrypt the tablespace key. For related information, see Section 14.14, InnoDB Data-at-Rest Encryption.

在释放共享锁之前, 必须先完成 .ibd 文件和 .cfg 文件的复制, 如下一步所述。

提示

如果从加密表空间导入表, InnoDB 会生成一个 .cfp 文件以及一个 .cfg 元数据文件。 .cfp 文件必须与 .cfg 文件一起复制到目标实例。 .cfp 文件包含传输密钥和表空间加密密钥。在导入时, InnoDB 使用传输密钥来解密表空间加密密钥。相关信息请参阅 Section 14.14, InnoDB Data-at-Rest Encryption.

On the source instance, use UNLOCK TABLES to release the locks acquired by the FLUSH TABLES ... FOR EXPORT statement:

5.在源实例上释放锁

使用UNLOCK TABLES 来释放 FLUSH TABLES ... FOR EXPORT 时加的锁:

mysql> USE test;
mysql> UNLOCK TABLES;

On the destination instance, import the tablespace:

1. 在目标实例上, 导入表空间

mysql> USE test;
mysql> ALTER TABLE t1 IMPORT TABLESPACE;

Importing Partitioned Tables

导入分区表(Importing Partitioned Tables)

因为使用分区表会造成很多麻烦, 国内不建议使用; 本节暂时不翻译;

This example demonstrates how to import a partitioned table, where each table partition resides in a file-per-table tablespace.

On the destination instance, create a partitioned table with the same definition as the partitioned table that you want to import. (You can obtain the table definition using SHOW CREATE TABLE syntax.) If the table definition does not match, a schema mismatch error is reported when you attempt the import operation.

mysql> USE test;
mysql> CREATE TABLE t1 (i int) ENGINE = InnoDB PARTITION BY KEY (i) PARTITIONS 3;

In the /*datadir*/test directory, there is a tablespace .ibd file for each of the three partitions.

mysql> \! ls /path/to/datadir/test/
db.opt  t1.frm  t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd

On the destination instance, discard the tablespace for the partitioned table. (Before the import operation, you must discard the tablespace of the receiving table.)

mysql> ALTER TABLE t1 DISCARD TABLESPACE;

The three tablespace .ibd files of the partitioned table are discarded from the /*datadir*/test directory, leaving the following files:

mysql> \! ls /path/to/datadir/test/
db.opt  t1.frm

On the source instance, run FLUSH TABLES ... FOR EXPORT to quiesce the partitioned table that you intend to import. When a table is quiesced, only read-only transactions are permitted on the table.

mysql> USE test;
mysql> FLUSH TABLES t1 FOR EXPORT;

FLUSH TABLES ... FOR EXPORT ensures that changes to the named table are flushed to disk so that binary table copy can be made while the server is running. When FLUSH TABLES ... FOR EXPORT is run, InnoDB generates .cfg metadata files in the schema directory of the table for each of the table's tablespace files.

mysql> \! ls /path/to/datadir/test/
db.opt t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd
t1.frm  t1#P#p0.cfg  t1#P#p1.cfg  t1#P#p2.cfg

The .cfg files contain metadata that is used for schema verification when importing the tablespace. FLUSH TABLES ... FOR EXPORT can only be run on the table, not on individual table partitions.

Copy the .ibd and .cfg files from the source instance schema directory to the destination instance schema directory. For example:

shell>scp /path/to/datadir/test/t1*.{ibd,cfg} destination-server:/path/to/datadir/test

The .ibd and .cfg files must be copied before releasing the shared locks, as described in the next step.

Note

If you are importing a table from an encrypted tablespace, InnoDB generates a .cfp files in addition to a .cfg metadata files. The .cfp files must be copied to the destination instance together with the .cfg files. The .cfp files contain a transfer key and an encrypted tablespace key. On import, InnoDB uses the transfer key to decrypt the tablespace key. For related information, see Section 14.14, InnoDB Data-at-Rest Encryption.

On the source instance, use UNLOCK TABLES to release the locks acquired by FLUSH TABLES ... FOR EXPORT:

mysql> USE test;
mysql> UNLOCK TABLES;

On the destination instance, import the tablespace of the partitioned table:

mysql> USE test;
mysql> ALTER TABLE t1 IMPORT TABLESPACE;

Importing Table Partitions

导入表分区(Importing Table Partitions)

因为使用分区表会造成很多麻烦, 国内不建议使用; 本节暂时不翻译;

This example demonstrates how to import individual table partitions, where each partition resides in a file-per-table tablespace file.

In the following example, two partitions (p2 and p3) of a four-partition table are imported.

On the destination instance, create a partitioned table with the same definition as the partitioned table that you want to import partitions from. (You can obtain the table definition using SHOW CREATE TABLE syntax.) If the table definition does not match, a schema mismatch error is reported when you attempt the import operation.

mysql> USE test;
mysql> CREATE TABLE t1 (i int) ENGINE = InnoDB PARTITION BY KEY (i) PARTITIONS 4;

In the /*datadir*/test directory, there is a tablespace .ibd file for each of the four partitions.

mysql> \! ls /path/to/datadir/test/
db.opt  t1.frm  t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd t1#P#p3.ibd

On the destination instance, discard the partitions that you intend to import from the source instance. (Before importing partitions, you must discard the corresponding partitions from the receiving partitioned table.)

mysql> ALTER TABLE t1 DISCARD PARTITION p2, p3 TABLESPACE;

The tablespace .ibd files for the two discarded partitions are removed from the /*datadir*/test directory on the destination instance, leaving the following files:

mysql> \! ls /path/to/datadir/test/
db.opt  t1.frm  t1#P#p0.ibd  t1#P#p1.ibd

Note

When ALTER TABLE ... DISCARD PARTITION ... TABLESPACE is run on subpartitioned tables, both partition and subpartition table names are permitted. When a partition name is specified, subpartitions of that partition are included in the operation.

On the source instance, run FLUSH TABLES ... FOR EXPORT to quiesce the partitioned table. When a table is quiesced, only read-only transactions are permitted on the table.

mysql> USE test;
mysql> FLUSH TABLES t1 FOR EXPORT;

FLUSH TABLES ... FOR EXPORT ensures that changes to the named table are flushed to disk so that binary table copy can be made while the instance is running. When FLUSH TABLES ... FOR EXPORT is run, InnoDB generates a .cfg metadata file for each of the table's tablespace files in the schema directory of the table.

mysql> \! ls /path/to/datadir/test/
db.opt  t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd t1#P#p3.ibd
t1.frm  t1#P#p0.cfg  t1#P#p1.cfg  t1#P#p2.cfg t1#P#p3.cfg

The .cfg files contain metadata that used for schema verification during the import operation. FLUSH TABLES ... FOR EXPORT can only be run on the table, not on individual table partitions.

Copy the .ibd and .cfg files for partition p2 and partition p3 from the source instance schema directory to the destination instance schema directory.

shell> scp t1#P#p2.ibd t1#P#p2.cfg t1#P#p3.ibd t1#P#p3.cfg destination-server:/path/to/datadir/test

The .ibd and .cfg files must be copied before releasing the shared locks, as described in the next step.

Note

If you are importing partitions from an encrypted tablespace, InnoDB generates a .cfp files in addition to a .cfg metadata files. The .cfp files must be copied to the destination instance together with the .cfg files. The .cfp files contain a transfer key and an encrypted tablespace key. On import, InnoDB uses the transfer key to decrypt the tablespace key. For related information, see Section 14.14, InnoDB Data-at-Rest Encryption.

On the source instance, use UNLOCK TABLES to release the locks acquired by FLUSH TABLES ... FOR EXPORT:

mysql> USE test;
mysql> UNLOCK TABLES;

On the destination instance, import table partitions p2 and p3:

mysql> USE test;
mysql> ALTER TABLE t1 IMPORT PARTITION p2, p3 TABLESPACE;

Note

When ALTER TABLE ... IMPORT PARTITION ... TABLESPACE is run on subpartitioned tables, both partition and subpartition table names are permitted. When a partition name is specified, subpartitions of that partition are included in the operation.

Limitations

The Transportable Tablespaces feature is only supported for tables that reside in file-per-table tablespaces. It is not supported for the tables that reside in the system tablespace or general tablespaces. Tables in shared tablespaces cannot be quiesced.
FLUSH TABLES ... FOR EXPORT is not supported on tables with a FULLTEXT index, as full-text search auxiliary tables cannot be flushed. After importing a table with a FULLTEXT index, run OPTIMIZE TABLE to rebuild the FULLTEXT indexes. Alternatively, drop FULLTEXT indexes before the export operation and recreate the indexes after importing the table on the destination instance.
Due to a .cfg metadata file limitation, schema mismatches are not reported for partition type or partition definition differences when importing a partitioned table. Column differences are reported.

限制

Transportable Tablespaces 功能仅支持保存在 file-per-table 表空间中的表。系统表空间和通用表空间中的表不支持。而共享表空间中的表不能停顿。
FLUSH TABLES ... FOR EXPORT 在具有 FULLTEXT 索引的表上不受支持, 因为全文检索辅助表无法刷新。导入带有FULLTEXT索引的表后, 需要运行 OPTIMIZE TABLE 以重建FULLTEXT索引. 或者, 在导出操作执行前先删除 FULLTEXT 索引, 然后在目标实例上导入表, 完成后重建索引。
由于 .cfg 元数据文件的限制, 导入分区表时如果分区类型或者分区定义不同, 并不会报 schema 不匹配错误。而是会报告列差异。

Usage Notes

ALTER TABLE ... IMPORT TABLESPACE does not require a .cfg metadata file to import a table. However, metadata checks are not performed when importing without a .cfg file, and a warning similar to the following is issued:

使用说明

ALTER TABLE ... IMPORT TABLESPACE 导入表时并不需要 .cfg 元数据文件。但是, 在导入时如果没有 .cfg 文件的情况下, 不会执行元数据检查, 并且会发出类似于下面这样的警告:

Message: InnoDB: IO Read error: (2, No such file or directory) Error opening '.\
test\t.cfg', will attempt to import without schema verification
1 row in set (0.00 sec)

Importing a table without a .cfg metadata file should only be considered if no schema mismatches are expected. The ability to import without a .cfg file could be useful in crash recovery scenarios where metadata is not accessible.

On Windows, InnoDB stores database, tablespace, and table names internally in lowercase. To avoid import problems on case-sensitive operating systems such as Linux and Unix, create all databases, tablespaces, and tables using lowercase names. A convenient way to accomplish this is to add lower_case_table_names=1 to the [mysqld] section of your my.cnf or my.ini file before creating databases, tablespaces, or tables:

只有在预期没有 schema不匹配的情况下才应考虑导入没有 .cfg 元数据文件的表。比如在无法访问元数据的崩溃恢复场景中, 无需 .cfg 文件即可导入的功能会非常有用。

在 Windows 系统上, InnoDB 内部以小写格式存储数据库名、表空间名, 以及表名。为避免在区分大小写的操作系统(如 Linux 和 Unix)上出现跨系统导入问题, 请使用小写名称创建所有数据库、表空间和表。完成此操作的一种便捷方法是在 my.cnf 或 my.ini 文件中, lower_case_table_names=1 添加到 [mysqld] 节。然后再创建数据库、表空间或表:

[mysqld]
lower_case_table_names=1

When running ALTER TABLE ... DISCARD PARTITION ... TABLESPACE and ALTER TABLE ... IMPORT PARTITION ... TABLESPACE on subpartitioned tables, both partition and subpartition table names are permitted. When a partition name is specified, subpartitions of that partition are included in the operation.
在子分区表上运行 ALTER TABLE ... DISCARD PARTITION ... TABLESPACE 和 ALTER TABLE ... IMPORT PARTITION ... TABLESPACE 时, 指定分区和子分区表的名称都可以。如果指定分区名, 该分区的子分区将包含在操作中。

Internals

The following information describes internals and messages written to the error log during a table import procedure.

When ALTER TABLE ... DISCARD TABLESPACE is run on the destination instance:

The table is locked in X mode.
The tablespace is detached from the table.

内部逻辑

以下信息描述了在导入表的过程中的内部逻辑, 以及写入错误日志的消息。

在目标实例上运行 ALTER TABLE ... DISCARD TABLESPACE 时:

会以 X 模式锁定对应的table。
表空间与表会被分离。

When FLUSH TABLES ... FOR EXPORT is run on the source instance:

The table being flushed for export is locked in shared mode.
The purge coordinator thread is stopped.
Dirty pages are synchronized to disk.
Table metadata is written to the binary .cfg file.

在源实例上执行 FLUSH TABLES ... FOR EXPORT 时:

flushed for export 的表在共享模式下被锁定。
purge 协调器线程会被停止。
脏页会被同步到磁盘。
表元数据被写入到二进制 .cfg 文件中。

Expected error log messages for this operation:

此操作的预期error log消息为:

[Note] InnoDB: Sync to disk of '"test"."t1"' started.
[Note] InnoDB: Stopping purge
[Note] InnoDB: Writing table metadata to './test/t1.cfg'
[Note] InnoDB: Table '"test"."t1"' flushed to disk

When UNLOCK TABLES is run on the source instance:

The binary .cfg file is deleted.
The shared lock on the table or tables being imported is released and the purge coordinator thread is restarted.

在源实例上执行 UNLOCK TABLES 时:

二进制 .cfg 文件会被删除。
要导入的一个或多个表上的共享锁会被释放, 并重新启动清除协调器线程(purge coordinator thread)。

Expected error log messages for this operation:

此操作的预期error log消息为:

[Note] InnoDB: Deleting the meta-data file './test/t1.cfg'
[Note] InnoDB: Resuming purge

When ALTER TABLE ... IMPORT TABLESPACE is run on the destination instance, the import algorithm performs the following operations for each tablespace being imported:

Each tablespace page is checked for corruption.
The space ID and log sequence numbers (LSNs) on each page are updated.
Flags are validated and LSN updated for the header page.
Btree pages are updated.
The page state is set to dirty so that it is written to disk.

在目标实例上运行 ALTER TABLE ... IMPORT TABLESPACE 时, 导入算法对涉及的每个表空间执行以下操作:

检查每个表空间页是否损坏。
更新每页上的 space ID 和日志序列号 (LSN, log sequence numbers)。
为header页验证标志并更新 LSN。
更新 Btree 页面。
将页面状态设置为dirty, 以便将其写入磁盘。

Expected error log messages for this operation:

此操作的预期error log消息为:

[Note] InnoDB: Importing tablespace for table 'test/t1' that was exported
from host 'host_name'
[Note] InnoDB: Phase I - Update all pages
[Note] InnoDB: Sync to disk
[Note] InnoDB: Sync to disk - done!
[Note] InnoDB: Phase III - Flush changes to disk
[Note] InnoDB: Phase IV - Flush complete

Note

You may also receive a warning that a tablespace is discarded (if you discarded the tablespace for the destination table) and a message stating that statistics could not be calculated due to a missing .ibd file:

备注

您可能还会收到一条表空间被丢弃的警告(如果丢弃了目标表的表空间), 以及一条指出由于缺少 .ibd 文件而无法计算统计信息的消息:

[Warning] InnoDB: Table "test"."t1" tablespace is set as discarded.
7f34d9a37700 InnoDB: cannot calculate statistics for table
"test"."t1" because the .ibd file is missing. For help, please refer to
http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html

14.6.1.4 Moving or Copying InnoDB Tables

14.6.1.4 迁移或者复制 InnoDB 表

This section describes techniques for moving or copying some or all InnoDB tables to a different server or instance. For example, you might move an entire MySQL instance to a larger, faster server; you might clone an entire MySQL instance to a new replica server; you might copy individual tables to another instance to develop and test an application, or to a data warehouse server to produce reports.

On Windows, InnoDB always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names. A convenient way to accomplish this is to add the following line to the [mysqld] section of your my.cnf or my.ini file before creating any databases or tables:

本节描述了将部分/全部 InnoDB表迁移/复制到其他服务器/实例的技术。例如: 可能将整个 MySQL 实例迁移到配置更高、速度更快的服务器上；可以将整个 MySQL 实例克隆到新的服务器副本；可以将单个表复制到另一个实例以开发和测试应用程序, 或复制到数据仓库服务器来分析报告。

在 Windows 上, InnoDB 在内部总是以小写形式存储数据库名和表名。要将二进制格式的数据库在 Unix/Linux 和 Windows 之间迁移/复制, 请使用小写名称创建所有数据库和表。一种方便的方法是在创建任何数据库或表之前, 将以下配置添加到 my.cnf 或者 my.ini 文件的 [mysqld] 节:

[mysqld]
lower_case_table_names=1

Techniques for moving or copying InnoDB tables include:

迁移或复制 InnoDB 表的技术包括:

Importing Tables

A table that resides in a file-per-table tablespace can be imported from another MySQL server instance or from a backup using the Transportable Tablespace feature. See Section 14.6.1.3, Importing InnoDB Tables.

1. 导入表

可以使用 Transportable Tablespace 功能, 从另一个 MySQL实例或者备份, 导入 file-per-table 表空间的表。

参考前面的 14.6.1.3 导入 InnoDB 表。

MySQL Enterprise Backup

The MySQL Enterprise Backup product lets you back up a running MySQL database with minimal disruption to operations while producing a consistent snapshot of the database. When MySQL Enterprise Backup is copying tables, reads and writes can continue. In addition, MySQL Enterprise Backup can create compressed backup files, and back up subsets of tables. In conjunction with the MySQL binary log, you can perform point-in-time recovery. MySQL Enterprise Backup is included as part of the MySQL Enterprise subscription.

For more details about MySQL Enterprise Backup, see Section 27.2, MySQL Enterprise Backup Overview.

2. MySQL企业备份

MySQL企业备份(MySQL Enterprise Backup) 产品可让您备份正在运行的 MySQL 数据库, 同时将操作中断降至最低, 同时生成数据库的一致快照。当 MySQL Enterprise Backup 正在复制表时, 读取和写入可以正常执行。此外, MySQL Enterprise Backup 可以创建压缩备份文件, 并备份表的子集。结合 MySQL binary log, 我们可以按时间点进行恢复。 MySQL Enterprise Backup 包含在 MySQL Enterprise 授权中。

有关 MySQL 企业备份的更多详细信息, 请参阅 Section 27.2, MySQL Enterprise Backup Overview。

Copying Data Files (Cold Backup Method)

You can move an InnoDB database simply by copying all the relevant files listed under "Cold Backups" in Section 14.19.1, InnoDB Backup.

InnoDB data and log files are binary-compatible on all platforms having the same floating-point number format. If the floating-point formats differ but you have not used FLOAT or DOUBLE data types in your tables, then the procedure is the same: simply copy the relevant files.

When you move or copy file-per-table .ibd files, the database directory name must be the same on the source and destination systems. The table definition stored in the InnoDB shared tablespace includes the database name. The transaction IDs and log sequence numbers stored in the tablespace files also differ between databases.

To move an .ibd file and the associated table from one database to another, use a RENAME TABLE statement:

3. 拷贝数据文件(冷备份方法)

可以将 Section 14.19.1, InnoDB Backup 一节中, "Cold Backups" 部分列出的所有相关文件拷贝出来。

InnoDB 数据和日志文件在所有具有相同浮点数格式的平台上都是二进制兼容的。如果浮点格式不同, 但没有使用到 FLOAT 和 DOUBLE 类型, 那么也是兼容的: 只需复制相关文件。

当您移动或复制 file-per-table 的 .ibd 文件时, 源系统和目标系统上的数据库目录名称必须相同。存储在 InnoDB 共享表空间中的表定义包括数据库名称。存储在表空间文件中的事务 ID 和日志序列号也因数据库而异。

要将 .ibd 文件和关联表从一个数据库迁移到另一个数据库, 请使用 RENAME TABLE 语句:

RENAME TABLE db1.tbl_name TO db2.tbl_name;

If you have a clean backup of an .ibd file, you can restore it to the MySQL installation from which it originated as follows:

The table must not have been dropped or truncated since you copied the .ibd file, because doing so changes the table ID stored inside the tablespace.
Issue this ALTER TABLE statement to delete the current .ibd file:

如果有一个干净的 .ibd 文件备份, 可以通过以下步骤将它恢复到生成该文件的 MySQL 实例上:

在复制 .ibd 文件后, 该表不得被删除或截断(dropped or truncated), 因为这样做会更改存储在表空间内的 table ID。
执行下面的 ALTER TABLE 语句来删除当前的 .ibd 文件:

ALTER TABLE tbl_name DISCARD TABLESPACE;

Copy the backup .ibd file to the proper database directory.
Issue this ALTER TABLE statement to tell InnoDB to use the new .ibd file for the table:
将备份的.ibd 文件复制到正确的数据库目录。
执行下面的 ALTER TABLE 语句, 告诉 InnoDB 对这个表使用新的 .ibd 文件:

ALTER TABLE tbl_name IMPORT TABLESPACE;

Note

The ALTER TABLE ... IMPORT TABLESPACE feature does not enforce foreign key constraints on imported data.

提示:

ALTER TABLE ... IMPORT TABLESPACE 功能不会对导入的数据强制执行外键约束。

In this context, a clean .ibd file backup is one for which the following requirements are satisfied:

There are no uncommitted modifications by transactions in the .ibd file.
There are no unmerged insert buffer entries in the .ibd file.
Purge has removed all delete-marked index records from the .ibd file.
mysqld has flushed all modified pages of the .ibd file from the buffer pool to the file.

在这种情况下, 干净的 .ibd 文件备份满足以下要求:

.ibd 文件中没有未提交的修改。
.ibd 文件中没有未合并的insert缓冲区条目。
已从.ibd 文件中清除掉所有删除标记的索引记录。
mysqld 已将.ibd文件的所有修改页面从缓冲池刷新到文件中.

You can make a clean backup .ibd file using the following method:

Stop all activity from the mysqld server and commit all transactions.
Wait until SHOW ENGINE INNODB STATUS shows that there are no active transactions in the database, and the main thread status of InnoDB is Waiting for server activity. Then you can make a copy of the .ibd file.

可以使用以下方法来制作干净的 .ibd 备份文件:

停止 mysqld 服务器的所有活动, 提交所有事务。
等到 SHOW ENGINE INNODB STATUS 显示数据库中没有活动事务, 并且 InnoDB 的主线程状态是 Waiting for server activity。然后就可以复制.ibd文件了。

Another method for making a clean copy of an .ibd file is to use the MySQL Enterprise Backup product:

Use MySQL Enterprise Backup to back up the InnoDB installation.
Start a second mysqld server on the backup and let it clean up the .ibd files in the backup.

另一种制作 .ibd 文件干净副本的方法, 是使用 MySQL Enterprise Backup 产品:

使用 MySQL Enterprise Backup 备份 InnoDB 安装。
在备份机上启动第二个 mysqld 服务器, 让它清理.ibd文件。

Restoring from a Logical Backup

You can use a utility such as mysqldump to perform a logical backup, which produces a set of SQL statements that can be executed to reproduce the original database object definitions and table data for transfer to another SQL server. Using this method, it does not matter whether the formats differ or if your tables contain floating-point data.

To improve the performance of this method, disable autocommit when importing data. Perform a commit only after importing an entire table or segment of a table.

4. 从逻辑备份恢复

可以使用诸如 mysqldump 之类的工具程序来执行逻辑备份, 这会生成一组 SQL 语句, 可以用来复制原始数据库对象定义和表数据以传输到另一台MYSQL服务器。使用这种方法, 格式不同或者是否包含浮点数都无关紧要。

为了提高这种方法的性能, 在导入数据时需要禁用autocommit。仅在导入整个表或表的一部分后才执行提交。

14.6.1.5 Converting Tables from MyISAM to InnoDB

14.6.1.5 将 MyISAM 表转换为 InnoDB 引擎

说明: 此部分内容使用场景较少, 暂时不翻译;

If you have MyISAM tables that you want to convert to InnoDB for better reliability and scalability, review the following guidelines and tips before converting.

Adjusting Memory Usage for MyISAM and InnoDB

As you transition away from MyISAM tables, lower the value of the key_buffer_size configuration option to free memory no longer needed for caching results. Increase the value of the innodb_buffer_pool_size configuration option, which performs a similar role of allocating cache memory for InnoDB tables. The InnoDB buffer pool caches both table data and index data, speeding up lookups for queries and keeping query results in memory for reuse. For guidance regarding buffer pool size configuration, see Section 8.12.4.1, How MySQL Uses Memory.

On a busy server, run benchmarks with the query cache turned off. The InnoDB buffer pool provides similar benefits, so the query cache might be tying up memory unnecessarily. For information about the query cache, see Section 8.10.3, The MySQL Query Cache.

Handling Too-Long Or Too-Short Transactions

Because MyISAM tables do not support transactions, you might not have paid much attention to the autocommit configuration option and the COMMIT and ROLLBACK statements. These keywords are important to allow multiple sessions to read and write InnoDB tables concurrently, providing substantial scalability benefits in write-heavy workloads.

While a transaction is open, the system keeps a snapshot of the data as seen at the beginning of the transaction, which can cause substantial overhead if the system inserts, updates, and deletes millions of rows while a stray transaction keeps running. Thus, take care to avoid transactions that run for too long:

If you are using a mysql session for interactive experiments, always COMMIT (to finalize the changes) or ROLLBACK (to undo the changes) when finished. Close down interactive sessions rather than leave them open for long periods, to avoid keeping transactions open for long periods by accident.
Make sure that any error handlers in your application also ROLLBACK incomplete changes or COMMIT completed changes.
ROLLBACK is a relatively expensive operation, because INSERT, UPDATE, and DELETE operations are written to InnoDB tables prior to the COMMIT, with the expectation that most changes are committed successfully and rollbacks are rare. When experimenting with large volumes of data, avoid making changes to large numbers of rows and then rolling back those changes.
When loading large volumes of data with a sequence of INSERT statements, periodically COMMIT the results to avoid having transactions that last for hours. In typical load operations for data warehousing, if something goes wrong, you truncate the table (using TRUNCATE TABLE) and start over from the beginning rather than doing a ROLLBACK.

The preceding tips save memory and disk space that can be wasted during too-long transactions. When transactions are shorter than they should be, the problem is excessive I/O. With each COMMIT, MySQL makes sure each change is safely recorded to disk, which involves some I/O.

For most operations on InnoDB tables, you should use the setting autocommit=0. From an efficiency perspective, this avoids unnecessary I/O when you issue large numbers of consecutive INSERT, UPDATE, or DELETE statements. From a safety perspective, this allows you to issue a ROLLBACK statement to recover lost or garbled data if you make a mistake on the mysql command line, or in an exception handler in your application.
The time when autocommit=1 is suitable for InnoDB tables is when running a sequence of queries for generating reports or analyzing statistics. In this situation, there is no I/O penalty related to COMMIT or ROLLBACK, and InnoDB can automatically optimize the read-only workload.
If you make a series of related changes, finalize all the changes at once with a single COMMIT at the end. For example, if you insert related pieces of information into several tables, do a single COMMIT after making all the changes. Or if you run many consecutive INSERT statements, do a single COMMIT after all the data is loaded; if you are doing millions of INSERT statements, perhaps split up the huge transaction by issuing a COMMIT every ten thousand or hundred thousand records, so the transaction does not grow too large.
Remember that even a SELECT statement opens a transaction, so after running some report or debugging queries in an interactive mysql session, either issue a COMMIT or close the mysql session.

Handling Deadlocks

You might see warning messages referring to deadlocks in the MySQL error log, or the output of SHOW ENGINE INNODB STATUS. Despite the scary-sounding name, a deadlock is not a serious issue for InnoDB tables, and often does not require any corrective action. When two transactions start modifying multiple tables, accessing the tables in a different order, they can reach a state where each transaction is waiting for the other and neither can proceed. When deadlock detection is enabled (the default), MySQL immediately detects this condition and cancels (rolls back) the smaller transaction, allowing the other to proceed. If deadlock detection is disabled using the innodb_deadlock_detect configuration option, InnoDB relies on the innodb_lock_wait_timeout setting to roll back transactions in case of a deadlock.

Either way, your applications need error-handling logic to restart a transaction that is forcibly cancelled due to a deadlock. When you re-issue the same SQL statements as before, the original timing issue no longer applies. Either the other transaction has already finished and yours can proceed, or the other transaction is still in progress and your transaction waits until it finishes.

If deadlock warnings occur constantly, you might review the application code to reorder the SQL operations in a consistent way, or to shorten the transactions. You can test with the innodb_print_all_deadlocks option enabled to see all deadlock warnings in the MySQL error log, rather than only the last warning in the SHOW ENGINE INNODB STATUS output.

For more information, see Section 14.7.5, Deadlocks in InnoDB.

Planning the Storage Layout

To get the best performance from InnoDB tables, you can adjust a number of parameters related to storage layout.

When you convert MyISAM tables that are large, frequently accessed, and hold vital data, investigate and consider the innodb_file_per_table, innodb_file_format, and innodb_page_size configuration options, and the ROW_FORMAT and KEY_BLOCK_SIZE clauses of the CREATE TABLE statement.

During your initial experiments, the most important setting is innodb_file_per_table. When this setting is enabled, which is the default as of MySQL 5.6.6, new InnoDB tables are implicitly created in file-per-table tablespaces. In contrast with the InnoDB system tablespace, file-per-table tablespaces allow disk space to be reclaimed by the operating system when a table is truncated or dropped. File-per-table tablespaces also support the Barracuda file format and associated features such as table compression, efficient off-page storage for long variable-length columns, and large index prefixes. For more information, see Section 14.6.3.2, File-Per-Table Tablespaces.

You can also store InnoDB tables in a shared general tablespace. General tablespaces support the Barracuda file format and can contain multiple tables. For more information, see Section 14.6.3.3, General Tablespaces.

Converting an Existing Table

To convert a non-InnoDB table to use InnoDB use ALTER TABLE:

ALTER TABLE table_name ENGINE=InnoDB;

Warning

Do not convert MySQL system tables in the mysql database from MyISAM to InnoDB tables. This is an unsupported operation. If you do this, MySQL does not restart until you restore the old system tables from a backup or regenerate them by reinitializing the data directory (see Section 2.10.1, Initializing the Data Directory).

Cloning the Structure of a Table

You might make an InnoDB table that is a clone of a MyISAM table, rather than using ALTER TABLE to perform conversion, to test the old and new table side-by-side before switching.

Create an empty InnoDB table with identical column and index definitions. Use SHOW CREATE TABLE *table_name*\G to see the full CREATE TABLE statement to use. Change the ENGINE clause to ENGINE=INNODB.

Transferring Existing Data

To transfer a large volume of data into an empty InnoDB table created as shown in the previous section, insert the rows with INSERT INTO *innodb_table* SELECT * FROM *myisam_table* ORDER BY *primary_key_columns*.

You can also create the indexes for the InnoDB table after inserting the data. Historically, creating new secondary indexes was a slow operation for InnoDB, but now you can create the indexes after the data is loaded with relatively little overhead from the index creation step.

If you have UNIQUE constraints on secondary keys, you can speed up a table import by turning off the uniqueness checks temporarily during the import operation:

SET unique_checks=0;
... import operation ...
SET unique_checks=1;

For big tables, this saves disk I/O because InnoDB can use its change buffer to write secondary index records as a batch. Be certain that the data contains no duplicate keys. unique_checks permits but does not require storage engines to ignore duplicate keys.

For better control over the insertion process, you can insert big tables in pieces:

INSERT INTO newtable SELECT * FROM oldtable
   WHERE yourkey > something AND yourkey <= somethingelse;

After all records are inserted, you can rename the tables.

During the conversion of big tables, increase the size of the InnoDB buffer pool to reduce disk I/O, to a maximum of 80% of physical memory. You can also increase the size of InnoDB log files.

Storage Requirements

If you intend to make several temporary copies of your data in InnoDB tables during the conversion process, it is recommended that you create the tables in file-per-table tablespaces so that you can reclaim the disk space when you drop the tables. When the innodb_file_per_table configuration option is enabled (the default), newly created InnoDB tables are implicitly created in file-per-table tablespaces.

Whether you convert the MyISAM table directly or create a cloned InnoDB table, make sure that you have sufficient disk space to hold both the old and new tables during the process. InnoDB tables require more disk space than MyISAM tables. If an ALTER TABLE operation runs out of space, it starts a rollback, and that can take hours if it is disk-bound. For inserts, InnoDB uses the insert buffer to merge secondary index records to indexes in batches. That saves a lot of disk I/O. For rollback, no such mechanism is used, and the rollback can take 30 times longer than the insertion.

In the case of a runaway rollback, if you do not have valuable data in your database, it may be advisable to kill the database process rather than wait for millions of disk I/O operations to complete. For the complete procedure, see Section 14.22.2, Forcing InnoDB Recovery.

Defining a PRIMARY KEY for Each Table

The PRIMARY KEY clause is a critical factor affecting the performance of MySQL queries and the space usage for tables and indexes. The primary key uniquely identifies a row in a table. Every row in the table must have a primary key value, and no two rows can have the same primary key value.

These are guidelines for the primary key, followed by more detailed explanations.

Declare a PRIMARY KEY for each table. Typically, it is the most important column that you refer to in WHERE clauses when looking up a single row.
Declare the PRIMARY KEY clause in the original CREATE TABLE statement, rather than adding it later through an ALTER TABLE statement.
Choose the column and its data type carefully. Prefer numeric columns over character or string ones.
Consider using an auto-increment column if there is not another stable, unique, non-null, numeric column to use.
An auto-increment column is also a good choice if there is any doubt whether the value of the primary key column could ever change. Changing the value of a primary key column is an expensive operation, possibly involving rearranging data within the table and within each secondary index.

Consider adding a primary key to any table that does not already have one. Use the smallest practical numeric type based on the maximum projected size of the table. This can make each row slightly more compact, which can yield substantial space savings for large tables. The space savings are multiplied if the table has any secondary indexes, because the primary key value is repeated in each secondary index entry. In addition to reducing data size on disk, a small primary key also lets more data fit into the buffer pool, speeding up all kinds of operations and improving concurrency.

If the table already has a primary key on some longer column, such as a VARCHAR, consider adding a new unsigned AUTO_INCREMENT column and switching the primary key to that, even if that column is not referenced in queries. This design change can produce substantial space savings in the secondary indexes. You can designate the former primary key columns as UNIQUE NOT NULL to enforce the same constraints as the PRIMARY KEY clause, that is, to prevent duplicate or null values across all those columns.

If you spread related information across multiple tables, typically each table uses the same column for its primary key. For example, a personnel database might have several tables, each with a primary key of employee number. A sales database might have some tables with a primary key of customer number, and other tables with a primary key of order number. Because lookups using the primary key are very fast, you can construct efficient join queries for such tables.

If you leave the PRIMARY KEY clause out entirely, MySQL creates an invisible one for you. It is a 6-byte value that might be longer than you need, thus wasting space. Because it is hidden, you cannot refer to it in queries.

Application Performance Considerations

The reliability and scalability features of InnoDB require more disk storage than equivalent MyISAM tables. You might change the column and index definitions slightly, for better space utilization, reduced I/O and memory consumption when processing result sets, and better query optimization plans making efficient use of index lookups.

If you do set up a numeric ID column for the primary key, use that value to cross-reference with related values in any other tables, particularly for join queries. For example, rather than accepting a country name as input and doing queries searching for the same name, do one lookup to determine the country ID, then do other queries (or a single join query) to look up relevant information across several tables. Rather than storing a customer or catalog item number as a string of digits, potentially using up several bytes, convert it to a numeric ID for storing and querying. A 4-byte unsigned INT column can index over 4 billion items (with the US meaning of billion: 1000 million). For the ranges of the different integer types, see Section 11.1.2, Integer Types (Exact Value) - INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT.

Understanding Files Associated with InnoDB Tables

InnoDB files require more care and planning than MyISAM files do.

You must not delete the ibdata files that represent the InnoDB system tablespace.
Methods of moving or copying InnoDB tables to a different server are described in Section 14.6.1.4, Moving or Copying InnoDB Tables.

14.6.1.6 AUTO_INCREMENT Handling in InnoDB

14.6.1.6 InnoDB 中 AUTO_INCREMENT 的处理机制

InnoDB provides a configurable locking mechanism that can significantly improve scalability and performance of SQL statements that add rows to tables with AUTO_INCREMENT columns. To use the AUTO_INCREMENT mechanism with an InnoDB table, an AUTO_INCREMENT column must be defined as part of an index such that it is possible to perform the equivalent of an indexed SELECT MAX(auto_incr_col) lookup on the table to obtain the maximum column value. Typically, this is achieved by making the column the first column of some table index.

This section describes the behavior of AUTO_INCREMENT lock modes, usage implications for different AUTO_INCREMENT lock mode settings, and how InnoDB initializes the AUTO_INCREMENT counter.

InnoDB 提供了一种可配置的锁机制, 在具有 AUTO_INCREMENT 列的表中新增记录时, 可以显著提高 SQL 语句的可扩展性与执行效率。要在InnoDB表中使用 AUTO_INCREMENT 机制, 必须将 AUTO_INCREMENT 列指定为索引的一部分, 以便执行等价于 SELECT MAX(auto_incr_col) 的索引查询语句, 在表中查找该列的最大值。通常, 这是通过将该列作为某个索引的第一列来实现的。

本节的内容包括: AUTO_INCREMENT 锁定模式的行为、使用不同 AUTO_INCREMENT 锁定模式设置的含义、以及InnoDB 如何初始化 AUTO_INCREMENT 计数器。

InnoDB AUTO_INCREMENT Lock Modes

This section describes the behavior of AUTO_INCREMENT lock modes used to generate auto-increment values, and how each lock mode affects replication. Auto-increment lock modes are configured at startup using the innodb_autoinc_lock_mode configuration parameter.

The following terms are used in describing innodb_autoinc_lock_mode settings:

InnoDB 中 AUTO_INCREMENT 的锁定模式

AUTO_INCREMENT 用于生成自增值, 下面介绍它的锁定模式, 以及每种锁定模式如何影响主从复制。通过 innodb_autoinc_lock_mode 配置参数在启动时设置自增量的锁定模式。

下面是与 innodb_autoinc_lock_mode 设置相关的术语:

INSERT-like statements

All statements that generate new rows in a table, including INSERT, INSERT ... SELECT, REPLACE, REPLACE ... SELECT, and LOAD DATA. Includes simple-inserts, bulk-inserts, and mixed-mode inserts.

类INSERT语句

所有在表中生成新行的语句, 比如 INSERT, INSERT ... SELECT, REPLACE, REPLACE ... SELECT, 以及 LOAD DATA。包括 simple-inserts, bulk-inserts, 以及 mixed-mode inserts。

Simple inserts

Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row INSERT and REPLACE statements that do not have a nested subquery, but not INSERT ... ON DUPLICATE KEY UPDATE.

简单插入(Simple inserts)

在最初解析SQL语句时就可以确定要插入多少行的语句, 称为简单插入。包括单行和多行 INSERT, 以及 REPLACE 语句, 但不包含嵌套子查询。不包括 INSERT ... ON DUPLICATE KEY UPDATE 语句。

Bulk inserts

Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements, but not plain INSERT. InnoDB assigns new values for the AUTO_INCREMENT column one at a time as each row is processed.

批量插入(Bulk inserts)

事先不知道要插入具体行数的语句(需要自增值的数量也是未知的)。包括 INSERT ... SELECT, REPLACE ... SELECT, 以及 LOAD DATA 语句, 但排除掉简单的INSERT。 InnoDB 在处理每一行时为 AUTO_INCREMENT 列分配一个新值。

Mixed-mode inserts

These are simple insert statements that specify the auto-increment value for some (but not all) of the new rows. An example follows, where id is an AUTO_INCREMENT column of table t1:

混合模式插入(Mixed-mode inserts)

是简单插入语句, 但只为某些新行(而不是全部)指定自增值。比如下面的这个示例, 其中 id 是一个 AUTO_INCREMENT 列:

INSERT INTO t1 (id,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');

Another type of mixed-mode insert is INSERT ... ON DUPLICATE KEY UPDATE, which in the worst case is in effect an INSERT followed by a UPDATE, where the allocated value for the AUTO_INCREMENT column may or may not be used during the update phase.

另一种类型的 mixed-mode insert 是 INSERT ... ON DUPLICATE KEY UPDATE, 在最坏情况下, 会实际执行一次 INSERT, 然后再跟着执行一次 UPDATE 语句, 所以在更新阶段可能会也可能不会使用为 AUTO_INCREMENT 列分配的值。

There are three possible settings for the innodb_autoinc_lock_mode configuration parameter. The settings are 0, 1, or 2, for traditional, consecutive, or interleaved lock mode, respectively.

innodb_autoinc_lock_mode 可以配置三种值: 0, 1, 2, 分别表示 traditional, consecutive, interleaved 锁定模式。

innodb_autoinc_lock_mode = 0 (traditional lock mode)

The traditional lock mode provides the same behavior that existed before the innodb_autoinc_lock_mode configuration parameter was introduced in MySQL 5.1. The traditional lock mode option is provided for backward compatibility, performance testing, and working around issues with mixed-mode inserts, due to possible differences in semantics.

In this lock mode, all INSERT-like statements obtain a special table-level AUTO-INC lock for inserts into tables with AUTO_INCREMENT columns. This lock is normally held to the end of the statement (not to the end of the transaction) to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of INSERT statements, and to ensure that auto-increment values assigned by any given statement are consecutive.

In the case of statement-based replication, this means that when an SQL statement is replicated on a replica server, the same values are used for the auto-increment column as on the source server. The result of execution of multiple INSERT statements is deterministic, and the replica reproduces the same data as on the source. If auto-increment values generated by multiple INSERT statements were interleaved, the result of two concurrent INSERT statements would be nondeterministic, and could not reliably be propagated to a replica server using statement-based replication.

To make this clear, consider an example that uses this table:

innodb_autoinc_lock_mode = 0 传统锁定模式(traditional lock mode)

传统锁定模式提供了与 MySQL5.1 引入 innodb_autoinc_lock_mode 配置参数之前存在的相同行为;
提供传统锁定模式的目的, 是为了向后兼容、性能测试和解决混合模式插入的问题, 因为可能存在语义上的差异。

在这种锁定模式下, 所有 INSERT-like 语句都会获取一个特殊的表级 AUTO-INC 锁, 用于插入到带有AUTO_INCREMENT列的表中。这个锁通常持有到语句末尾(而不是事务末尾), 以确保为给定的 INSERT 语句, 以可预测和可重复的顺序分配自增值, 并确保任何给定语句分配的自增值是连续的。

在基于语句复制(statement-based replication)的情况下, 这意味着在副本服务器上复制 SQL 语句时, 自增列使用的值与源服务器上的值相同。多个 INSERT 语句的执行结果是确定的, 并且副本会生成与源数据库相同的数据。如果多个 INSERT 语句生成的自增值存在交错, 则并发执行两个 INSERT 语句的结果将是不确定的, 那样的话就不能使用基于语句的复制来将数据可靠地传播到副本服务器。

为了说明这一点, 请看下面这个表:

CREATE TABLE t1 (
  id INT(11) NOT NULL AUTO_INCREMENT,
  c2 VARCHAR(10) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

Suppose that there are two transactions running, each inserting rows into a table with an AUTO_INCREMENT column. One transaction is using an INSERT ... SELECT statement that inserts 1000 rows, and another is using a simple INSERT statement that inserts one row:

假设有两个事务在运行, 每个事务都将在一个带有 AUTO_INCREMENT 列的表中插入行。一个事务Tx1 通过 INSERT ... SELECT 语句插入 1000 行, 另一个事务Tx2则使用简单 INSERT 语句插入一行:

Tx1: INSERT INTO t1 (c2) SELECT ... 1000 rows from another table ...
Tx2: INSERT INTO t1 (c2) VALUES ('xxx');

InnoDB cannot tell in advance how many rows are retrieved from the SELECT in the INSERT statement in Tx1, and it assigns the auto-increment values one at a time as the statement proceeds. With a table-level lock, held to the end of the statement, only one INSERT statement referring to table t1 can execute at a time, and the generation of auto-increment numbers by different statements is not interleaved. The auto-increment value generated by the Tx1 INSERT ... SELECT statement is consecutive, and the (single) auto-increment value used by the INSERT statement in Tx2 is either be smaller or larger than all those used for Tx1, depending on which statement executes first.

InnoDB 无法预先知道 Tx1 到底会从 SELECT 语句中获取到多少行, 在执行过程中会随着语句的进行, 每次分配一个自增值。通过表级锁, 一直持有到语句末尾, 则同一时间, 表t1上只会有一条 INSERT 语句执行, 不同语句生成的自增数不会发生交错。事务 Tx1的INSERT ... SELECT 语句生成的自增值是连续的, 事务Tx2 中的 INSERT 语句使用的单个自增值, 要么都小于Tx1使用的自增值, 要么就全部大于Tx1的自增值, 这取决于哪个语句先执行。

As long as the SQL statements execute in the same order when replayed from the binary log (when using statement-based replication, or in recovery scenarios), the results are the same as they were when Tx1 and Tx2 first ran. Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those table-level locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time.

只要在使用二进制日志重放时的 SQL 语句以相同的顺序执行(使用基于语句的复制, 或在恢复场景中), 结果就与 Tx1 和 Tx2 首次运行时的结果相同。因此, 表级锁一直保持到语句结束, 使得 INSERT 语句的自增量, 可以安全地在基于语句的复制中使用。但是, 这种表级锁会限制多个事务同时执行插入语句时的并发性能和扩展性。

可扩展性(scalability), 这里可理解为通过加CPU等配置来提升性能, 也叫做 Scale Up。

In the preceding example, if there were no table-level lock, the value of the auto-increment column used for the INSERT in Tx2 depends on precisely when the statement executes. If the INSERT of Tx2 executes while the INSERT of Tx1 is running (rather than before it starts or after it completes), the specific auto-increment values assigned by the two INSERT statements are nondeterministic, and may vary from run to run.

在前面的例子中, 如果没有表级锁, 用于 INSERT 的自增列的值, 在 Tx2 中精确取决于语句执行的时间。如果在Tx1的INSERT执行到半中间时, 轮到了 Tx2 的 INSERT 执行(而不是在它开始之前或完成之后), 那么这两个 [INSERT](https://dev .mysql.com/doc/refman/5.7/en/insert.html) 语句分配的自增值就是不确定的, 可能每次运行的结果都不同。

Under the consecutive lock mode, InnoDB can avoid using table-level AUTO-INC locks for simple insert statements where the number of rows is known in advance, and still preserve deterministic execution and safety for statement-based replication.

If you are not using the binary log to replay SQL statements as part of recovery or replication, the interleaved lock mode can be used to eliminate all use of table-level AUTO-INC locks for even greater concurrency and performance, at the cost of permitting gaps in auto-increment numbers assigned by a statement and potentially having the numbers assigned by concurrently executing statements interleaved.

在 consecutive 锁定模式下, InnoDB 可以避免对预先知道行数的简单插入语句使用表级 AUTO-INC 锁, 并且仍然保留基于语句的复制的确定性执行和安全性。

如果您不使用二进制日志来重放 SQL 语句作为恢复或复制的一部分, [交错](https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling. html#innodb-auto-increment-lock-mode-interleaved) 锁定模式可用于消除所有表级AUTO-INC锁的使用, 以获得更高的并发性和性能, 代价是允许自动递增中的间隙由语句分配的编号, 并且可能将同时执行的语句分配的编号交错。

innodb_autoinc_lock_mode = 1 (consecutive lock mode)
innodb_autoinc_lock_mode = 1 连续锁定模式(consecutive lock mode)

This is the default lock mode. In this mode, bulk inserts use the special AUTO-INC table-level lock and hold it until the end of the statement. This applies to all INSERT ... SELECT, REPLACE ... SELECT, and LOAD DATA statements. Only one statement holding the AUTO-INC lock can execute at a time. If the source table of the bulk insert operation is different from the target table, the AUTO-INC lock on the target table is taken after a shared lock is taken on the first row selected from the source table. If the source and target of the bulk insert operation are the same table, the AUTO-INC lock is taken after shared locks are taken on all selected rows.

这是默认的锁定模式。在这种模式下, 批量插入(bulk inserts) 使用特殊的 AUTO-INC 表级锁, 并将其持有到语句结束。这适用于所有的 INSERT ... SELECT, INSERT ... SELECT, REPLACE ... SELECT, 和 LOAD DATA 语句。同一时间, 只有一个语句可以持有 AUTO-INC 锁。如果批量插入操作的源表与目标表不同, 则在对源表中选择的第一行获取共享锁后, 再对目标表获取 AUTO-INC 锁。如果批量插入操作的源表和目标是同一个表, 则在对所有选定行获取共享锁之后, 再获取 AUTO-INC 锁。

Simple inserts (for which the number of rows to be inserted is known in advance) avoid table-level AUTO-INC locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes. No table-level AUTO-INC lock is used unless an AUTO-INC lock is held by another transaction. If another transaction holds an AUTO-INC lock, a simple insert waits for the AUTO-INC lock, as if it were a bulk insert.

简单插入(Simple inserts), 因为预先知道要插入的行数, 会避免使用表级锁, 而是先在互斥锁(mutex, 一种轻量锁)的控制下获取所需数量的自增值, 这个互斥锁只会持有到分配自增值结束, 而不会持续到语句执行完。除非有其他事务持有该表的 AUTO-INC 锁, 否则这种模式下不会使用到表级AUTO-INC锁。如果另一个事务持有了AUTO-INC锁, 则简单插入需要等待 AUTO-INC 锁释放, 就像是退化成 批量插入 一样。

This lock mode ensures that, in the presence of INSERT statements where the number of rows is not known in advance (and where auto-increment numbers are assigned as the statement progresses), all auto-increment values assigned by any INSERT-like statement are consecutive, and operations are safe for statement-based replication.

这种锁定模式, 在执行的 INSERT 语句在影响行数事先不明确时, 随着语句的进行而分配自增量, 可以确保任何一条 INSERT-like 语句所分配的多个自增值是连续的, 并且分配操作对基于语句的复制(statement-based replication)是安全的。

Simply put, this lock mode significantly improves scalability while being safe for use with statement-based replication. Further, as with “traditional” lock mode, auto-increment numbers assigned by any given statement are consecutive. There is no change in semantics compared to “traditional” mode for any statement that uses auto-increment, with one important exception.

简而言之, 这种锁定模式显著提高了可扩展性, 同时还可以安全地使用基于语句的复制。此外, 与traditional锁定模式一样, 任何一个语句分配的自增编号都是连续的(consecutive)。对于任何使用自增量的语句, 与“traditional”模式相比, 语义上没有变化, 但有一种情况例外:

The exception is for mixed-mode inserts, where the user provides explicit values for an AUTO_INCREMENT column for some, but not all, rows in a multiple-row simple insert. For such inserts, InnoDB allocates more auto-increment values than the number of rows to be inserted. However, all values automatically assigned are consecutively generated (and thus higher than) the auto-increment value generated by the most recently executed previous statement. Excess numbers are lost.

例外情况就是混合模式插入(mixed-mode inserts), 在这种情况下, 执行多行的 simple insert 时, 用户在SQL里为其中一部分的AUTO_INCREMENT列提供了明确的值(但又没有全部提供). 对于此类插入, InnoDB 分配的自增值个数比要插入的行数更多。但自动分配的所有自增值都是连续生成的, 因此高于最近执行的语句使用到的自增值。多生成的这些自增值丢失了。

innodb_autoinc_lock_mode = 2 (interleaved lock mode)
innodb_autoinc_lock_mode = 2 交错锁定模式(interleaved lock mode)

In this lock mode, no INSERT-like statements use the table-level AUTO-INC lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log.

在这种锁定模式下, 没有 INSERT-like 语句会使用表级的 AUTO-INC 锁, 同一时间多条语句可以并发执行。这是最快的锁定模式, 也最具有扩展性, 但在基于语句的复制或恢复场景中, 从二进制日志重放 SQL 语句时, 它是不安全的。

In this lock mode, auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing INSERT-like statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.

这种锁定模式, 可以保证的是: 在所有并发执行的 INSERT-like 语句中都能得到唯一且单调递增的自增值。但是, 由于多条语句可以同时生成数字(即数字的分配是跨语句交错的), 因此为任何语句生成的自增值可能都不是连续的。

If the only statements executing are simple inserts where the number of rows to be inserted is known ahead of time, there are no gaps in the numbers generated for a single statement, except for mixed-mode inserts. However, when bulk inserts are executed, there may be gaps in the auto-increment values assigned by any given statement.

如果某条SQL语句, 是提前知道要插入行数的简单插入, 则为单个语句生成的数字没有间隙, 除非是混合模式的插入。但是, 当执行批量插入时, 为任何语句分配的自增值中都可能存在间隙。

InnoDB AUTO_INCREMENT Lock Mode Usage Implications

InnoDB中使用AUTO_INCREMENT锁定模式的影响

Using auto-increment with replication

If you are using statement-based replication, set innodb_autoinc_lock_mode to 0 or 1 and use the same value on the source and its replicas. Auto-increment values are not ensured to be the same on the replicas as on the source if you use innodb_autoinc_lock_mode = 2 (interleaved) or configurations where the source and replicas do not use the same lock mode.

If you are using row-based or mixed-format replication, all of the auto-increment lock modes are safe, since row-based replication is not sensitive to the order of execution of the SQL statements (and the mixed format uses row-based replication for any statements that are unsafe for statement-based replication).

主从复制中使用auto-increment

如果使用基于语句的复制(statement-based), 请将 innodb_autoinc_lock_mode 参数设置为 0 或 1, 并且要求源和副本机上使用相同的设置。如果设置了 innodb_autoinc_lock_mode = 2 (interleaved), 或者主库与副本使用不同的锁定模式, 则不能确保副本上的自增值与主库上的相同。

如果使用基于行的复制, 或者混合格式的复制, 任何一种自增锁定模式都是安全的, 因为基于行的复制对 SQL语句的执行顺序不敏感. 使用混合格式的复制时, 如果某条SQL对基于语句的复制不安全, 则会自动使用基于行的复制。

现在一般推荐基于行的复制, 减少各种乱七八糟的问题;

Lost auto-increment values and sequence gaps

In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are lost. Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the INSERT-like statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.

丢失的自增值, 以及序列间隙

在所有锁定模式(0, 1, 2)中, 如果生成自增值的事务回滚了, 则这些自增值将会丢失。无论 INSERT-like 语句是否完成, 相关事务是否回滚, 只要生成了自增值, 这些自增值都不会减回去。丢失的值不会被重用。所以, 存储在 AUTO_INCREMENT 列中的值有可能存在间隙。

要想使用自增又没有间隙, 有什么办法实现吗? 可以考虑简单语句不回滚。或者在业务上分阶段处理, 会撤销的单子, 在不能撤销之后使用另一个表来记录单号。这也是流水号和订单号的区别。

Specifying NULL or 0 for the AUTO_INCREMENT column

In all lock modes (0, 1, and 2), if a user specifies NULL or 0 for the AUTO_INCREMENT column in an INSERT, InnoDB treats the row as if the value was not specified and generates a new value for it.

为 AUTO_INCREMENT 列指定 NULL 或 0 值

不管是哪种锁定模式 (0, 1, 2), 如果用户在 INSERT 语句中为 AUTO_INCREMENT 列指定 NULL 或 0 值, InnoDB 将其视为未指定值, 并自动为其生成一个新值。

Assigning a negative value to the AUTO_INCREMENT column

In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if you assign a negative value to the AUTO_INCREMENT column.

为 AUTO_INCREMENT 列指定负值

不管是哪种锁定模式 (0, 1, 2), 如果用户在 INSERT 语句中为 AUTO_INCREMENT 列指定负值, 则自增机制是未定义的。

If the AUTO_INCREMENT value becomes larger than the maximum integer for the specified integer type

In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is not defined if the value becomes larger than the maximum integer that can be stored in the specified integer type.

如果 AUTO_INCREMENT 值大于列类型所允许的最大整数

不管是哪种锁定模式 (0, 1, 2), 如果自增值大于该列所允许的最大整数, 则自增机制是未定义的。

Gaps in auto-increment values for bulk inserts

With innodb_autoinc_lock_mode set to 0 (traditional) or 1 (consecutive), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.

With innodb_autoinc_lock_mode set to 2 (interleaved), there may be gaps in the auto-increment values generated by bulk inserts, but only if there are concurrently executing INSERT-like statements.

For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.

批量插入(bulk inserts)中自增值的间隙

将 innodb_autoinc_lock_mode 设置为 0 (traditional) )或 1 (consecutive), 任何一条语句生成的自增值都是连续的, 不会存在间隙, 因为表级 AUTO-INC 锁一直持有到语句结束, 并且同一时间只能执行一条这样的语句。

当 innodb_autoinc_lock_mode 设置为 2 (interleaved) 时, 批量插入生成的自增值可能会存在间隙, 但也只有并发执行多个 INSERT-like 语句时有这种问题。

对于 1 和 2 这两种锁定模式, 多个语句之间的自增列值可能会出现间隙, 因为对于批量插入, 可能不知道每个语句所需的自增值的确切数量, 并且可能会高估。

Auto-increment values assigned by mixed-mode inserts

Consider a mixed-mode insert, where a simple insert specifies the auto-increment value for some (but not all) resulting rows. Such a statement behaves differently in lock modes 0, 1, and 2. For example, assume id is an AUTO_INCREMENT column of table t1, and that the most recent automatically generated sequence number is 100.

混合模式插入分配的自增值

混合模式插入, 是在简单插入的基础上, 其中有一部分行指定了自增列的值, 另一部分没指定。这种语句在锁定模式 0, 1, 2 中的行为都不同。例如, 假设表 t1 中有一个 AUTO_INCREMENT 列 id, 并且最近自动生成的序列号是 100;

mysql> CREATE TABLE t1 (
  -> id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -> c2 CHAR(1)
  -> ) ENGINE = INNODB;

Now, consider the following mixed-mode insert statement:

现在有一条 混合模式插入 语句:

mysql> INSERT INTO t1 (id,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');

With innodb_autoinc_lock_mode set to 0 (traditional), the four new rows are:

如果设置为传统锁定模式: innodb_autoinc_lock_mode = 0, 那么新插入的四行:

mysql> SELECT id, c2 FROM t1 ORDER BY c2;
+-----+------+
| id  | c2   |
+-----+------+
|   1 | a    |
| 101 | b    |
|   5 | c    |
| 102 | d    |
+-----+------+

The next available auto-increment value is 103 because the auto-increment values are allocated one at a time, not all at once at the beginning of statement execution. This result is true whether or not there are concurrently executing INSERT-like statements (of any type).

With innodb_autoinc_lock_mode set to 1 (consecutive), the four new rows are also:

下一个可用的自增值为 103, 因为在传统模式下, 自增值每次分配一个, 而不是在语句开始执行时一次性分配。无论是否有同时并发执行的 INSERT-like 语句(任何类型), 此结果都是一样的。

如果将 innodb_autoinc_lock_mode 设置为连续模式(1, consecutive), 新插入的四行也是一样的:

mysql> SELECT id, c2 FROM t1 ORDER BY c2;
+-----+------+
| id  | c2   |
+-----+------+
|   1 | a    |
| 101 | b    |
|   5 | c    |
| 102 | d    |
+-----+------+

However, in this case, the next available auto-increment value is 105, not 103 because four auto-increment values are allocated at the time the statement is processed, but only two are used. This result is true whether or not there are concurrently executing INSERT-like statements (of any type).

With innodb_autoinc_lock_mode set to mode 2 (interleaved), the four new rows are:

但是, 在这种模式下, 下一个可用的自增值是 105, 而不是 103, 因为在处理语句时分配了四个自增值, 但只使用了两个。无论是否同时并发执行 INSERT-like语句(任何类型), 此结果都是一致的。

如果将 innodb_autoinc_lock_mode 设置为交错模式(2, interleaved), 四个新行是:

mysql> SELECT id, c2 FROM t1 ORDER BY c2;
+-----+------+
| id  | c2   |
+-----+------+
|   1 | a    |
|   x | b    |
|   5 | c    |
|   y | d    |
+-----+------+

The values of x and y are unique and larger than any previously generated rows. However, the specific values of x and y depend on the number of auto-increment values generated by concurrently executing statements.

Finally, consider the following statement, issued when the most-recently generated sequence number is 100:

这里 x 和 y 是代指, 他们的值都是唯一的, 并且比之前生成的任何行都大。但是, x 和y 的具体值取决于并发执行语句时生成自增值的数量。

最后, 请看以下SQL语句, 在最近生成的序列号为 100 时执行:

mysql> INSERT INTO t1 (id,c2) VALUES (1,'a'), (NULL,'b'), (101,'c'), (NULL,'d');

With any innodb_autoinc_lock_mode setting, this statement generates a duplicate-key error 23000 (Can't write; duplicate key in table) because 101 is allocated for the row (NULL, 'b') and insertion of the row (101, 'c') fails.

在这种前提下, 使用任何一种锁定模式 innodb_autoinc_lock_mode, 执行这条SQL语句时都会产生 duplicate-key error 23000 错误(Can't write; duplicate key in table), 因为 101 分配给了 (NULL, 'b'), 所以插入行 (101, 'c') 失败。

Modifying AUTO_INCREMENT column values in the middle of a sequence of INSERT statements

In all lock modes (0, 1, and 2), modifying an AUTO_INCREMENT column value in the middle of a sequence of INSERT statements could lead to Duplicate entry errors. For example, if you perform an UPDATE operation that changes an AUTO_INCREMENT column value to a value larger than the current maximum auto-increment value, subsequent INSERT operations that do not specify an unused auto-increment value could encounter Duplicate entry errors. This behavior is demonstrated in the following example.

在 INSERT 语句序列中间修改 AUTO_INCREMENT 列值

所有锁定模式下 (0, 1, 2), 在 INSERT 语句序列中间修改 AUTO_INCREMENT 列值, 都可能会导致 Duplicate entry(唯一键重复) 错误。例如, 如果执行 UPDATE 操作将 AUTO_INCREMENT 列的值改为大于当前最大的自增值, 后续未指定自增值的 INSERT 操作可能会遇到 Duplicate entry 错误。下面通过示例来演示这种行为:

mysql> CREATE TABLE t1 (
  -> id INT NOT NULL AUTO_INCREMENT,
  -> PRIMARY KEY (id)
  ->  ) ENGINE = InnoDB;

mysql> INSERT INTO t1 VALUES(0), (0), (3);

mysql> SELECT id FROM t1;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+

mysql> UPDATE t1 SET id = 4 WHERE id = 1;

mysql> SELECT id FROM t1;
+----+
| id |
+----+
|  2 |
|  3 |
|  4 |
+----+

mysql> INSERT INTO t1 VALUES(0);
ERROR 1062 (23000): Duplicate entry '4' for key 'PRIMARY'

InnoDB AUTO_INCREMENT Counter Initialization

InnoDB 中 AUTO_INCREMENT 计数器的初始化

This section describes how InnoDB initializes AUTO_INCREMENT counters.

If you specify an AUTO_INCREMENT column for an InnoDB table, the table handle in the InnoDB data dictionary contains a special counter called the auto-increment counter that is used in assigning new values for the column. This counter is stored only in main memory, not on disk.

To initialize an auto-increment counter after a server restart, InnoDB executes the equivalent of the following statement on the first insert into a table containing an AUTO_INCREMENT column.

本节描述了 InnoDB 如何初始化 AUTO_INCREMENT 计数器。

如果为InnoDB表指定AUTO_INCREMENT列, 则InnoDB数据字典中的表句柄(table handle)包含一个特殊的计数器, 称为自动增量计数器, 用于为该列分配新值。这个计数器只在内存中, 不存储到磁盘。

在服务器重启后, 为了初始化自增计数器, InnoDB 在第一次插入包含 AUTO_INCREMENT 列的表时, 执行等价于下面这条语句的操作。

SELECT MAX(auto_incr_col) FROM table_name FOR UPDATE;

InnoDB increments the value retrieved by the statement and assigns it to the column and to the auto-increment counter for the table. By default, the value is incremented by 1. This default can be overridden by the auto_increment_increment configuration setting.

InnoDB 在获取到最大值的基础上递增, 并将其赋值给自增列, 同时赋值给该表的自动递增计数器。默认情况下, 递增值为 1。这个默认值可以被 auto_increment_increment 配置覆盖。

If the table is empty, InnoDB uses the value 1. This default can be overridden by the auto_increment_offset configuration setting.

如果是空表, InnoDB 使用值 1。这个默认值可以被 auto_increment_offset 配置覆盖。

If a SHOW TABLE STATUS statement examines the table before the auto-increment counter is initialized, InnoDB initializes but does not increment the value. The value is stored for use by later inserts. This initialization uses a normal exclusive-locking read on the table and the lock lasts to the end of the transaction. InnoDB follows the same procedure for initializing the auto-increment counter for a newly created table.

如果在自动递增计数器初始化之前, 通过 SHOW TABLE STATUS 语句检查表, InnoDB 会初始化计数器但不递增值。并将这个值存起来, 以供后续使用。初始化操作会对该表使用一个普通的排他锁来读取, 并且一直锁定到事务结束。 InnoDB 遵循相同的过程来为新创建的表初始化自增计数器。

After the auto-increment counter has been initialized, if you do not explicitly specify a value for an AUTO_INCREMENT column, InnoDB increments the counter and assigns the new value to the column. If you insert a row that explicitly specifies the column value, and the value is greater than the current counter value, the counter is set to the specified column value.

在自增计数器初始化完成后, 如果SQL语句没有为 AUTO_INCREMENT 列明确指定值, InnoDB 会递增计数器并将新值分配给该列。如果 insert 语句显式指定了自增列的值, 并且该值大于当前计数器的值, 则将计数器设置为指定的新值。

InnoDB uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, InnoDB reinitializes the counter for each table for the first INSERT to the table, as described earlier.

只要服务器一直在运行, InnoDB 都会使用内存中的自增计数器。当服务器停止并重新启动时, InnoDB 在第一次 INSERT 某个表时, 为该表重新初始化计数器, 这个前面说过了。

A server restart also cancels the effect of the AUTO_INCREMENT = N table option in CREATE TABLE and ALTER TABLE statements, which you can use with InnoDB tables to set the initial counter value or alter the current counter value.

CREATE TABLE 和 ALTER TABLE 语句可以附带 AUTO_INCREMENT = N 选项, 用来指定自增计数器的初始值/当前值。如果服务器重启, 则会取消该选项的效果。

Notes

When an AUTO_INCREMENT integer column runs out of values, a subsequent INSERT operation returns a duplicate-key error. This is general MySQL behavior.
When you restart the MySQL server, InnoDB may reuse an old value that was generated for an AUTO_INCREMENT column but never stored (that is, a value that was generated during an old transaction that was rolled back).

注意事项

当 AUTO_INCREMENT 整数列的可用值用完, 后续的 INSERT 操作会返回唯一键重复错误(duplicate-key)。这是 MySQL 的一般行为。
重启 MySQL 服务器时, InnoDB 可能会重用为 AUTO_INCREMENT 列生成但从保存到数据库的值(比如, 在回滚的事务中生成的自增值)。

14.6.2 Indexes

14.6.2 `InnoDB` 索引在磁盘上的结构

14.6.2.1 Clustered and Secondary Indexes
14.6.2.2 The Physical Structure of an InnoDB Index
14.6.2.3 Sorted Index Builds
14.6.2.4 InnoDB FULLTEXT Indexes

This section covers topics related to InnoDB indexes.

本节介绍 InnoDB 索引相关的信息。

14.6.2.1 Clustered and Secondary Indexes

14.6.2.1 聚集索引(Clustered Indexes)与二级索引(Secondary Indexes)

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

InnoDB中每张表都有一个特殊的索引, 称为聚集索引(clustered index); 其中不止有索引结构, 同时还存储数据。一般情况下, 聚集索引和 primary key 是同一个意思。为了在查询、新增等数据库操作时获得最佳性能, 我们必须理解 InnoDB 是如何通过聚集索引, 来优化每张表的大部分查询和DML操作的。

When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
如果一张表中定义了 PRIMARY KEY, InnoDB会将其作为聚集索引。所以我们需要为每张表都创建主键索引。假如不存在逻辑上唯一且非空的列/列组合, 可以增加一个 auto-increment 列, 自增列的值会自动填充。
如果没有定义 PRIMARY KEY, MySQL会查找第一个非空(NOT NULL) 的唯一索引列( UNIQUE ), InnoDB 会将其作为聚集索引。
如果某张表没有 PRIMARY KEY, 也找不到合适的唯一索引, 则InnoDB会在一个包含row ID的虚拟列上生成内置的聚集索引, 称为 GEN_CLUST_INDEX, 排序字段则是 InnoDB 为这张表生成的行ID; row ID的大小为6个字节(48bit), 有新行插入时会单调递增, 所以行ID从物理结构上就和插入的顺序一致。

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

聚集索引查询速度快的原理

通过聚集索引访问单行数据非常快, 因为索引搜索直接指向包含所有行数据的页面。如果表中的数据量很大, 聚集索引结构可以减少磁盘I/O操作, 相比而言, 将索引和数据放到不同的磁盘页面则会需要多次磁盘IO。

How Secondary Indexes Relate to the Clustered Index

二级索引与聚集索引的关系

All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.

除了聚集索引, 其他的索引都称为二级索引, 也可以称为普通索引。在 InnoDB 中, 二级索引上的每条记录, 都包含创建索引时指定的列, 以及该行的主键列(值)。查询时, InnoDB 再通过这个主键值去定位聚集索引中的行。

If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

如果主键很长, 二级索引会占用更多的空间, 这时候选择占用空间较小的列作为主键是有利的。

For guidelines to take advantage of InnoDB clustered and secondary indexes, see Section 8.3, Optimization and Indexes.

有关 InnoDB 聚集索引和二级索引的优化指南, 请参阅 Section 8.3, Optimization and Indexes.

14.6.2.2 The Physical Structure of an InnoDB Index

14.6.2.2 InnoDB索引的物理结构

With the exception of spatial indexes, InnoDB indexes are B-tree data structures. Spatial indexes use R-trees, which are specialized data structures for indexing multi-dimensional data. Index records are stored in the leaf pages of their B-tree or R-tree data structure. The default size of an index page is 16KB.

除了空间索引, 其他的 InnoDB 普通索引都使用 B-tree 数据结构。空间索引使用的是 R-trees 结构, 这是多维索引的专用数据结构。索引记录存储在 B-tree 或者 R-tree 数据结构的叶子中。索引页的默认大小为 16KB。

When new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full.

当新记录插入到 InnoDB 的一个 clustered index 时, InnoDB 会试图让 1/16 的页面保持空闲, 以便将来可用于插入和更新索引记录。如果按顺序插入索引记录(升序或降序), 则生成的索引页使用率大约为 15/16。如果以随机顺序插入记录, 则页面使用率在 1/2 到 15/16 之间。

InnoDB performs a bulk load when creating or rebuilding B-tree indexes. This method of index creation is known as a sorted index build. The innodb_fill_factor configuration option defines the percentage of space on each B-tree page that is filled during a sorted index build, with the remaining space reserved for future index growth. Sorted index builds are not supported for spatial indexes. For more information, see Section 14.6.2.3, Sorted Index Builds. An innodb_fill_factor setting of 100 leaves 1/16 of the space in clustered index pages free for future index growth.

InnoDB 在创建或重建 B-tree 索引时执行批量加载。这种创建索引的方法称为排序索引构建(sorted index build)。在排序索引构建时, 配置选项 innodb_fill_factor 指定了每个 B-tree 页面上填充的空间百分比, 剩余的空间则保留着, 用于未来的索引增长。空间索引不支持排序索引构建。更多信息请参考 14.6.2.3 排序索引构建。将 innodb_fill_factor 设置为 100 会留下 1/16 的聚集索引页空间, 给未来的索引增长使用。

If the fill factor of an InnoDB index page drops below the MERGE_THRESHOLD, which is 50% by default if not specified, InnoDB tries to contract the index tree to free the page. The MERGE_THRESHOLD setting applies to both B-tree and R-tree indexes. For more information, see Section 14.8.12, Configuring the Merge Threshold for Index Pages.

如果 InnoDB 索引页的填充因子小于 MERGE_THRESHOLD(默认是50%), InnoDB 会尝试收缩索引树以释放页面。 MERGE_THRESHOLD 设置适用于 B-tree 和 R-tree 索引。更多信息请参考 Section 14.8.12, Configuring the Merge Threshold for Index Pages.

You can define the page size for all InnoDB tablespaces in a MySQL instance by setting the innodb_page_size configuration option prior to initializing the MySQL instance. Once the page size for an instance is defined, you cannot change it without reinitializing the instance. Supported sizes are 64KB, 32KB, 16KB (default), 8KB, and 4KB.

可以通过设置MySQL实例的初始化参数 innodb_page_size, 来为所有 InnoDB 表空间指定 page size。一旦MySQL实例的页面大小确定, 如果不重新初始化实例, 则不能进行更改它。支持的页面大小为: 64KB, 32KB, 16KB(默认值), 8KB, and 4KB。

Support for 32KB and 64KB pages sizes was added in MySQL 5.7. For more information, refer to the innodb_page_size documentation.

32KB 和 64KB 页面大小从 MySQL 5.7 开始支持。更多信息, 请参阅 innodb_page_size 文档。

A MySQL instance using a particular InnoDB page size cannot use data files or log files from an instance that uses a different page size.

InnoDB 使用某种页面大小配置的 MySQL 实例, 不能使用来自于页面大小不同的实例生成的数据文件或日志文件。

14.6.2.3 Sorted Index Builds

14.6.2.3 排序索引构建

InnoDB performs a bulk load instead of inserting one index record at a time when creating or rebuilding indexes. This method of index creation is also known as a sorted index build. Sorted index builds are not supported for spatial indexes.

在创建或重建索引时, InnoDB 执行批量加载, 而不是每次插入一条索引记录。这种索引创建方法也称为排序索引构建(sorted index build)。空间索引不支持排序索引构建。

There are three phases to an index build. In the first phase, the clustered index is scanned, and index entries are generated and added to the sort buffer. When the sort buffer becomes full, entries are sorted and written out to a temporary intermediate file. This process is also known as a run. In the second phase, with one or more runs written to the temporary intermediate file, a merge sort is performed on all entries in the file. In the third and final phase, the sorted entries are inserted into the B-tree.

索引构建分为三个阶段。在第一阶段, 扫描聚集索引, 生成索引条目并添加到排序缓冲区(sort buffer)。如果 sort buffer 变满, 则索引条目被排序并写出到临时中间文件。此过程也称为 run(运行)。在第二阶段, 将一个或多个 run 写入到临时中间文件时, 会对文件中的所有条目执行归并排序(merge sort)。在第三阶段, 将排序后的索引条目插入到 B-tree 中。

Prior to the introduction of sorted index builds, index entries were inserted into the B-tree one record at a time using insert APIs. This method involved opening a B-tree cursor to find the insert position and then inserting entries into a B-tree page using an optimistic insert. If an insert failed due to a page being full, a pessimistic insert would be performed, which involves opening a B-tree cursor and splitting and merging B-tree nodes as necessary to find space for the entry. The drawbacks of this top-down method of building an index are the cost of searching for an insert position and the constant splitting and merging of B-tree nodes.

在引入排序索引构建技术前的版本中, 需要使用 insert API, 将索引条目每次一条地插入到 B 树中。这种方法会打开一个 B-tree cursor(游标) 来找到插入位置, 然后使用 optimistic(乐观锁) 将索引条目插入到 B-tree 页。如果因为页面已满导致插入失败, 则会执行 pessimistic(悲观锁) 插入, 这涉及打开一个 B-tree cursor 并根据需要拆分和合并 B树节点以找到条目的空间。这种 top-down(自顶向下) 方法构建索引的缺点, 是查找插入位置的开销, 以及 B 树节点的不断分裂和合并。

Sorted index builds use a bottom-up approach to building an index. With this approach, a reference to the right-most leaf page is held at all levels of the B-tree. The right-most leaf page at the necessary B-tree depth is allocated and entries are inserted according to their sorted order. Once a leaf page is full, a node pointer is appended to the parent page and a sibling leaf page is allocated for the next insert. This process continues until all entries are inserted, which may result in inserts up to the root level. When a sibling page is allocated, the reference to the previously pinned leaf page is released, and the newly allocated leaf page becomes the right-most leaf page and new default insert location.

排序索引构建使用 bottom-up(自下而上) 的方法来构建索引。使用这种方法, 对最右侧叶子page的引用保存在 B 树的所有级别。必要时在 B 树深度的最右侧分配叶子page, 并根据其排序顺序插入条目。一旦叶子页面已满, 节点指针将附加到父页, 并为下一次插入分配同级叶子page。这个过程一直持续到所有条目都被插入, 这可能会导致插入到root级别。分配同级页时, 释放对先前固定的叶子页的引用, 新分配的页面成为最右侧的叶子页,作为新的默认插入位置。

Reserving B-tree Page Space for Future Index Growth

To set aside space for future index growth, you can use the innodb_fill_factor configuration option to reserve a percentage of B-tree page space. For example, setting innodb_fill_factor to 80 reserves 20 percent of the space in B-tree pages during a sorted index build. This setting applies to both B-tree leaf and non-leaf pages. It does not apply to external pages used for TEXT or BLOB entries. The amount of space that is reserved may not be exactly as configured, as the innodb_fill_factor value is interpreted as a hint rather than a hard limit.

为未来的索引增长保留 B 树页面空间

要为未来的索引增长留出空间, 可以使用 innodb_fill_factor(填充因子) 配置选项来保留一定百分比的B 树页面空间。例如, 将 innodb_fill_factor 设置为 80 则会在排序索引构建时,在 B-tree 页面中保留 20% 的空间。此设置适用于 B 树的叶子页和非叶子页。但不适用于 TEXT 或 BLOB 等外部页面。保留的空间量可能与配置的不完全相同, 因为 innodb_fill_factor 是一种建议值而不是硬限制。

Sorted Index Builds and Full-Text Index Support

Sorted index builds are supported for fulltext indexes. Previously, SQL was used to insert entries into a fulltext index.

排序索引构建和全文索引支持

fulltext indexes 支持排序索引构建。以前的版本中, 是通过SQL方式来将条目插入到全文索引中。

Sorted Index Builds and Compressed Tables

For compressed tables, the previous index creation method appended entries to both compressed and uncompressed pages. When the modification log (representing free space on the compressed page) became full, the compressed page would be recompressed. If compression failed due to a lack of space, the page would be split. With sorted index builds, entries are only appended to uncompressed pages. When an uncompressed page becomes full, it is compressed. Adaptive padding is used to ensure that compression succeeds in most cases, but if compression fails, the page is split and compression is attempted again. This process continues until compression is successful. For more information about compression of B-Tree pages, see Section 14.9.1.5, How Compression Works for InnoDB Tables.

排序索引构建和压缩表

对于 compressed tables(压缩表), 之前版本的索引创建方法将索引条目附加到压缩和非压缩页面。当修改日志(代表压缩页面上的可用空间)变满时, 将重新压缩compressed页面。如果由于空间不足导致压缩失败, 页面将被拆分。使用排序索引构建时, 条目仅附加到非压缩的页面。当非压缩的页面变满时, 才会被压缩。自适应空白用于确保在大多数情况下都能压缩成功, 但如果压缩失败, 页面将被拆分并再次尝试压缩。这个过程一直持续到压缩成功。有关 B-Tree 页面压缩的更多信息, 请参阅 Section 14.9.1.5, How Compression Works for InnoDB Tables.

Sorted Index Builds and Redo Logging

Redo logging is disabled during a sorted index build. Instead, there is a checkpoint to ensure that the index build can withstand an unexpected exit or failure. The checkpoint forces a write of all dirty pages to disk. During a sorted index build, the page cleaner thread is signaled periodically to flush dirty pages to ensure that the checkpoint operation can be processed quickly. Normally, the page cleaner thread flushes dirty pages when the number of clean pages falls below a set threshold. For sorted index builds, dirty pages are flushed promptly to reduce checkpoint overhead and to parallelize I/O and CPU activity.

排序索引构建和重做日志记录

Redo logging(重做日志) 在排序索引构建期间被禁用。同时使用一个 checkpoint(检查点) 来确保索引构建可以承受意外退出或失败。检查点强制将所有脏页写入磁盘。在排序索引构建期间, page cleaner(页面清理器) 线程会定期收到信号并刷新脏页以确保可以快速处理检查点操作。通常, 当干净页面(clean page)的数量低于设置的阈值时, 页面清理器线程会刷新脏页。对于排序索引构建, 脏页会立即刷新以减少检查点开销, 以及并行化 I/O 和 CPU 活动。

Sorted Index Builds and Optimizer Statistics

Sorted index builds may result in optimizer statistics that differ from those generated by the previous method of index creation. The difference in statistics, which is not expected to affect workload performance, is due to the different algorithm used to populate the index.

排序索引构建和优化器统计信息

排序的索引构建可能会导致 optimizer 统计信息与以前的索引创建方法生成的统计信息不同。是由于用于填充索引的算法不同, 统计数据差异,预计不会影响工作负载性能。

14.6.2.4 InnoDB FULLTEXT Indexes

14.6.2.4 InnoDB的全文索引索引(FULLTEXT Indexes)

全文索引部分暂时不翻译; 推荐使用ES;

FULLTEXT indexes are created on text-based columns (CHAR, VARCHAR, or TEXT columns) to help speed up queries and DML operations on data contained within those columns, omitting any words that are defined as stopwords.

A FULLTEXT index is defined as part of a CREATE TABLE statement or added to an existing table using ALTER TABLE or CREATE INDEX.

Full-text search is performed using MATCH() ... AGAINST syntax. For usage information, see Section 12.10, Full-Text Search Functions.

InnoDB FULLTEXT indexes are described under the following topics in this section:

InnoDB Full-Text Index Design
InnoDB Full-Text Index Tables
InnoDB Full-Text Index Cache
InnoDB Full-Text Index Document ID and FTS_DOC_ID Column
InnoDB Full-Text Index Deletion Handling
InnoDB Full-Text Index Transaction Handling
Monitoring InnoDB Full-Text Indexes

InnoDB Full-Text Index Design

InnoDB FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.

InnoDB Full-Text Index Tables

When creating an InnoDB FULLTEXT index, a set of index tables is created, as shown in the following example:

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200),
       FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;

mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
       WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
|      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
|      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
|      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
|      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
|      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
|      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
|      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
|      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
|      328 | test/FTS_0000000000000147_DELETED                  |   284 |
|      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+

The first six tables represent the inverted index and are referred to as auxiliary index tables. When incoming documents are tokenized, the individual words (also referred to as tokens) are inserted into the index tables along with position information and the associated Document ID (DOC_ID). The words are fully sorted and partitioned among the six index tables based on the character set sort weight of the word's first character.

The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree option. Consider increasing the number of threads when creating FULLTEXT indexes on large tables.

Auxiliary index table names are prefixed with FTS_ and postfixed with INDEX_*. Each index table is associated with the indexed table by a hex value in the index table name that matches the table_id of the indexed table. For example, the table_id of the test/opening_lines table is 327, for which the hex value is 0x147. As shown in the preceding example, the 147 hex value appears in the names of index tables that are associated with the test/opening_lines table.

A hex value representing the index_id of the FULLTEXT index also appears in auxiliary index table names. For example, in the auxiliary table name test/FTS_0000000000000147_00000000000001c9_INDEX_1, the hex value 1c9 has a decimal value of 457. The index defined on the opening_lines table (idx) can be identified by querying the INFORMATION_SCHEMA.INNODB_SYS_INDEXES table for this value (457).

mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_SYS_INDEXES
       WHERE index_id=457;
+----------+------+----------+-------+
| index_id | name | table_id | space |
+----------+------+----------+-------+
|      457 | idx  |      327 |   283 |
+----------+------+----------+-------+

Index tables are stored in their own tablespace if the primary table is created in a file-per-table tablespace.

The other index tables shown in the preceding example are referred to as common index tables and are used for deletion handling and storing the internal state of FULLTEXT indexes. Unlike the inverted index tables, which are created for each full-text index, this set of tables is common to all full-text indexes created on a particular table.

Common auxiliary tables are retained even if full-text indexes are dropped. When a full-text index is dropped, the FTS_DOC_ID column that was created for the index is retained, as removing the FTS_DOC_ID column would require rebuilding the table. Common axillary tables are required to manage the FTS_DOC_ID column.

FTS_*_DELETED and FTS_*_DELETED_CACHE

Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The FTS_*_DELETED_CACHE is the in-memory version of the FTS_*_DELETED table.
FTS_*_BEING_DELETED and FTS_*_BEING_DELETED_CACHE

Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The FTS_*_BEING_DELETED_CACHE table is the in-memory version of the FTS_*_BEING_DELETED table.
FTS_*_CONFIG

Stores information about the internal state of the FULLTEXT index. Most importantly, it stores the FTS_SYNCED_DOC_ID, which identifies documents that have been parsed and flushed to disk. In case of crash recovery, FTS_SYNCED_DOC_ID values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to the FULLTEXT index cache. To view the data in this table, query the INFORMATION_SCHEMA.INNODB_FT_CONFIG table.

InnoDB Full-Text Index Cache

When a document is inserted, it is tokenized, and the individual words and associated data are inserted into the FULLTEXT index. This process, even for small documents, could result in numerous small insertions into the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, InnoDB uses a FULLTEXT index cache to temporarily cache index table insertions for recently inserted rows. This in-memory cache structure holds insertions until the cache is full and then batch flushes them to disk (to the auxiliary index tables). You can query the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table to view tokenized data for recently inserted rows.

The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.

The innodb_ft_cache_size variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the innodb_ft_total_cache_size option.

The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.

InnoDB Full-Text Index Document ID and FTS_DOC_ID Column

InnoDB uses a unique document identifier referred to as a Document ID (DOC_ID) to map words in the full-text index to document records where the word appears. The mapping requires an FTS_DOC_ID column on the indexed table. If an FTS_DOC_ID column is not defined, InnoDB automatically adds a hidden FTS_DOC_ID column when the full-text index is created. The following example demonstrates this behavior.

The following table definition does not include an FTS_DOC_ID column:

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200)
       ) ENGINE=InnoDB;

When you create a full-text index on the table using CREATE FULLTEXT INDEX syntax, a warning is returned which reports that InnoDB is rebuilding the table to add the FTS_DOC_ID column.

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 1

mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+

The same warning is returned when using ALTER TABLE to add a full-text index to a table that does not have an FTS_DOC_ID column. If you create a full-text index at CREATE TABLE time and do not specify an FTS_DOC_ID column, InnoDB adds a hidden FTS_DOC_ID column, without warning.

Defining an FTS_DOC_ID column at CREATE TABLE time is less expensive than creating a full-text index on a table that is already loaded with data. If an FTS_DOC_ID column is defined on a table prior to loading data, the table and its indexes do not have to be rebuilt to add the new column. If you are not concerned with CREATE FULLTEXT INDEX performance, leave out the FTS_DOC_ID column to have InnoDB create it for you. InnoDB creates a hidden FTS_DOC_ID column along with a unique index (FTS_DOC_ID_INDEX) on the FTS_DOC_ID column. If you want to create your own FTS_DOC_ID column, the column must be defined as BIGINT UNSIGNED NOT NULL and named FTS_DOC_ID (all uppercase), as in the following example:

Note

The FTS_DOC_ID column does not need to be defined as an AUTO_INCREMENT column, but AUTO_INCREMENT could make loading data easier.

mysql> CREATE TABLE opening_lines (
       FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200)
       ) ENGINE=InnoDB;

If you choose to define the FTS_DOC_ID column yourself, you are responsible for managing the column to avoid empty or duplicate values. FTS_DOC_ID values cannot be reused, which means FTS_DOC_ID values must be ever increasing.

Optionally, you can create the required unique FTS_DOC_ID_INDEX (all uppercase) on the FTS_DOC_ID column.

mysql> CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);

If you do not create the FTS_DOC_ID_INDEX, InnoDB creates it automatically.

Before MySQL 5.7.13, the permitted gap between the largest used FTS_DOC_ID value and new FTS_DOC_ID value is 10000. In MySQL 5.7.13 and later, the permitted gap is 65535.

To avoid rebuilding the table, the FTS_DOC_ID column is retained when dropping a full-text index.

InnoDB Full-Text Index Deletion Handling

Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the Document ID (DOC_ID) of a deleted document is logged in a special FTS_*_DELETED table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the FTS_*_DELETED table is used to filter out deleted Document IDs. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index. For more information, see Optimizing InnoDB Full-Text Indexes.

InnoDB Full-Text Index Transaction Handling

InnoDB FULLTEXT indexes have special transaction handling characteristics due its caching and batch processing behavior. Specifically, updates and insertions on a FULLTEXT index are processed at transaction commit time, which means that a FULLTEXT search can only see committed data. The following example demonstrates this behavior. The FULLTEXT search only returns a result after the inserted lines are committed.

mysql> CREATE TABLE opening_lines (
       id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
       opening_line TEXT(500),
       author VARCHAR(200),
       title VARCHAR(200),
       FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;

mysql> BEGIN;

mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
       ('Call me Ishmael.','Herman Melville','Moby-Dick'),
       ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
       ('I am an invisible man.','Ralph Ellison','Invisible Man'),
       ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
       ('It was love at first sight.','Joseph Heller','Catch-22'),
       ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
       ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
       ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');

mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+

mysql> COMMIT;

mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+

Monitoring InnoDB Full-Text Indexes

You can monitor and examine the special text-processing aspects of InnoDB FULLTEXT indexes by querying the following INFORMATION_SCHEMA tables:

INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED

You can also view basic information for FULLTEXT indexes and tables by querying INNODB_SYS_INDEXES and INNODB_SYS_TABLES.

For more information, see Section 14.16.4, InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables.

14.6.3 Tablespaces

14.6.3 表空间(Tablespaces)

14.6.3.1 The System Tablespace
14.6.3.2 File-Per-Table Tablespaces
14.6.3.3 General Tablespaces
14.6.3.4 Undo Tablespaces
14.6.3.5 The Temporary Tablespace

This section covers topics related to InnoDB tablespaces.

本节介绍 InnoDB 表空间的相关内容。

14.6.3.1 The System Tablespace

The system tablespace is the storage area for the InnoDB data dictionary, the doublewrite buffer, the change buffer, and undo logs. It may also contain table and index data if tables are created in the system tablespace rather than file-per-table or general tablespaces.

The system tablespace can have one or more data files. By default, a single system tablespace data file, named ibdata1, is created in the data directory. The size and number of system tablespace data files is defined by the innodb_data_file_path startup option. For configuration information, see System Tablespace Data File Configuration.

Additional information about the system tablespace is provided under the following topics in the section:

Resizing the System Tablespace
Using Raw Disk Partitions for the System Tablespace

14.6.3.1 系统表空间(System Tablespace)

系统表空间是InnoDB数据字典、双写缓冲区、更改缓冲区和撤消日志的保存区域。如果在系统表空间中建了表, 它还可能包含表和索引数据。

一般我们都是在 file-per-table 或通用表空间(general tablespace)中创建表,

系统表空间可以有一个或多个数据文件。默认情况下, 会在 data 目录中创建一个名为 ibdata1 的系统表空间数据文件。系统表空间数据文件的大小和数量由 innodb_data_file_path 启动选项指定。详细配置信息请查看 System Tablespace Data File Configuration.

Resizing the System Tablespace

This section describes how to increase or decrease the size of the system tablespace.

调整系统表空间的大小

本节介绍如何增加或减少系统表空间的大小。

Increasing the Size of the System Tablespace

The easiest way to increase the size of the system tablespace is to configure it to be auto-extending. To do so, specify the autoextend attribute for the last data file in the innodb_data_file_path setting, and restart the server. For example:

增加系统表空间的大小

增加系统表空间大小, 最简单的方法是配置为自动扩展。对 innodb_data_file_path 配置项的最后一个数据文件指定 autoextend 属性, 然后重启服务器。示例:

innodb_data_file_path=ibdata1:10M:autoextend

When the autoextend attribute is specified, the data file automatically increases in size by 8MB increments as space is required. The innodb_autoextend_increment variable controls the increment size.

You can also increase system tablespace size by adding another data file. To do so:

Stop the MySQL server.
If the last data file in the innodb_data_file_path setting is defined with the autoextend attribute, remove it, and modify the size attribute to reflect the current data file size. To determine the appropriate data file size to specify, check your file system for the file size, and round that value down to the closest MB value, where a MB is equal to 1024 x 1024.
Append a new data file to the innodb_data_file_path setting, optionally specifying the autoextend attribute. The autoextend attribute can be specified only for the last data file in the innodb_data_file_path setting.
Start the MySQL server.

当指定 autoextend 属性时, 数据文件的大小会随着空间的需要, 每次自动增加 8MB。可以通过 innodb_autoextend_increment 变量来控制每次递增量的大小。

我们也可以通过添加额外的数据文件来增加系统表空间大小。操作步骤为:

停止 MySQL 服务器。 2.如果 innodb_data_file_path 设置中的最后一个数据文件配置了 autoextend 属性, 则需要删除该属性, 并修改 size 属性以反映当前数据文件的大小。要确定指定多大才合适, 请检查文件系统的文件大小, 然后将该值向下舍入到最接近的 MB 值, 其中 MB 单位取值为 1024 x 1024。
将新的数据文件加到 innodb_data_file_path 设置中, 可选指定 autoextend 属性. 需要注意的是 autoextend 属性只能指定到 innodb_data_file_path 配置中的最后一个数据文件上。
启动 MySQL 服务器。

For example, this tablespace has one auto-extending data file:

例如, 表空间中有一个自动扩展的数据文件:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:10M:autoextend

Suppose that the data file has grown to 988MB over time. This is the innodb_data_file_path setting after modifying the size attribute to reflect the current data file size, and after specifying a new 50MB auto-extending data file:

假设数据文件随时间增长到了 988MB。那么手动增加数据文件时, 可以修改 innodb_data_file_path 配置中对应文件的 size 属性以反映当前数据文件的大小, 并指定新的数据文件大小为 50MB,支持自动扩展:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend

When adding a new data file, do not specify an existing file name. InnoDB creates and initializes the new data file when you start the server.

添加新数据文件时, 不要指定已存在的文件名。 InnoDB 会在启动服务器时创建并初始化新的数据文件。

Note

You cannot increase the size of an existing system tablespace data file by changing its size attribute. For example, changing the innodb_data_file_path setting from ibdata1:10M:autoextend to ibdata1:12M:autoextend produces the following error when starting the server:

说明:

不能通过更改 size 属性配置来增加现有系统表空间的数据文件大小。假如将 innodb_data_file_path 配置, 从 ibdata1:10M:autoextend 更改为 ibdata1:12M:autoextend, 则启动服务器时会产生以下错误:

[ERROR] [MY-012263] [InnoDB] The Auto-extending innodb_system
data file './ibdata1' is of a different size 640 pages (rounded down to MB) than
specified in the .cnf file: initial 768 pages, max 0 (relevant if non-zero) pages!

The error indicates that the existing data file size (expressed in InnoDB pages) is different from the size specified in the configuration file. If you encounter this error, restore the previous innodb_data_file_path setting, and refer to the system tablespace resizing instructions.

InnoDB page size is defined by the innodb_page_size variable. The default is 16384 bytes.

该错误表明, 现有数据文件大小(以InnoDB页面数表示)与配置文件中指定的大小不同。如果遇到此错误,请恢复之前的 innodb_data_file_path 设置,并参考系统表空间 resizing 指示。

InnoDB 页面大小由 innodb_page_size 变量指定。默认值为 16384 字节。

Decreasing the Size of the InnoDB System Tablespace

You cannot remove a data file from the system tablespace. To decrease the system tablespace size, use this procedure:

Use mysqldump to dump all of your InnoDB tables, including InnoDB tables located in the mysql schema. Identify InnoDB tables in the mysql schema using the following query:

mysql> SELECT TABLE_NAME from INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='mysql' and ENGINE='InnoDB';
+---------------------------+
| TABLE_NAME                |
+---------------------------+
| engine_cost               |
| gtid_executed             |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| innodb_index_stats        |
| innodb_table_stats        |
| plugin                    |
| server_cost               |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
+---------------------------+

Stop the server.
Remove all of the existing tablespace files (*.ibd), including the ibdata and ib_log files. Do not forget to remove *.ibd files for tables located in the mysql schema.
Remove any .frm files for InnoDB tables.
Configure the data files for the new system tablespace. See System Tablespace Data File Configuration.
Restart the server.
Import the dump files.

减小 InnoDB 系统表空间的大小

我们不能直接从系统表空间中删除数据文件。想要减小系统表空间的大小,请参考以下步骤:

使用 mysqldump 来备份(转储)所有的 InnoDB 表,包括 mysql 库中的 InnoDB 表。 mysql 库中的 InnoDB 表可以使用下面的查询语句来识别:

mysql> SELECT TABLE_NAME from INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='mysql' and ENGINE='InnoDB';
+---------------------------+
| TABLE_NAME                |
+---------------------------+
| engine_cost               |
| gtid_executed             |
| help_category             |
| help_keyword              |
| help_relation             |
| help_topic                |
| innodb_index_stats        |
| innodb_table_stats        |
| plugin                    |
| server_cost               |
| servers                   |
| slave_master_info         |
| slave_relay_log_info      |
| slave_worker_info         |
| time_zone                 |
| time_zone_leap_second     |
| time_zone_name            |
| time_zone_transition      |
| time_zone_transition_type |
+---------------------------+

停止MySQL服务器。
删除所有的表空间文件 (*.ibd), 包括 ibdata 和 ib_log 文件。以及 mysql 模式中的 *.ibd 文件。
删除InnoDB表的所有.frm文件。
为新的系统表空间配置数据文件。参见 System Tablespace Data File Configuration。
重启MySQL服务器。
导入转储文件。

Note

If your databases only use the InnoDB engine, it may be simpler to dump all databases, stop the server, remove all databases and InnoDB log files, restart the server, and import the dump files.

To avoid large system tablespaces, consider using file-per-table tablespaces for your data. File-per-table tablespaces are the default tablespace type and are used implicitly when creating an InnoDB table. Unlike the system tablespace, disk space is returned to the operating system after truncating or dropping a table created in a file-per-table tablespace. For more information, see Section 14.6.3.2, File-Per-Table Tablespaces.

提示:

如果MySQL数据库只使用了 InnoDB 引擎,则转储备份所有数据库、停止服务器、删除所有数据库和 InnoDB 日志文件、重新启动服务器并导入转储文件可能会更简单。

为避免系统表空间太大,请考虑使用 file-per-table 类型的表空间来存储数据。 File-per-table 表空间是默认的表空间类型,在创建InnoDB表时隐式使用。与系统表空间不同的是, file-per-table 表空间中创建的表, 在截断(truncate)或删除(drop)之后, 会将磁盘空间返还给操作系统。更多详细信息,请参阅 Section 14.6.3.2, File-Per-Table Tablespaces.

Using Raw Disk Partitions for the System Tablespace

You can use raw disk partitions as data files in the InnoDB system tablespace. This technique enables nonbuffered I/O on Windows and on some Linux and Unix systems without file system overhead. Perform tests with and without raw partitions to verify whether this change actually improves performance on your system.

When you use a raw disk partition, ensure that the user ID that runs the MySQL server has read and write privileges for that partition. For example, if you run the server as the mysql user, the partition must be readable and writeable by mysql. If you run the server with the --memlock option, the server must be run as root, so the partition must be readable and writeable by root.

The procedures described below involve option file modification. For additional information, see Section 4.2.2.2, Using Option Files.

使用原始磁盘分区来存储系统表空间

可以将原始磁盘分区用作 InnoDB 系统表空间中的数据文件。这种技术可以在 Windows 和某些 Linux 和 Unix 系统上实现非缓冲 I/O,从而无需文件系统开销。执行带有原始分区和不带原始分区的测试,可以验证此更改是否真正提高了系统的性能。

当使用原始磁盘分区时,请确保运行 MySQL 服务器的用户 ID 对该分区具有读写权限。例如,如果以 mysql 用户身份运行服务器,则该分区必须是用户 mysql 可读写的。如果使用 --memlock 选项启动MySQL服务器,则必须使用 root 才能启动, 那么分区必须是 root 可读和可写的。

下面描述的步骤涉及配置文件变更。更多信息请参阅 Section 4.2.2.2, Using Option Files。

Allocating a Raw Disk Partition on Linux and Unix Systems

在 Linux 和 Unix 系统上分配原始磁盘分区

When you create a new data file, specify the keyword newraw immediately after the data file size for the innodb_data_file_path option. The partition must be at least as large as the size that you specify. Note that 1MB in InnoDB is 1024 × 1024 bytes, whereas 1MB in disk specifications usually means 1,000,000 bytes.
创建新数据文件时, 在 innodb_data_file_path 选项的文件大小配置后面, 紧跟关键字 newraw。磁盘分区至少要有指定的大小或者更大。请注意, InnoDB中的 1MB 是 1024 × 1024 字节, 而磁盘规格中的 1MB 通常是指 1,000,000 字节。

[mysqld]
innodb_data_home_dir=
innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw

Restart the server. InnoDB notices the newraw keyword and initializes the new partition. However, do not create or change any InnoDB tables yet. Otherwise, when you next restart the server, InnoDB reinitializes the partition and your changes are lost. (As a safety measure InnoDB prevents users from modifying data when any partition with newraw is specified.)
重启MySQL服务器。 InnoDB 看到 newraw 关键字时会并初始化新分区。但是, 暂时不要创建或更改任何 InnoDB 表。否则, 下次重启服务器时, InnoDB 又会重新初始化分区, 导致数据丢失。 (作为一种安全措施, 当指定任何带有 newraw 的分区时, InnoDB 会阻止用户修改数据。)
After InnoDB has initialized the new partition, stop the server, change newraw in the data file specification to raw:
InnoDB初始化新分区完成后, 停止服务器, 将 data file 配置中的 newraw 更改为 raw:

[mysqld]
innodb_data_home_dir=
innodb_data_file_path=/dev/hdd1:3Graw;/dev/hdd2:2Graw

Restart the server. InnoDB now permits changes to be made.
重启MySQL服务器。现在 InnoDB 允许进行数据更改。

Allocating a Raw Disk Partition on Windows

On Windows systems, the same steps and accompanying guidelines described for Linux and Unix systems apply except that the innodb_data_file_path setting differs slightly on Windows.

在 Windows 上分配原始磁盘分区

在 Windows 系统上的操作, 和 Linux 和 Unix 系统描述的步骤基本一致, 除了 innodb_data_file_path 设置在 Windows 上略有不同。

When you create a new data file, specify the keyword newraw immediately after the data file size for the innodb_data_file_path option:
创建新数据文件时, 在 innodb_data_file_path 选项的文件大小配置后面, 紧跟关键字 newraw:

[mysqld]
innodb_data_home_dir=
innodb_data_file_path=//./D::10Gnewraw

The //./ corresponds to the Windows syntax of \\.\ for accessing physical drives. In the example above, D: is the drive letter of the partition.

其中 //./ 对应于物理驱动器的 Windows 格式 \\.\。在上面的示例中, D: 是分区的驱动器号。

Restart the server. InnoDB notices the newraw keyword and initializes the new partition.
重启MySQL服务器。 InnoDB 看到 newraw 关键字会初始化新分区。
After InnoDB has initialized the new partition, stop the server, change newraw in the data file specification to raw:
在InnoDB初始化新分区完成后, 停止服务器, 将 data file 配置中的 newraw 改为 raw:

[mysqld]
innodb_data_home_dir=
innodb_data_file_path=//./D::10Graw

Restart the server. InnoDB now permits changes to be made.
重启MySQL服务器。现在 InnoDB 允许进行数据更改。

14.6.3.2 File-Per-Table Tablespaces

A file-per-table tablespace contains data and indexes for a single InnoDB table, and is stored on the file system in its own data file.

File-per-table tablespace characteristics are described under the following topics in this section:

14.6.3.2 File-Per-Table 表空间

每个 file-per-table 表空间中包含一个 InnoDB 表的数据和索引, 并存储在文件系统的数据文件中。

File-per-table 表空间特性分为以下主题进行描述:

File-Per-Table Tablespace Configuration

InnoDB creates tables in file-per-table tablespaces by default. This behavior is controlled by the innodb_file_per_table variable. Disabling innodb_file_per_table causes InnoDB to create tables in the system tablespace.

An innodb_file_per_table setting can be specified in an option file or configured at runtime using a SET GLOBAL statement. Changing the setting at runtime requires privileges sufficient to set global system variables. See Section 5.1.8.1, System Variable Privileges.

Option file:

File-Per-Table 表空间配置

默认情况下 InnoDB 在 file-per-table 表空间中创建表。这个名字的意思就是每个表对应一个存储文件。可通过 innodb_file_per_table 变量控制此行为。禁用 innodb_file_per_table 则会导致 InnoDB 在系统表空间中创建表。

innodb_file_per_table 变量可以在配置文件中指定, 也可以在运行时通过 SET GLOBAL 语句来修改。当然, 在运行时更改设置则需要客户端用户具有足够的权限来设置全局系统变量。请参阅 Section 5.1.8.1, System Variable Privileges。

配置文件:

[mysqld]
innodb_file_per_table=ON

Using SET GLOBAL at runtime:

在运行时执行 SET GLOBAL 语句:

mysql> SET GLOBAL innodb_file_per_table=ON;

innodb_file_per_table is enabled by default in MySQL 5.6 and higher. You might consider disabling it if backward compatibility with earlier versions of MySQL is a concern.

在 MySQL 5.6 及更高版本中默认启用innodb_file_per_table 。如果和早期版本的 MySQL 不一致, 为了向后兼容性, 则可以手工禁用它。

Warning

Disabling innodb_file_per_table prevents table-copying ALTER TABLE operations from implicitly moving a table that resides in the system tablespace to a file-per-table tablespace. A table-copying ALTER TABLE operation recreates the table using the current innodb_file_per_table setting. This behavior does not apply when adding or dropping secondary indexes, nor does it apply to ALTER TABLE operations that use the INPLACE algorithm, or to tables added to the system tablespace using CREATE TABLE ... TABLESPACE or ALTER TABLE ... TABLESPACE syntax.

警告

禁用 innodb_file_per_table 可防止 ALTER TABLE 操作隐式地将表从系统表空间复制到 file-per-table 表空间。复制表的 ALTER TABLE 操作使用当前的 innodb_file_per_table 设置重新创建表。此行为不适用于添加或删除二级索引, 也不适用于使用 INPLACE 算法的 ALTER TABLE 操作, 也不适用于使用 CREATE TABLE ... TABLESPACE 或者 ALTER TABLE ... TABLESPACE 语法添加到系统表空间中的表。

File-Per-Table Tablespace Data Files

A file-per-table tablespace is created in an .idb data file in a schema directory under the MySQL data directory. The .ibd file is named for the table (*table_name*.ibd). For example, the data file for table test.t1 is created in the test directory under the MySQL data directory:

File-Per-Table 表空间的数据文件

在 MySQL 数据目录下, 对应 schema 目录中的每个 .idb 数据文件中创建一个 File-Per-Table 表空间。 .ibd 文件以表名称来命名(<table_name>.ibd)。例如, test.t1 表的数据文件是在 MySQL 数据目录下的 test 目录中创建的:

mysql> USE test;

mysql> CREATE TABLE t1 (
   id INT PRIMARY KEY AUTO_INCREMENT,
   name VARCHAR(100)
 ) ENGINE = InnoDB;

查看文件系统:

shell> cd /path/to/mysql/data/test
shell> ls
t1.ibd

You can use the DATA DIRECTORY clause of the CREATE TABLE statement to implicitly create a file-per-table tablespace data file outside of the data directory. For more information, see Section 14.6.1.2, Creating Tables Externally.

可以使用 CREATE TABLE 语句的 DATA DIRECTORY 子句在 data 目录之外隐式创建一个 file-per-table 表空间数据文件。更多信息, 请参阅 Section 14.6.1.2, Creating Tables Externally。

File-Per-Table Tablespace Advantages

File-per-table tablespaces have the following advantages over shared tablespaces such as the system tablespace or general tablespaces.

Disk space is returned to the operating system after truncating or dropping a table created in a file-per-table tablespace. Truncating or dropping a table stored in a shared tablespace creates free space within the shared tablespace data file, which can only be used for InnoDB data. In other words, a shared tablespace data file does not shrink in size after a table is truncated or dropped.
A table-copying ALTER TABLE operation on a table that resides in a shared tablespace can increase the amount of disk space occupied by the tablespace. Such operations may require as much additional space as the data in the table plus indexes. This space is not released back to the operating system as it is for file-per-table tablespaces.
TRUNCATE TABLE performance is better when executed on tables that reside in file-per-table tablespaces.
File-per-table tablespace data files can be created on separate storage devices for I/O optimization, space management, or backup purposes. See Section 14.6.1.2, Creating Tables Externally.
You can import a table that resides in a file-per-table tablespace from another MySQL instance. See Section 14.6.1.3, Importing InnoDB Tables.
Tables created in file-per-table tablespaces use the Barracuda file format. See Section 14.10, InnoDB File-Format Management. The Barracuda file format enables features associated with DYNAMIC and COMPRESSED row formats. See Section 14.11, InnoDB Row Formats.
Tables stored in individual tablespace data files can save time and improve chances for a successful recovery when data corruption occurs, when backups or binary logs are unavailable, or when the MySQL server instance cannot be restarted.
You can backup or restore tables created in file-per-table tablespaces quickly using MySQL Enterprise Backup, without interrupting the use of other InnoDB tables. This is beneficial for tables on varying backup schedules or that require backup less frequently. See Making a Partial Backup for details.
File-per-table tablespaces permit monitoring table size on the file system by monitoring the size of the tablespace data file.
Common Linux file systems do not permit concurrent writes to a single file such as a shared tablespace data file when innodb_flush_method is set to O_DIRECT. As a result, there are possible performance improvements when using file-per-table tablespaces in conjunction with this setting.
Tables in a shared tablespace are limited in size by the 64TB tablespace size limit. By comparison, each file-per-table tablespace has a 64TB size limit, which provides plenty of room for individual tables to grow in size.

File-Per-Table 表空间的优势

File-per-table 表空间与共享表空间(系统表空间/通用表空间)相比具有以下优势:

在截断或删除table时, file-per-table 表空间中创建的表, 会将磁盘空间返还给操作系统。二共享表空间中的表被截断或删除后, 在表空间数据文件中会有空闲空间, 这些空间只能用于InnoDB数据。换句话说, 共享表空间数据文件的大小, 在表被截断或删除后不会变化。
ALTER TABLE 操作对共享表空间中的表执行复制时会增加表空间占用的磁盘空间量。此类操作可能需要与表中的数据, 加上索引一样多的额外空间。该空间不会像 file-per-table 表空间一样返还给操作系统。
TRUNCATE TABLE 在 file-per-table 表空间上的执行时性能更好。
可以在单独的存储设备上创建 File-per-table 表空间数据文件, 用于 I/O 优化、空间管理或备份目的。请参阅第 14.6.1.2 节, 从外部创建表。
可以从另一个 MySQL 实例人工导入 file-per-table 表空间中的表。请参阅 Section 14.6.1.3, Importing InnoDB Tables.
在 file-per-table 表空间中创建的表, 使用 Barracuda(梭子鱼)文件格式。请参阅 Section 14.10, InnoDB File-Format Management。 Barracuda文件格式支持 DYNAMIC 和 COMPRESSED 行格式相关的功能。请参阅 Section 14.11, InnoDB Row Formats.。
当数据损坏、备份或二进制日志不可用,甚至 MySQL 服务器实例无法重启时, 存储在单个表空间数据文件中的表, 可以节省时间并提高恢复成功的概率。
您可以使用 MySQL Enterprise Backup 快速备份或恢复在 file-per-table 表空间中创建的表, 而不会中断其他InnoDB表的使用。这对于备份计划不同或需要较少备份的表很有用。有关详细信息, 请参阅 Making a Partial Backup for details.
File-per-table 表空间允许通过表空间数据文件的大小监控, 来确定其在文件系统上的大小。
当 innodb_flush_method 设置为 O_DIRECT时, 通用的Linux文件系统并不会执行并发写入。因此, 将 file-per-table 表空间与此设置结合使用时, 可能会提高性能。
共享表空间中的表, 在大小上受到 64TB 表空间大小的限制。相比之下, 每个 file-per-table 表空间的大小限制也是 64TB, 这为各个表的大小增长提供了充足的空间。

File-Per-Table Tablespace Disadvantages

File-per-table tablespaces have the following disadvantages compared to shared tablespaces such as the system tablespace or general tablespaces.

With file-per-table tablespaces, each table may have unused space that can only be utilized by rows of the same table, which can lead to wasted space if not properly managed.
fsync operations are performed on multiple file-per-table data files instead of a single shared tablespace data file. Because fsync operations are per file, write operations for multiple tables cannot be combined, which can result in a higher total number of fsync operations.
mysqld must keep an open file handle for each file-per-table tablespace, which may impact performance if you have numerous tables in file-per-table tablespaces.
More file descriptors are required when each table has its own data file.
There is potential for more fragmentation, which can impede DROP TABLE and table scan performance. However, if fragmentation is managed, file-per-table tablespaces can improve performance for these operations.
The buffer pool is scanned when dropping a table that resides in a file-per-table tablespace, which can take several seconds for large buffer pools. The scan is performed with a broad internal lock, which may delay other operations.
The innodb_autoextend_increment variable, which defines the increment size for extending the size of an auto-extending shared tablespace file when it becomes full, does not apply to file-per-table tablespace files, which are auto-extending regardless of the innodb_autoextend_increment setting. Initial file-per-table tablespace extensions are by small amounts, after which extensions occur in increments of 4MB.

File-Per-Table 表空间的不足

与系统表空间, 通用表空间相比, File-Per-Table 表空间具有以下劣势:

使用 file-per-table 表空间, 每个表都可能会存在未使用的空间, 只能由单个表使用, 如果管理不当, 可能会导致空间浪费。
fsync 操作会在多个 file-per-table 数据文件上执行, 而不是共享表空间的单个数据文件上执行。因为 fsync 操作是针对每个文件的, 所以没办法合并多个表的写操作, 这会导致 fsync 的总操作数增加。
mysqld 必须为每个 file-per-table 表空间保留一个打开的文件句柄, 如果有很多 file-per-table 表空间文件, 这可能会影响性能。
每个表都有自己的数据文件时, 就需要更多的文件描述符。
可能会出现更多磁盘碎片, 会影响 DROP TABLE 和全表扫描的性能。当然, 如果碎片是受管控的, file-per-table 表空间也能提升这些操作的性能。
当删除 file-per-table 中的表时, 会扫描缓冲池, 对于大型缓冲池来说可能需要几秒钟的时间。这个扫描操作使用广泛的内部锁执行扫描, 这可能会延迟其他操作。
innodb_autoextend_increment 变量, 定义了共享表空间文件每次自动扩展的大小, 如果用满就会再次扩大。但这个变量不适用于 file-per-table 表空间文件, 无论 innodb_autoextend_increment 设置多少。初始的 file-per-table 表空间文件很小, 之后每次扩展都以 4MB 为增量。

14.6.3.3 General Tablespaces

A general tablespace is a shared InnoDB tablespace that is created using CREATE TABLESPACE syntax. General tablespace capabilities and features are described under the following topics in this section:

14.6.3.3 通用表空间(General Tablespaces)

使用 CREATE TABLESPACE 语法可以创建一个共享的 InnoDB 表空间: 通用表空间(General Tablespace)。下面的小节分别介绍通用表空间的功能和特性:

General Tablespace Capabilities
Creating a General Tablespace
Adding Tables to a General Tablespace
General Tablespace Row Format Support
Moving Tables Between Tablespaces Using ALTER TABLE
Dropping a General Tablespace
General Tablespace Limitations

General Tablespace Capabilities

The general tablespace feature provides the following capabilities:

Similar to the system tablespace, general tablespaces are shared tablespaces that can store data for multiple tables.
General tablespaces have a potential memory advantage over file-per-table tablespaces. The server keeps tablespace metadata in memory for the lifetime of a tablespace. Multiple tables in fewer general tablespaces consume less memory for tablespace metadata than the same number of tables in separate file-per-table tablespaces.
General tablespace data files may be placed in a directory relative to or independent of the MySQL data directory, which provides you with many of the data file and storage management capabilities of file-per-table tablespaces. As with file-per-table tablespaces, the ability to place data files outside of the MySQL data directory allows you to manage performance of critical tables separately, setup RAID or DRBD for specific tables, or bind tables to particular disks, for example.
General tablespaces support both Antelope and Barracuda file formats, and therefore support all table row formats and associated features. With support for both file formats, general tablespaces have no dependence on innodb_file_format or innodb_file_per_table settings, nor do these variables have any effect on general tablespaces.
The TABLESPACE option can be used with CREATE TABLE to create tables in a general tablespaces, file-per-table tablespace, or in the system tablespace.
The TABLESPACE option can be used with ALTER TABLE to move tables between general tablespaces, file-per-table tablespaces, and the system tablespace. Previously, it was not possible to move a table from a file-per-table tablespace to the system tablespace. With the general tablespace feature, you can now do so.

通用表空间的功能特性

通用表空间支持下面这些功能特性:

与系统表空间一样，通用表空间也是为多个表存储数据的共享表空间。
通用表空间相比 file-per-table tablespaces 具有潜在的内存优势。服务器在整个生命周期内都将表空间的元数据信息保存在内存中。同样的Table数量, 与单独的 file-per-table 表空间相比，通用表空间占用的表空间元数据内存会更少。
通用表空间数据文件可以配置在独立目录中, 或者是 MySQL data 目录的相对路径下, 提供了 file-per-table tablespaces 的多数据文件和存储管理能力。与 file-per-table 表空间一样，将数据文件放在 MySQL 数据目录之外的能力允许您单独管理关键Table的性能，为特定表设置 RAID 或 DRBD，或者将表绑定到特定磁盘。
通用表空间支持 Antelope 和 Barracuda 文件格式，因此支持所有行格式和相关功能。由于支持这两种文件格式，通用表空间不依赖于 innodb_file_format 或 innodb_file_per_table 设置，这些变量对通用表空间也没有任何影响。
TABLESPACE 选项可以和 CREATE TABLE 一起使用，在通用表空间、file-per-table 表空间, 或者系统表空间中创建表。
TABLESPACE 选项可以和 ALTER TABLE 一起使用，在通用表空间、file-per-table 表空间, 或者系统表空间之间移动表。以前我们不能将表从 file-per-table 表空间移动到系统表空间。但通过使用通用表空间，就可以间接实现。

Creating a General Tablespace

General tablespaces are created using CREATE TABLESPACE syntax.

创建通用表空间

使用 CREATE TABLESPACE 语法创建通用表空间。

CREATE TABLESPACE tablespace_name
    ADD DATAFILE 'file_name'
    [FILE_BLOCK_SIZE = value]
        [ENGINE [=] engine_name]

A general tablespace can be created in the data directory or outside of it. To avoid conflicts with implicitly created file-per-table tablespaces, creating a general tablespace in a subdirectory under the data directory is not supported. When creating a general tablespace outside of the data directory, the directory must exist prior to creating the tablespace.

An .isl file is created in the MySQL data directory when a general tablespace is created outside of the MySQL data directory.

Examples:

Creating a general tablespace in the data directory:

可以在数据目录中创建, 也可以在外部创建通用表空间。为避免与隐式创建的 file-per-table 表空间冲突，不支持在数据目录下面的子目录中创建通用表空间。在数据目录之外创建通用表空间时，该目录必须在创建表空间之前就已经存在。

在 MySQL 数据目录之外创建通用表空间时，会在 MySQL 数据目录中创建一个 .isl 文件.

示例: 在数据目录中创建一个通用表空间:

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;

Creating a general tablespace in a directory outside of the data directory:

示例: 在数据目录之外的目录中创建通用表空间:

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE '/my/tablespace/directory/ts1.ibd' Engine=InnoDB;

You can specify a path that is relative to the data directory as long as the tablespace directory is not under the data directory. In this example, the my_tablespace directory is at the same level as the data directory:

只要表空间路径不在数据目录下，就可以指定相对于数据目录的路径。

示例: 数据目录处于同一级别的my_tablespace 目录中创建通用表空间:

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE '../my_tablespace/ts1.ibd' Engine=InnoDB;

Note

The ENGINE = InnoDB clause must be defined as part of the CREATE TABLESPACE statement, or InnoDB must be defined as the default storage engine (default_storage_engine=InnoDB).

说明:

必须为 CREATE TABLESPACE 语句指定 ENGINE = InnoDB 子句; 或者是将 InnoDB 定义为默认存储引擎(default_storage_engine=InnoDB).

Adding Tables to a General Tablespace

After creating an InnoDB general tablespace, you can use CREATE TABLE <tbl_name> ... TABLESPACE <tablespace_name> or ALTER TABLE <tbl_name> ... TABLESPACE <tablespace_name> to add tables to the tablespace, as shown in the following examples:

在通用表空间中创建Table

创建 InnoDB 通用表空间后，可以使用 CREATE TABLE <tbl_name> ... TABLESPACE <tablespace_name> or ALTER TABLE <tbl_name> ... TABLESPACE <tablespace_name> 在表空间建表，如下所示:

CREATE TABLE:

mysql> CREATE TABLE t1 (id INT PRIMARY KEY) TABLESPACE ts1;

ALTER TABLE:

mysql> ALTER TABLE t2 TABLESPACE ts1;

Note

Support for adding table partitions to shared tablespaces was deprecated in MySQL 5.7.24; expect it to be removed in a future version of MySQL. Shared tablespaces include the InnoDB system tablespace and general tablespaces.

For detailed syntax information, see CREATE TABLE and ALTER TABLE.

说明:

MySQL 5.7.24 开始, 已弃用了向共享表空间中添加表分区的支持；预期在 MySQL 的未来版本中会被删除。共享表空间包括InnoDB系统表空间和通用表空间。

详细的语法信息，请参考 CREATE TABLE 和 ALTER TABLE。

General Tablespace Row Format Support

General tablespaces support all table row formats (REDUNDANT, COMPACT, DYNAMIC, COMPRESSED) with the caveat that compressed and uncompressed tables cannot coexist in the same general tablespace due to different physical page sizes.

For a general tablespace to contain compressed tables (ROW_FORMAT=COMPRESSED), FILE_BLOCK_SIZE must be specified, and the FILE_BLOCK_SIZE value must be a valid compressed page size in relation to the innodb_page_size value. Also, the physical page size of the compressed table (KEY_BLOCK_SIZE) must be equal to FILE_BLOCK_SIZE/1024. For example, if innodb_page_size=16KB and FILE_BLOCK_SIZE=8K, the KEY_BLOCK_SIZE of the table must be 8.

The following table shows permitted innodb_page_size, FILE_BLOCK_SIZE, and KEY_BLOCK_SIZE combinations. FILE_BLOCK_SIZE values may also be specified in bytes. To determine a valid KEY_BLOCK_SIZE value for a given FILE_BLOCK_SIZE, divide the FILE_BLOCK_SIZE value by 1024. Table compression is not support for 32K and 64K InnoDB page sizes. For more information about KEY_BLOCK_SIZE, see CREATE TABLE, and Section 14.9.1.2, Creating Compressed Tables.

通用表空间支持的行格式

通用表空间支持所有行格式(REDUNDANT, COMPACT, DYNAMIC, COMPRESSED), 但需要注意的是，由于物理页面大小不同，在同一个通用表空间中压缩表和非压缩表不能共存。

对于包含压缩表(ROW_FORMAT=COMPRESSED)的通用表空间，必须指定FILE_BLOCK_SIZE，同时 FILE_BLOCK_SIZE 的值必须是有效压缩页面大小, 具体的值与 innodb_page_size 有关。此外，压缩表的物理页大小(KEY_BLOCK_SIZE) 必须等于 FILE_BLOCK_SIZE/1024。假如 innodb_page_size=16KB, FILE_BLOCK_SIZE=8K, 那么表的 KEY_BLOCK_SIZE 必须是 8。

下表展示了允许的 innodb_page_size, FILE_BLOCK_SIZE, 与 KEY_BLOCK_SIZE 组合。 FILE_BLOCK_SIZE 值也可以以字节为单位指定。要确定给定 FILE_BLOCK_SIZE 的有效 KEY_BLOCK_SIZE 值，请将 FILE_BLOCK_SIZE 值除以 1024。压缩表不支持 32K 和 64K 的 InnoDB 页面大小。有关 KEY_BLOCK_SIZE 的更多信息，请参阅 CREATE TABLE 以及 Section 14.9.1.2, Creating Compressed Tables。

Table 14.3 Permitted Page Size, FILE_BLOCK_SIZE, and KEY_BLOCK_SIZE Combinations for Compressed Tables

表14.3 : 压缩表, 允许的页面大小、FILE_BLOCK_SIZE 和 KEY_BLOCK_SIZE 组合关系

InnoDB页面大小(innodb_page_size)	允许的 FILE_BLOCK_SIZE 值	允许的 KEY_BLOCK_SIZE 值
64KB	64K (65536)	不支持压缩表
32KB	32K (32768)	不支持压缩表
16KB	16K (16384)	N/A: 如果 `innodb_page_size` 等于 `FILE_BLOCK_SIZE`, 则该表空间不支持压缩表.
16KB	8K (8192)	8
16KB	4K (4096)	4
16KB	2K (2048)	2
16KB	1K (1024)	1
8KB	8K (8192)	N/A: 如果 `innodb_page_size` 等于 `FILE_BLOCK_SIZE`, 则该表空间不支持压缩表.
8KB	4K (4096)	4
8KB	2K (2048)	2
8KB	1K (1024)	1
4KB	4K (4096)	N/A: 如果 `innodb_page_size` 等于 `FILE_BLOCK_SIZE`, 则该表空间不支持压缩表.
4K	2K (2048)	2
4KB	1K (1024)	1

This example demonstrates creating a general tablespace and adding a compressed table. The example assumes a default innodb_page_size of 16KB. The FILE_BLOCK_SIZE of 8192 requires that the compressed table have a KEY_BLOCK_SIZE of 8.

下面通过示例来演示创建通用表空间, 添加压缩表。假定 innodb_page_size 默认为 16KB. FILE_BLOCK_SIZE值为8192, 要求压缩表的 KEY_BLOCK_SIZE 等于 8.

mysql> CREATE TABLESPACE `ts2` ADD DATAFILE 'ts2.ibd' FILE_BLOCK_SIZE = 8192 Engine=InnoDB;

mysql> CREATE TABLE t4 (id INT PRIMARY KEY) TABLESPACE ts2 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

If you do not specify FILE_BLOCK_SIZE when creating a general tablespace, FILE_BLOCK_SIZE defaults to innodb_page_size. When FILE_BLOCK_SIZE is equal to innodb_page_size, the tablespace may only contain tables with an uncompressed row format (COMPACT, REDUNDANT, and DYNAMIC row formats).

如果通用表空间在创建时不指定 FILE_BLOCK_SIZE, 则FILE_BLOCK_SIZE 默认等于 innodb_page_size, 那么这个表空间就只可以包含非压缩行格式的表(COMPACT, REDUNDANT, and DYNAMIC 行格式).

Moving Tables Between Tablespaces Using ALTER TABLE

You can use ALTER TABLE with the TABLESPACE option to move a table to an existing general tablespace, to a new file-per-table tablespace, or to the system tablespace.

使用 ALTER TABLE 在表空间之间移动表

可以使用带有 TABLESPACE 选项的 ALTER TABLE 语句, 将表移动到:

已有的通用表空间
新的 file-per-table 表空间
系统表空间。

Note

Support for placing table partitions in shared tablespaces was deprecated in MySQL 5.7.24; expect it to be removed in a future version of MySQL. Shared tablespaces include the InnoDB system tablespace and general tablespaces.

To move a table from a file-per-table tablespace or from the system tablespace to a general tablespace, specify the name of the general tablespace. The general tablespace must exist. See CREATE TABLESPACE for more information.

说明:

MySQL 5.7.24 开始, 已弃用了向共享表空间中添加表分区的支持；预期在 MySQL 的未来版本中会被删除。共享表空间包括InnoDB系统表空间和通用表空间。

ALTER TABLE tbl_name TABLESPACE [=] tablespace_name;

To move a table from a general tablespace or file-per-table tablespace to the system tablespace, specify innodb_system as the tablespace name.

要将一个表移动到系统表空间，指定表空间为 innodb_system 即可。

ALTER TABLE tbl_name TABLESPACE [=] innodb_system;

To move a table from the system tablespace or a general tablespace to a file-per-table tablespace, specify innodb_file_per_table as the tablespace name.

要将一个表移动到 file-per-table 空间，指定表空间为 innodb_file_per_table 即可。

ALTER TABLE tbl_name TABLESPACE [=] innodb_file_per_table;

ALTER TABLE ... TABLESPACE operations always cause a full table rebuild, even if the TABLESPACE attribute has not changed from its previous value.

ALTER TABLE ... TABLESPACE syntax does not support moving a table from a temporary tablespace to a persistent tablespace.

The DATA DIRECTORY clause is permitted with CREATE TABLE ... TABLESPACE=innodb_file_per_table but is otherwise not supported for use in combination with the TABLESPACE option.

Restrictions apply when moving tables from encrypted tablespaces. See Encryption Limitations.

ALTER TABLE ... TABLESPACE 操作总是会导致完整的表重建, 即使 TABLESPACE 属性和之前相比没有变化。

ALTER TABLE ... TABLESPACE 语法不支持将一个表从临时表空间, 移动到持久表空间。

DATA DIRECTORY 子句允许和 CREATE TABLE ... TABLESPACE=innodb_file_per_table 一起使用，但不支持与其他值的 TABLESPACE 选项结合使用。

从加密表空间移动表时有限制。请参阅: Encryption Limitations。

Dropping a General Tablespace

The DROP TABLESPACE statement is used to drop an InnoDB general tablespace.

All tables must be dropped from the tablespace prior to a DROP TABLESPACE operation. If the tablespace is not empty, DROP TABLESPACE returns an error.

删除某个通用表空间

DROP TABLESPACE 语句可以用来删除 InnoDB 通用表空间。

在删除某个 InnoDB 通用表空间操作之前, 必须先删除掉里面的所有表。如果表空间不为空，则 DROP TABLESPACE 会报错。

Use a query similar to the following to identify tables in a general tablespace.

使用类似下面的SQL语句来查询通用表空间中的表。

mysql> SELECT a.NAME AS space_name, b.NAME AS table_name FROM INFORMATION_SCHEMA.INNODB_TABLESPACES a,
       INFORMATION_SCHEMA.INNODB_TABLES b WHERE a.SPACE=b.SPACE AND a.NAME LIKE 'ts1';
+------------+------------+
| space_name | table_name |
+------------+------------+
| ts1        | test/t1    |
| ts1        | test/t2    |
| ts1        | test/t3    |
+------------+------------+

If a DROP TABLESPACE operation on an empty general tablespace returns an error, the tablespace may contain an orphan temporary or intermediate table that was left by an ALTER TABLE operation that was interrupted by a server exit. For more information, see Section 14.22.3, Troubleshooting InnoDB Data Dictionary Operations.

A general InnoDB tablespace is not deleted automatically when the last table in the tablespace is dropped. The tablespace must be dropped explicitly using `DROP TABLESPACE <tablespace_name>.

A general tablespace does not belong to any particular database. A DROP DATABASE operation can drop tables that belong to a general tablespace but it cannot drop the tablespace, even if the DROP DATABASE operation drops all tables that belong to the tablespace. A general tablespace must be dropped explicitly using DROP TABLESPACE *tablespace_name*.

如果对某个空的通用表空间执行 DROP TABLESPACE 操作却返回错误，那么原因可能是, 由于服务器退出而导致了 ALTER TABLE 操作中断, 残留下来某些孤立的临时表或中间表，还存在于表空间中。更多信息，请参阅 Section 14.22.3, Troubleshooting InnoDB Data Dictionary Operations.

删除表空间中的最后一个表时，并不会连带删除通用表空间。必须使用 `DROP TABLESPACE <tablespace_name> 明确删除表空间。

通用表空间不属于任何特定数据库。 DROP DATABASE 操作可以删除属于通用表空间的表，但不能删除表空间，即使如果 DROP DATABASE 操作删除了属于该表空间的所有表。必须使用 `DROP TABLESPACE <tablespace_name> 明确删除表空间。

Similar to the system tablespace, truncating or dropping tables stored in a general tablespace creates free space internally in the general tablespace .ibd data file which can only be used for new InnoDB data. Space is not released back to the operating system as it is when a file-per-table tablespace is deleted during a DROP TABLE operation.

This example demonstrates how to drop an InnoDB general tablespace. The general tablespace ts1 is created with a single table. The table must be dropped before dropping the tablespace.

与系统表空间类似，截断或删除存储在通用表空间中的表, 会在通用表空间 .ibd data file 中产生空闲区域, 而个文件又只能用于新的 InnoDB 数据。在 DROP TABLE 操作期间释放的空间, 不会将对应的磁盘空间返还给操作系统, 和 file-per-table 表空间不一样。

下面通过示例来演示如何删除InnoDB通用表空间。通用表空间 ts1 中包含了单个表。必须在删除表空间之前先删除该表。

mysql> CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;

mysql> CREATE TABLE t1 (id INT PRIMARY KEY) TABLESPACE ts1 Engine=InnoDB;

mysql> DROP TABLE t1;

mysql> DROP TABLESPACE ts1;

Note

tablespace_name is a case-sensitive identifier in MySQL.

说明

<tablespace_name> 标识符在 MySQL 中是区分大小写的。

General Tablespace Limitations

A generated or existing tablespace cannot be changed to a general tablespace.
Creation of temporary general tablespaces is not supported.
General tablespaces do not support temporary tables.
Tables stored in a general tablespace may only be opened in MySQL releases that support general tablespaces.
Similar to the system tablespace, truncating or dropping tables stored in a general tablespace creates free space internally in the general tablespace .ibd data file which can only be used for new InnoDB data. Space is not released back to the operating system as it is for file-per-table tablespaces.

Additionally, a table-copying ALTER TABLE operation on table that resides in a shared tablespace (a general tablespace or the system tablespace) can increase the amount of space used by the tablespace. Such operations require as much additional space as the data in the table plus indexes. The additional space required for the table-copying ALTER TABLE operation is not released back to the operating system as it is for file-per-table tablespaces.

ALTER TABLE ... DISCARD TABLESPACE and ALTER TABLE ...IMPORT TABLESPACE are not supported for tables that belong to a general tablespace.
Support for placing table partitions in general tablespaces was deprecated in MySQL 5.7.24; expect it to be removed in a future version of MySQL.
An ADD DATAFILE clause is not supported in a replication environment where the source and replica reside on the same host, as it would cause the source and replica to create a tablespace of the same name in the same location.

通用表空间的限制

不能将某个已经生成或者已经存在的表空间更改为通用表空间。
不支持创建临时的通用表空间。
通用表空间不支持创建临时表。
存储在通用表空间中的表, 只能在支持通用表空间的 MySQL 版本中打开。
与系统表空间类似，截断或删除存储在通用表空间中的表, 会在通用表空间 .ibd data file 中产生空闲区域, 这些空间只能用于存储新的 InnoDB 数据。在 DROP TABLE 操作期间释放的空间, 不会将对应的磁盘空间返还给操作系统, 和 file-per-table 表空间不一样。

此外，对位于共享表空间(通用表空间或系统表空间)中的表, 执行表复制 ALTER TABLE 操作, 会增加表空间使用的磁盘空间。此类操作需要与表中的数据加上索引一样多的额外空间。表复制 ALTER TABLE 操作所需的额外空间不会像 file-per-table 表空间那样释放给操作系统。

属于通用表空间的表, 不支持 ALTER TABLE ... DISCARD TABLESPACE 和 ALTER TABLE ...IMPORT TABLESPACE。
从 MySQL 5.7.24 开始不支持在通用表空间中放置表分区；预期在未来版本的 MySQL 中会删除。
如果主从复制的源和副本实例都位于同一宿主机, 则不支持 ADD DATAFILE 子句，因为它会导致源和副本在同一位置创建同名的表空间。

14.6.3.4 Undo Tablespaces

Undo tablespaces contain undo logs, which are collections of undo log records that contain information about how to undo the latest change by a transaction to a clustered index record. Undo logs exist within undo log segments, which are contained within rollback segments. The innodb_rollback_segments variable defines the number of rollback segments allocated to each undo tablespace.

Undo logs can be stored in one or more undo tablespaces instead of the system tablespace. This layout differs from the default configuration in which undo logs reside in the system tablespace. The I/O patterns for undo logs make undo tablespaces good candidates for SSD storage, while keeping the system tablespace on hard disk storage.

The number of undo tablespaces used by InnoDB is controlled by the innodb_undo_tablespaces configuration option. This option can only be configured when initializing the MySQL instance. It cannot be changed afterward.

14.6.3.4 撤消日志表空间(Undo tablespaces)

撤消日志表空间(Undo tablespaces)中保存了撤消日志(undo logs)，撤消日志包含了如何撤消事务对聚集索引记录所做的最新更改。撤消日志存在于撤消日志段(undo log segments)中，而撤消日志段又包含在回滚段中。 innodb_rollback_segments 变量定义了分配给每个撤消表空间的回滚段数量。

Undo logs 可以存储在一个或多个undo tablespaces中, 而不一定要保存在系统表空间中。这种布局不同于将撤消日志驻留在默认的系统表空间中。撤消日志的 I/O 模式, 让 SSD 存储设备成为撤消表空间的最佳选择, 同时系统表空间还可以保持在硬盘存储中。

InnoDB 使用的撤消表空间数量由 innodb_undo_tablespaces 配置项控制。该选项只能在初始化 MySQL 实例时配置。运行时无法动态更改。

Note

The innodb_undo_tablespaces configuration option is deprecated; expect it to be removed in a future release.

Undo tablespaces and individual segments inside those tablespaces cannot be dropped. However, undo logs stored in undo tablespaces can be truncated. For more information, see Truncating Undo Tablespaces.

提示

innodb_undo_tablespaces 配置选项已经废弃；预期在未来版本中会被删除。

撤消表空间和这些表空间内的单个 segments 不能被删除。但是可以截断存储在撤消表空间中的撤消日志。详细信息请参阅 Truncating Undo Tablespaces。

Configuring Undo Tablespaces

To configure undo tablespaces for a MySQL instance, perform the following steps. It is assumed that you are performing the procedure on a test instance prior to deploying the configuration to a production system.

Important

The number of undo tablespaces can only be configured when initializing a MySQL instance and is fixed for the life of the instance.

配置撤消表空间

要为 MySQL 实例配置撤消表空间，请执行以下步骤。我们应该先在测试实例上执行该过程, 没问题再部署到生产系统。

重要提示

undo表空间的数量只能在初始化 MySQL 实例时配置，并且在实例的生命周期内是固定的。

Specify a directory location for undo tablespaces using the innodb_undo_directory configuration option. If a directory location is not specified, undo tablespaces are created in the data directory.
Define the number of rollback segments using the innodb_rollback_segments configuration option. Start with a relatively low value and increase it incrementally over time to examine the effect on performance. The default setting for innodb_rollback_segments is 128, which is also the maximum value.

One rollback segment is always assigned to the system tablespace, and 32 rollback segments are reserved for the temporary tablespace (ibtmp1). Therefore, to allocate rollback segments to undo tablespaces, set innodb_rollback_segments to a value greater than 33. For example, if you have two undo tablespaces, set innodb_rollback_segments to 35 to assign one rollback segment to each of the two undo tablespaces. Rollback segments are distributed among undo tablespaces in a circular fashion.

When you configure separate undo tablespaces, the rollback segment in the system tablespace is rendered inactive.

使用 innodb_undo_directory 配置项指定撤消表空间的目录位置。如果未指定目录位置，则在数据目录中创建撤消表空间。
使用 innodb_rollback_segments 配置项定义回滚段的数量。可以从一个相对较低的值开始，随着时间的推移逐渐增加它的值, 以检查对性能的影响。 innodb_rollback_segments 的默认设置是128，这也是最大值。

一个回滚段总是分配给系统表空间，32 个回滚段保留给临时表空间(ibtmp1)。因此，要将回滚段分配给撤消表空间，请将 innodb_rollback_segments 设置为大于 33 的值。例如，如果有两个 undo 表空间，请将 innodb_rollback_segments 设置为 35, 每个撤消表空间分配一个回滚段(rollback segment)。回滚段以循环方式分布在撤消表空间中。

配置单独的撤消表空间时，系统表空间中的回滚段将呈现为非活动状态。

Define the number of undo tablespaces using the innodb_undo_tablespaces option. The specified number of undo tablespaces is fixed for the life of the MySQL instance, so if you are uncertain about an optimal value, estimate on the high side.
Create a new MySQL test instance using the option values you have chosen.
Use a realistic workload on your test instance with data volume similar to your production servers to test the configuration.
Benchmark the performance of I/O intensive workloads.
Periodically increase the value of innodb_rollback_segments and rerun performance tests until there are no further improvements in I/O performance.
使用 innodb_undo_tablespaces 选项定义撤消表空间的数量。指定的撤消表空间数量在 MySQL 实例的生命周期内是固定的，因此如果不确定最佳值，请稍微设置大一点。
使用选择的值创建一个新的 MySQL 测试实例。
在测试实例上使用与生产服务器相似的数据量和工作负载来测试配置。
对 I/O 密集型工作负载的性能进行基准测试。
定期增加 innodb_rollback_segments 的值并重新运行性能测试，直到 I/O 性能没有进一步的改善。

Truncating Undo Tablespaces

Truncating undo tablespaces requires that the MySQL instance have a minimum of two active undo tablespaces, which ensures that one undo tablespace remains active while the other is taken offline to be truncated. The number of undo tablespaces is defined by the innodb_undo_tablespaces variable. The default value is 0. Use this statement to check the value of innodb_undo_tablespaces:

截断Undo表空间

如果要截断Undo表空间, 那么 MySQL 实例至少要有两个活动的 undo 表空间, 才能确保有一个 undo 表空间保持活动状态，而另一个脱机以被截断。 Undo表空间的数量由 innodb_undo_tablespaces 变量指定。默认值为 0。可以使用下面的SQL语句来查看 innodb_undo_tablespaces 的值:

mysql> SELECT @@innodb_undo_tablespaces;
+---------------------------+
| @@innodb_undo_tablespaces |
+---------------------------+
|                         2 |
+---------------------------+

To have undo tablespaces truncated, enable the innodb_undo_log_truncate variable. For example:

要截断undo表空间，请启用 innodb_undo_log_truncate 变量。例如这样：

mysql> SET GLOBAL innodb_undo_log_truncate=ON;

When the innodb_undo_log_truncate variable is enabled, undo tablespaces that exceed the size limit defined by the innodb_max_undo_log_size variable are subject to truncation. The innodb_max_undo_log_size variable is dynamic and has a default value of 1073741824 bytes (1024 MiB).

当启用 innodb_undo_log_truncate 变量时， undo tablespaces 中超过 innodb_max_undo_log_size 变量大小的部分会被截断。 innodb_max_undo_log_size 变量是动态的，默认值为 1073741824 字节 (1024 MiB).

mysql> SELECT @@innodb_max_undo_log_size;
+----------------------------+
| @@innodb_max_undo_log_size |
+----------------------------+
|                 1073741824 |
+----------------------------+

When the innodb_undo_log_truncate variable is enabled:

Undo tablespaces that exceed the innodb_max_undo_log_size setting are marked for truncation. Selection of an undo tablespace for truncation is performed in a circular fashion to avoid truncating the same undo tablespace each time.
Rollback segments residing in the selected undo tablespace are made inactive so that they are not assigned to new transactions. Existing transactions that are currently using rollback segments are permitted to finish.
The purge system empties rollback segments by freeing undo logs that are no longer in use.
After all rollback segments in the undo tablespace are freed, the truncate operation runs and truncates the undo tablespace to its initial size. The initial size of an undo tablespace depends on the innodb_page_size value. For the default 16KB page size, the initial undo tablespace file size is 10MiB. For 4KB, 8KB, 32KB, and 64KB page sizes, the initial undo tablespace files sizes are 7MiB, 8MiB, 20MiB, and 40MiB, respectively.

The size of an undo tablespace after a truncate operation may be larger than the initial size due to immediate use following the completion of the operation.

The innodb_undo_directory variable defines the location of undo tablespace files. If the innodb_undo_directory variable is undefined, undo tablespaces reside in the data directory.

Rollback segments are reactivated so that they can be assigned to new transactions.

当innodb_undo_log_truncate 变量是启用状态时：

超过 innodb_max_undo_log_size 设置的undo表空间会被标记来截断。用于截断的撤消表空间以循环方式来选择，以避免每次都截断相同的撤消表空间。
在选定撤消表空间中的回滚段处于非活动状态，因此它们不会分配给新事务。当前正在使用回滚段的现有事务被允许完成。
purge 系统通过释放不再使用的撤消日志来清空回滚段。
undo 表空间中的所有回滚段都被释放后，truncate 操作运行并将 undo 表空间截断为其初始大小。撤消表空间的初始大小取决于 innodb_page_size 值。对于默认的 16KB 页大小，初始撤消表空间文件大小为 10MiB。对于 4KB、8KB、32KB 和 64KB 页大小，初始撤消表空间文件大小分别为 7MiB、8MiB、20MiB 和 40MiB。

由于在操作完成后立即使用，截断操作后撤消表空间的大小可能大于初始大小。

innodb_undo_directory 变量定义了撤消表空间的文件路径。如果 innodb_undo_directory 变量未定义，撤消表空间存放在数据目录中。

回滚段被重新激活，以便它们可以分配给新事务。

Expediting Truncation of Undo Tablespaces

The purge thread is responsible for emptying and truncating undo tablespaces. By default, the purge thread looks for undo tablespaces to truncate once every 128 times that purge is invoked. The frequency with which the purge thread looks for undo tablespaces to truncate is controlled by the innodb_purge_rseg_truncate_frequency variable, which has a default setting of 128.

加快截断撤消表空间的频率

清除线程(purge thread)负责清空和截断撤消表空间。默认情况下，清除线程每执行 128 次清除就会查找撤消表空间以截断一次。由 innodb_purge_rseg_truncate_frequency 变量控制清除线程查找撤消表空间截断的频率，默认设置为 128.

mysql> SELECT @@innodb_purge_rseg_truncate_frequency;
+----------------------------------------+
| @@innodb_purge_rseg_truncate_frequency |
+----------------------------------------+
|                                    128 |
+----------------------------------------+

To increase that frequency, decrease the innodb_purge_rseg_truncate_frequency setting. For example, to have the purge thread look for undo tabespaces once every 32 timees that purge is invoked, set innodb_purge_rseg_truncate_frequency to 32.

要增加扫描频率，可以减小 innodb_purge_rseg_truncate_frequency 的值。例如 32。

mysql> SET GLOBAL innodb_purge_rseg_truncate_frequency=32;

When the purge thread finds an undo tablespace that requires truncation, the purge thread returns with increased frequency to quickly empty and truncate the undo tablespace.

当清除线程发现需要截断的撤消表空间时, 清除线程以增加的频率返回, 快速清空并截断撤消表空间。

Performance Impact of Truncating Undo Tablespace Files

When an undo tablespace is truncated, the rollback segments in the undo tablespace are deactivated. The active rollback segments in other undo tablespaces assume responsibility for the entire system load, which may result in a slight performance degradation. The amount of performance degradation depends on a number of factors:

截断撤消表空间文件对系统性能的影响

当撤消表空间被截断时，其中的回滚段被停用。其他撤消表空间中的活动回滚段负责承担整个系统负载，这可能会造成轻微的性能下降。性能下降的程度取决于多种因素：

Number of undo tablespaces
Number of undo logs
Undo tablespace size
Speed of the I/O susbsystem
Existing long running transactions
System load
撤消表空间的数量
撤消日志的数量
撤消表空间的大小
I/O子系统的速度
是否存在长时间不提交的事务
当前的系统负载压力

The easiest way to avoid this potential performance issue is to increase the number of undo tablespaces.

Also, two checkpoint operations are performed during an undo tablespace truncate operation. The first checkpoint operation removes the old undo tablespace pages from the buffer pool. The second checkpoint flushes the initial pages of the new undo tablespace to disk. On a busy system, the first checkpoint in particular can temporarily affect system performance if there is a large number of pages to remove.

避免这种潜在性能问题的最简单方法, 就是增加撤消表空间的数量。

此外，在撤消表空间截断操作期间会执行两次检查点操作。第一次检查点操作从缓冲池中删除旧的撤消表空间页面。第二次检查点将新的撤消表空间的初始页面刷到磁盘。在高负载的繁忙系统上，如果要删除大量页面，特别是第一次检查点操作, 可能会暂时影响系统性能。

Undo Tablespace Truncation Recovery

An undo tablespace truncate operation creates a temporary undo_<space_number>_trunc.log file in the server log directory. That log directory is defined by innodb_log_group_home_dir. If a system failure occurs during the truncate operation, the temporary log file permits the startup process to identify undo tablespaces that were being truncated and to continue the operation.

Undo表空间截断与故障恢复

撤消表空间的截断操作会在服务器日志目录中创建一个临时的 undo_<space_number>_trunc.log 文件。该日志文件的目录由 innodb_log_group_home_dir 定义。如果在截断操作期间发生系统故障，启动进程可以通过临时日志文件, 识别被截断的撤消表空间, 并继续操作。

14.6.3.5 The Temporary Tablespace

Non-compressed, user-created temporary tables and on-disk internal temporary tables are created in a shared temporary tablespace. The innodb_temp_data_file_path configuration option defines the relative path, name, size, and attributes for temporary tablespace data files. If no value is specified for innodb_temp_data_file_path, the default behavior is to create an auto-extending data file named ibtmp1 in the innodb_data_home_dir directory that is slightly larger than 12MB.

14.6.3.5 临时表空间

非压缩的、用户创建的临时表, 以及磁盘内部临时表, 都是在共享临时表空间中创建的。 innodb_temp_data_file_path 配置项定义了临时表空间数据文件的相对路径、名称、大小和属性。如果没有指定值，则默认行为是在 innodb_data_home_dir 目录中创建名为 ibtmp1 的自动扩展数据文件, 初始大小在 12MB 多一点点。

Note

In MySQL 5.6, non-compressed temporary tables are created in individual file-per-table tablespaces in the temporary file directory, or in the InnoDB system tablespace in the data directory if innodb_file_per_table is disabled. The introduction of a shared temporary tablespace in MySQL 5.7 removes performance costs associated with creating and removing a file-per-table tablespace for each temporary table. A dedicated temporary tablespace also means that it is no longer necessary to save temporary table metadata to the InnoDB system tables.

提示

在 MySQL 5.6 中，非压缩临时表在临时文件目录的单个 file-per-table 表空间中创建，如果 innodb_file_per_table 被禁用则在数据目录下的 InnoDB 系统表空间中创建。 MySQL 5.7 中则引入共享临时表空间, 消除了为每个临时表创建和删除 file-per-table 表空间相关的性能开销。专用的临时表空间还意味着不需要将临时表元数据保存到InnoDB系统表中。

Compressed temporary tables, which are temporary tables created using the ROW_FORMAT=COMPRESSED attribute, are created in file-per-table tablespaces in the temporary file directory.

The temporary tablespace is removed on normal shutdown or on an aborted initialization, and is recreated each time the server is started. The temporary tablespace receives a dynamically generated space ID when it is created. Startup is refused if the temporary tablespace cannot be created. The temporary tablespace is not removed if the server halts unexpectedly. In this case, a database administrator can remove the temporary tablespace manually or restart the server, which removes and recreates the temporary tablespace automatically.

The temporary tablespace cannot reside on a raw device.

INFORMATION_SCHEMA.FILES provides metadata about the InnoDB temporary tablespace. Issue a query similar to this one to view temporary tablespace metadata:

压缩临时表，即使用 ROW_FORMAT=COMPRESSED 属性创建的临时表，在临时文件目录下的 file-per-table 表空间中创建。

临时表空间在系统正常关闭或初始化中止时被删除，并在每次服务器启动时重新创建。临时表空间在创建时会收到一个动态生成的表空间 ID。如果无法创建临时表空间，则拒绝启动。如果服务器意外停止，则不会删除临时表空间。在这种情况下，DBA 可以手动删除临时表空间, 或者重启服务器，重启时服务器会自动删除并重新创建临时表空间。

临时表空间不能放置在原始磁盘设备上(也就是说必须使用文件系统)。

INFORMATION_SCHEMA.FILES 提供了 InnoDB 临时表空间相关的元数据。可以使用下面这样的SQL语句来查询临时表空间的元数据：

mysql> SELECT * FROM INFORMATION_SCHEMA.FILES WHERE TABLESPACE_NAME='innodb_temporary'\G

INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO provides metadata about user-created temporary tables that are currently active within an InnoDB instance. For more information, see Section 14.16.7, InnoDB INFORMATION_SCHEMA Temporary Table Info Table.

INFORMATION_SCHEMA.INNODB_TEMP_TABLE_INFO 提供了当前 InnoDB 实例中处于活动状态的, 用户创建的临时表的元数据信息。更多信息请参阅 Section 14.16.7, InnoDB INFORMATION_SCHEMA Temporary Table Info Table.

Managing Temporary Tablespace Data File Size

By default, the temporary tablespace data file is autoextending and increases in size as necessary to accommodate on-disk temporary tables. For example, if an operation creates a temporary table that is 20MB in size, the temporary tablespace data file, which is 12MB in size by default when created, extends in size to accommodate it. When temporary tables are dropped, freed space can be reused for new temporary tables, but the data file remains at the extended size.

An autoextending temporary tablespace data file can become large in environments that use large temporary tables or that use temporary tables extensively. A large data file can also result from long running queries that use temporary tables.

To determine if a temporary tablespace data file is autoextending, check the innodb_temp_data_file_path setting:

管理临时表空间的数据文件大小

默认情况下，临时表空间的数据文件会自动扩展,根据需要增加大小以适应磁盘上的临时表。例如，如果某个操作创建了一个大小为 20MB 的临时表，则默认大小为 12MB 的临时表空间数据文件会自动扩展, 以容纳这些数据。删除临时表后，释放的空间可用于新的临时表，但数据文件仍保持扩展后的大小。

如果有很大的临时表, 或者使用了很多临时表，那么自动扩展的临时表空间数据文件可能会变得很大。大数据文件也可能来自于使用临时表的SQL查询。

要确定临时表空间的数据文件是否正在自动扩展，请检查 innodb_temp_data_file_path 设置：

mysql> SELECT @@innodb_temp_data_file_path;
+------------------------------+
| @@innodb_temp_data_file_path |
+------------------------------+
| ibtmp1:12M:autoextend        |
+------------------------------+

To check the size of temporary tablespace data files, query the INFORMATION_SCHEMA.FILES table using a query similar to this:

要检查临时表空间的数据文件大小，请使用下面这样的SQL来查询 INFORMATION_SCHEMA.FILES 表：

mysql> SELECT FILE_NAME, TABLESPACE_NAME, ENGINE, INITIAL_SIZE, TOTAL_EXTENTS*EXTENT_SIZE
       AS TotalSizeBytes, DATA_FREE, MAXIMUM_SIZE FROM INFORMATION_SCHEMA.FILES
       WHERE TABLESPACE_NAME = 'innodb_temporary'\G
*************************** 1. row ***************************
      FILE_NAME: ./ibtmp1
TABLESPACE_NAME: innodb_temporary
         ENGINE: InnoDB
   INITIAL_SIZE: 12582912
 TotalSizeBytes: 12582912
      DATA_FREE: 6291456
   MAXIMUM_SIZE: NULL

The TotalSizeBytes value reports the current size of the temporary tablespace data file. For information about other field values, see Section 23.9, The INFORMATION_SCHEMA FILES Table.

Alternatively, check the temporary tablespace data file size on your operating system. By default, the temporary tablespace data file is located in the directory defined by the innodb_temp_data_file_path configuration option. If a value was not specified for this option explicitly, a temporary tablespace data file named ibtmp1 is created in innodb_data_home_dir, which defaults to the MySQL data directory if unspecified.

To reclaim disk space occupied by a temporary tablespace data file, restart the MySQL server. Restarting the server removes and recreates the temporary tablespace data file according to the attributes defined by innodb_temp_data_file_path.

To prevent the temporary data file from becoming too large, you can configure the innodb_temp_data_file_path option to specify a maximum file size. For example:

其中 TotalSizeBytes 值表示当前临时表空间数据文件的大小。其他字段值的信息请参阅 Section 23.9, The INFORMATION_SCHEMA FILES Table.

或者，通过操作系统来查看临时表空间数据文件的大小。默认情况下，临时表空间数据文件位于 innodb_temp_data_file_path 配置项指定的目录中. 如果未明确指定此选项的值，则会在 innodb_data_home_dir 中创建一个名为 ibtmp1 的临时表空间数据文件，默认是 MySQL 的数据目录。

要回收临时表空间数据文件占用的磁盘空间，需要重启 MySQL 服务器。服务器重启时会根据 innodb_temp_data_file_path 中指定的属性, 删除并重新创建临时表空间数据文件。

为了防止临时数据文件变得太大，可以在 innodb_temp_data_file_path 中指定文件的最大值。例如：

[mysqld]
innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:500M

When the data file reaches the maximum size, queries fail with an error indicating that the table is full. Configuring innodb_temp_data_file_path requires restarting the server.

Alternatively, configure the default_tmp_storage_engine and internal_tmp_disk_storage_engine options, which define the storage engine to use for user-created and on-disk internal temporary tables, respectively. Both options are set to InnoDB by default. The MyISAM storage engine uses an individual file for each temporary table, which is removed when the temporary table is dropped.

当数据文件达到最大值，查询会报错，提示表已满。修改 innodb_temp_data_file_path 配置后, 需要重启MySQL服务器。

或者，配置 default_tmp_storage_engine 和 internal_tmp_disk_storage_engine 选项，分别定义用户创建的临时表引擎, 以及磁盘内部临时表的存储引擎。默认情况下，这两个选项都设置为InnoDB。而 MyISAM 存储引擎则是为每个临时表使用一个单独的文件，当临时表被删除时，该文件也会被删除。

14.6.4 InnoDB Data Dictionary

The InnoDB data dictionary is comprised of internal system tables that contain metadata used to keep track of objects such as tables, indexes, and table columns. The metadata is physically located in the InnoDB system tablespace. For historical reasons, data dictionary metadata overlaps to some degree with information stored in InnoDB table metadata files (.frm files).

14.6.4 InnoDB 数据字典(InnoDB Data Dictionary)

InnoDB数据字典由内部系统表组成，这些表包含用于跟踪对象的元数据, 这些对象包括表、索引和列等等。元数据物理上位于 InnoDB 系统表空间中。由于历史原因，数据字典元数据,在某种程度上与存储在 InnoDB 表元数据文件(.frm文件)中的信息重叠。

14.6.5 Doublewrite Buffer

The doublewrite buffer is a storage area where InnoDB writes pages flushed from the buffer pool before writing the pages to their proper positions in the InnoDB data files. If there is an operating system, storage subsystem, or unexpected mysqld process exit in the middle of a page write, InnoDB can find a good copy of the page from the doublewrite buffer during crash recovery.

Although data is written twice, the doublewrite buffer does not require twice as much I/O overhead or twice as many I/O operations. Data is written to the doublewrite buffer in a large sequential chunk, with a single fsync() call to the operating system (except in the case that innodb_flush_method is set to O_DIRECT_NO_FSYNC).

The doublewrite buffer is enabled by default in most cases. To disable the doublewrite buffer, set innodb_doublewrite to 0.

If system tablespace files (ibdata files) are located on Fusion-io devices that support atomic writes, doublewrite buffering is automatically disabled and Fusion-io atomic writes are used for all data files. Because the doublewrite buffer setting is global, doublewrite buffering is also disabled for data files residing on non-Fusion-io hardware. This feature is only supported on Fusion-io hardware and is only enabled for Fusion-io NVMFS on Linux. To take full advantage of this feature, an innodb_flush_method setting of O_DIRECT is recommended.

14.6.5 双写缓冲区(Doublewrite Buffer)

双写缓冲区是一个存储区域，InnoDB 在将 buffer pool 中被修改的页面写入到具体数据文件之前, 先将其刷到双写缓冲区。如果在页面写出过程中, 发生操作系统，存储子系统，或者 mysqld 进程意外崩溃, InnoDB 可以在故障恢复期间, 从双写缓冲区中找到一个正确的页面副本。

虽然数据写了两次磁盘，但使用双写缓冲区并不需要两倍的I/O开销或者I/O操作。以顺序的大数据块方式刷进双写缓冲区，并进行一次 fsync() 系统调用(例外情况是 innodb_flush_method 设置为 O_DIRECT_NO_FSYNC 的情况)。

在大多数情况下，默认都会启用双写缓冲区。要禁用双写缓冲区，请将 innodb_doublewrite 设置为 0。

如果系统表空间文件(ibdata 文件)保存在支持原子写入的 Fusion-io 设备上，则双写缓冲将自动禁用，并且 Fusion-io 原子写入用于所有数据文件。因为双写缓冲区设置是全局的，所以对于驻留在非 Fusion-io 硬件上的数据文件也会禁用双写缓冲区。此功能仅在 Fusion-io 硬件上受支持，并且仅在 Linux 上为 Fusion-io NVMFS 启用。要充分利用此功能，建议将 innodb_flush_method 设置为 O_DIRECT。

14.6.6 Redo Log

The redo log is a disk-based data structure used during crash recovery to correct data written by incomplete transactions. During normal operations, the redo log encodes requests to change table data that result from SQL statements or low-level API calls. Modifications that did not finish updating the data files before an unexpected shutdown are replayed automatically during initialization, and before the connections are accepted. For information about the role of the redo log in crash recovery, see Section 14.19.2, InnoDB Recovery.

By default, the redo log is physically represented on disk by two files named ib_logfile0 and ib_logfile1. MySQL writes to the redo log files in a circular fashion. Data in the redo log is encoded in terms of records affected; this data is collectively referred to as redo. The passage of data through the redo log is represented by an ever-increasing LSN value.

For related information, see Redo Log File Configuration, and Section 8.5.4, Optimizing InnoDB Redo Logging.

Changing the Number or Size of InnoDB Redo Log Files

To change the number or the size of your InnoDB redo log files, perform the following steps:

Stop the MySQL server and make sure that it shuts down without errors.
Edit my.cnf to change the log file configuration. To change the log file size, configure innodb_log_file_size. To increase the number of log files, configure innodb_log_files_in_group.
Start the MySQL server again.

If InnoDB detects that the innodb_log_file_size differs from the redo log file size, it writes a log checkpoint, closes and removes the old log files, creates new log files at the requested size, and opens the new log files.

Group Commit for Redo Log Flushing

InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. InnoDB uses group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.

For more information about performance of COMMIT and other transactional operations, see Section 8.5.2, Optimizing InnoDB Transaction Management.

14.6.7 Undo Logs

An undo log is a collection of undo log records associated with a single read-write transaction. An undo log record contains information about how to undo the latest change by a transaction to a clustered index record. If another transaction needs to see the original data as part of a consistent read operation, the unmodified data is retrieved from undo log records. Undo logs exist within undo log segments, which are contained within rollback segments. Rollback segments reside in the system tablespace, in undo tablespaces, and in the temporary tablespace.

Undo logs that reside in the temporary tablespace are used for transactions that modify data in user-defined temporary tables. These undo logs are not redo-logged, as they are not required for crash recovery. They are used only for rollback while the server is running. This type of undo log benefits performance by avoiding redo logging I/O.

14.6.7 撤消日志(Undo Logs)

一个撤消日志(undo log), 就是与一个读写事务相关联的一系列的撤消日志记录(undo log records)。一条撤消日志记录, 就对应了一条 clustered index记录, 其中的信息是如何将这行数据恢复到事务对其修改之前的状态。如果有其他事务进行一致性读, 需要获取某行的原始数据, 那么就可以从撤消日志记录中获取未修改前的数据。撤消日志保存在撤消日志段(undo log segments) 中，撤消日志段又位于回滚段(rollback segments)中。回滚段则驻留在:

系统表空间(system tablespace);
撤消表空间(undo tablespaces);
临时表空间(temporary tablespace)。

驻留在临时表空间中的撤消日志, 用于修改用户自定义临时表的事务。这些撤消日志并不是重做日志，因为在故障恢复时并不是必需的。它们仅在服务器运行时用于回滚。这种类型的撤消日志, 因为不需要重做日志I/O, 所以性能更好。

InnoDB supports a maximum of 128 rollback segments, 32 of which are allocated to the temporary tablespace. This leaves 96 rollback segments that can be assigned to transactions that modify data in regular tables. The innodb_rollback_segments variable defines the number of rollback segments used by InnoDB.

The number of transactions that a rollback segment supports depends on the number of undo slots in the rollback segment and the number of undo logs required by each transaction.

The number of undo slots in a rollback segment differs according to InnoDB page size.

InnoDB 最多支持 128 个回滚段，其中 32 个分配给临时表空间。这就还剩下 96 个回滚段，可以分配给常规表中修改数据的事务。 innodb_rollback_segments 变量定义了InnoDB 使用的回滚段数目。

每个回滚段支持的事务数, 取决于回滚段中的撤消槽位数, 以及每个事务所需的撤消日志数。

回滚段中的撤消槽位数, 根据InnoDB的页面大小而有所不同。

InnoDB Page Size	每个回滚段(Rollback Segment)中的Undo槽位(Slots)数; (InnoDB Page Size / 16)
`4096 (4KB)`	`256`
`8192 (8KB)`	`512`
`16384 (16KB)`	`1024`
`32768 (32KB)`	`2048`
`65536 (64KB)`	`4096`

A transaction is assigned up to four undo logs, one for each of the following operation types:

INSERT operations on user-defined tables
UPDATE and DELETE operations on user-defined tables
INSERT operations on user-defined temporary tables
UPDATE and DELETE operations on user-defined temporary tables

一个事务最多分配四个撤消日志, 每个撤销日志可用于以下操作类型中的一种：

常规表上的 INSERT 操作;
常规表上的 UPDATE 和 DELETE操作;
用户自定义临时表上的 INSERT 操作;
用户自定义临时表上的 UPDATE 和 DELETE 操作.

Undo logs are assigned as needed. For example, a transaction that performs INSERT, UPDATE, and DELETE operations on regular and temporary tables requires a full assignment of four undo logs. A transaction that performs only INSERT operations on regular tables requires a single undo log.

A transaction that performs operations on regular tables is assigned undo logs from an assigned system tablespace or undo tablespace rollback segment. A transaction that performs operations on temporary tables is assigned undo logs from an assigned temporary tablespace rollback segment.

An undo log assigned to a transaction remains tied to the transaction for its duration. For example, an undo log assigned to a transaction for an INSERT operation on a regular table is used for all INSERT operations on regular tables performed by that transaction.

Given the factors described above, the following formulas can be used to estimate the number of concurrent read-write transactions that InnoDB is capable of supporting.

撤消日志根据需要分配。例如，对常规表和临时表执行 INSERT, UPDATE, DELETE 操作的事务, 需要分配完整的四种撤消日志。仅对常规表执行INSERT操作的事务, 只需要单个撤消日志。

对常规表执行操作的事务, 会被分配来自系统表空间或撤消表空间回滚段的撤消日志。对临时表执行操作的事务, 从指定的临时表空间回滚段分配撤消日志。

分配给事务的撤消日志, 在其持续时间内仍然与事务相关联。例如，为常规表上的INSERT操作分配给事务的撤消日志, 用于该事务执行的常规表上的所有 INSERT 操作。

考虑到上述因素，以下公式可用于估计 InnoDB 能够支持的并发读写事务数。

Note

A transaction can encounter a concurrent transaction limit error before reaching the number of concurrent read-write transactions that InnoDB is capable of supporting. This occurs when the rollback segment assigned to a transaction runs out of undo slots. In such cases, try rerunning the transaction.

When transactions perform operations on temporary tables, the number of concurrent read-write transactions that InnoDB is capable of supporting is constrained by the number of rollback segments allocated to the temporary tablespace, which is 32.

If each transaction performs either an INSERT or an UPDATE or DELETE operation, the number of concurrent read-write transactions that InnoDB is capable of supporting is:

提示

如果达到 InnoDB 支持的并发读写事务数上限, 事务可能会遇到并发事务数限制的错误(concurrent transaction limit error)。当分配给事务的回滚段撤消槽用完时，就会发生这种情况。这时候需要尝试重新运行事务。

当事务对临时表执行操作时，InnoDB 能够支持的并发读写事务数, 受分配给临时表空间的回滚段数量(32)限制。

如果每个事务只执行 INSERT操作, 或者只执行 UPDATE/DELETE 操作，则 InnoDB 能够支持的并发读写事务数为：

(innodb_page_size / 16) * (innodb_rollback_segments - 32)

If each transaction performs an INSERT and an UPDATE or DELETE operation, the number of concurrent read-write transactions that InnoDB is capable of supporting is:
如果每个事务需要执行一个INSERT操作加上一个UPDATE/DELETE操作，InnoDB 能够支持的并发读写事务数为：

(innodb_page_size / 16 / 2) * (innodb_rollback_segments - 32)

If each transaction performs an INSERT operation on a temporary table, the number of concurrent read-write transactions that InnoDB is capable of supporting is:
如果每个事务只对临时表执行一个INSERT操作，InnoDB 能够支持的并发读写事务数为：

(innodb_page_size / 16) * 32

If each transaction performs an INSERT and an UPDATE or DELETE operation on a temporary table, the number of concurrent read-write transactions that InnoDB is capable of supporting is:
如果每个事务, 对临时表执行一个INSERT操作, 加上一个UPDATE/DELETE操作，InnoDB 能够支持的并发读写事务数为：

(innodb_page_size / 16 / 2) * 32

参考链接

官方英文版: 14.6 InnoDB On-Disk Structures

Files

14.6_innodb-on-disk-structures.md

Latest commit

History

14.6_innodb-on-disk-structures.md

File metadata and controls