Description
Introduction
This is an RFC for adding Foreign Key Support in Vitess.
Use Case
- The use case is to support cross-shard foreign key constraints for a keyspace to establish relations between different tables.
- Current restrictions require the schema to be structured such that rows for tables linked by foreign keys need to live on the same shard. This constrains schema design and sharding key options.
- After the suggested changes, we would be able to support adding arbitrary foreign keys irrespective of the vschema / sharding key configuration.
Scope of the project
- Foreign keys in a single keyspace.
- Foreign keys will be created in MySQL. Online DDL with vanilla MySQL will not work if foreign keys are created.
Out of Scope
- Cross-keyspace foreign key support.
Schema
The following is the schema we will use for providing examples as we discuss the design of the foreign key support.
MySQL Schema
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables;
+---------------------------+
| Tables_in_foreign_key_rfc |
+---------------------------+
| area |
| contact |
| customer |
| orders |
| product |
+---------------------------+
5 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables;
+---------------------------+
| Tables_in_foreign_key_rfc |
+---------------------------+
| area |
| contact |
| customer |
| orders |
| product |
+---------------------------+
5 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table area;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| area | CREATE TABLE `area` (
`id` int NOT NULL,
`name` varchar(30) DEFAULT NULL,
`zipcode` int DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table contact;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| contact | CREATE TABLE `contact` (
`id` int NOT NULL,
`contactnum` varchar(10) DEFAULT NULL,
`customer_id` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `customer_id` (`customer_id`),
CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table customer;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| customer | CREATE TABLE `customer` (
`id` int NOT NULL,
`name` varchar(30) DEFAULT NULL,
`area_id` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `area_id` (`area_id`),
CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`area_id`) REFERENCES `area` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table orders;
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| orders | CREATE TABLE `orders` (
`id` int DEFAULT NULL,
`product_id` int DEFAULT NULL,
`customer_id` int DEFAULT NULL,
KEY `product_id` (`product_id`),
KEY `customer_id` (`customer_id`),
CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`),
CONSTRAINT `orders_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table product;
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| product | CREATE TABLE `product` (
`id` int NOT NULL,
`name` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
erDiagram
Area {
id int PK
name varchar
zipcode int
}
Product {
id int PK
name varchar
}
Order {
id int PK
produce_id int FK
customer_id int FK
}
Customer {
id int PK
name varchar
area_id int FK
}
Contact {
id int PK
contact varchar
customer_id int FK
}
Product ||--}o Order :""
Customer ||--}o Order :""
Customer ||--}o Contact :""
Area ||--}o Customer :""
The data we insert for the examples that follow are -
delete from area;
delete from product;
delete from customer;
delete from orders;
delete from contact;
insert into product values ('1', 'Fan'), ('2', 'Cooler'), ('3', 'AC');
insert into area values(1, 'US', '521763'), (2, 'India', '432143');
insert into customer values(1, 'Manan', 2), (2, 'Andres', 1), (3, 'Harshit', 2);
insert into contact values (1, '897876876', 3), (2, '234123453', 1), (3, '789975234', 2), (4, '343214314', 1);
insert into orders values (1, 1, 2), (2, 1, 1), (3, 1, 1), (4, 2, 1), (4, 2, 3);
Design
This section dives into more specific details of what we intend to do to support foreign keys.
Basic Design
- Foreign keys will be passed down to MySQL and created there. This will give us inherent support for information schema queries since foreign key definitions will exist on MySQL level.
- We will set
FOREIGN_KEY_CHECKS=0
on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates. - Vitess will be aware of the foreign key constraints and enforce them on all DMLs.
- We’ll change Schema Tracking to start using GetSchema RPC call instead of execute Select statements directly. Schema tracking will then be used for getting the create definition calls from GetSchema.
- On vtgate we’ll create a graph of foreign key constraints (table-relationship graph). We’ll also need to store default values for all columns. (Required for
SET DEFAULT
). - Description of the data structure to follow.
- We’ll also need to store the uniqueness constraints for the table. (Not required immediately, but will be needed for
ON DUPLICATE KEY UPDATE
support). - Note - There is a delay between the DDL operations executing and between schema tracking updating vtgate. Needs discussion.
foreign_key_mode
is a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and putForeignKeyMode
as a VSchema configuration.
Planning INSERTs
- While planning inserts, we’ll consult the table-relationship graph and get the foreign key constraints where the column is a child and plan the SELECT validation queries to make sure the corresponding parent row exists. We’ll get a shared lock on the SELECT.
- If we’re in AUTOCOMMIT mode, we still have to start a transaction.
- Overall steps look like -
- Step 1: If there is no transaction, we’ll start one.
- Step 2: Execute all the validation queries parallel and get a lock in share mode
- Step 3: If any of those validation queries return empty results, we fail and rollback (only if the transaction was started implicitly).
- Step 4: Execute the Insert and vindex updates.
- Step 5: commit the transaction (if we started one in step 1).
ON DUPLICATE KEY UPDATE
- Plan the update and insert. Runtime defines which one you execute.
- Run a SELECT statement to know which side to run.
Example
For example, if the user was to execute insert into orders (id, product_id, customer_id) values (4, :a, :b);
, the set of steps that Vitess would take -
- Start a transaction.
START TRANSACTION
- Validation queries -
SELECT 1 FROM product WHERE ID = :a FOR SHARE
,SELECT 1 FROM customer WHERE ID = :b FOR SHARE
- If both the queries return a row, then we execute the insert.
COMMIT
Planning UPDATEs
and DELETEs
- Planning of
UPDATE
andDELETE
depends on the referential actions that they are configured with. Let's dive into each one that MySQL allows
RESTRICT
/NO ACTION
(default)
- In this configuration, MySQL rejects any update or delete to a row which has a column that is a parent in a foreign key relation.
- Planning for this, is very similar to INSERTS, in the part that before we run the update or delete we need to verify if there exists a foreign key constraint that references the row.
- There are subtle differences between the two though, the first being that here we care for foreign key constraints where the table being updated is the parent in the relation, the second is that here we fail if we find a single row matching our
SELECT
validation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query! - Overall, the steps look like the following -
- Step 1: If there is no transaction, we’ll start one.
- Step 2: Convert the
UPDATE
/DELETE
to aSELECT
that returns the rows that are being updated/deleted. - Step 3: Run the
SELECT
query and use these results to generate the validation queries. - Step 4: Execute all the validation queries parallel and get a lock in share mode
- Step 5: If any of those validation queries return any results, we fail and roll back (only if the transaction was started implicitly).
- Step 6: Execute the Update/Delete and vindex updates.
- Step 7: commit the transaction (if we started one in step 1).
- Possible abuse from the user.
- If the user runs a query like
DELETE FROM customer
which tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and theSELECT
validation query it executes might be extremely large too. This could lead to OOMs. We can add aLIMIT
clause to the equivalentSELECT
statement of theDELETE
and reject such mass updates/deletes if we get more results than theLIMIT
allows.
- If the user runs a query like
Example
For example, if the user was to execute DELETE FROM customer WHERE area_id = 2;
, then Vitess would need to take the following steps -
- Start a transaction.
START TRANSACTION
- Convert the
DELETE
query into aSELECT
with the sameWHERE
clause. So Vitess would executeSELECT id FROM customer WHERE area_id = 2;
. We would get back the following result -
+----+
| id |
+----+
| 1 |
| 3 |
+----+
2 rows in set (0.00 sec)
- These results will drive the next validation queries we run. Since the
customer
table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them -SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE
andSELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE
. - If any of the validation queries return any rows, we fail and rollback.
- Execute the delete and vindex updates (if required).
COMMIT
.
CASCADE
- In this configuration, MySQL cascades any updates/deletes to the children in the foreign key constraints. So a deletion of the parent will trigger the deletion of the child rows that are referenced by it.
- Planning for CASCADEs is a little complex since the rows that we need to cascade the delete too will only be available on run time. Those rows themselves could have foreign key constraints that could fail the operation, in which case, the entire transaction has to be rolled back. In the
RESTRICT
case, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it). - Overall, the steps look like the following -
- Step 1: If there is no transaction, we’ll start one.
- Step 2: Convert the
UPDATE
/DELETE
to aSELECT
that returns the rows that are being updated/deleted. - Step 3: Run the
SELECT
query and use these results to find the rows that need to have DELETE/UPDATEs cascaded to. - Step 4: Execute the original Update/Delete and vindex updates.
- Step 5: Repeat the
UPDATE
/DELETE
for the children rows in the same transaction. Do this until no further cascades are required. - Step 5: If any of them fail, we'll need to
ROLLBACK
. - Step 6: commit the transaction (if we started one in step 1).
- Possible abuse from the user.
- Cascading deletes can end up deleting a lot of rows, especially since deleting a child row could trigger a cascade from a foreign key constraint for a different column other than the child column in the original foreign key constraint. We could add the
LIMIT
clause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.
- Cascading deletes can end up deleting a lot of rows, especially since deleting a child row could trigger a cascade from a foreign key constraint for a different column other than the child column in the original foreign key constraint. We could add the
SET NULL
- The planning for
SET NULL
is very similar toCASCADE
. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.
SET DEFAULT
- Not supported in InnoDb. We still need to decide if we need to support this in Vitess.
- If we decide to support it, then it is very similar to
SET NULL
. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.
Planning REPLACE
- Currently
REPLACE
statements are only supported in unsharded mode. - To support
REPLACE
we'll plan the DELETE and INSERT and execute aSELECT
query to decide if aDELETE
is necessary.
Important Considerations
- Checking for constraints will lead to cross-shard transactions. If the
INSERT
/UPDATE
/DELETE
only touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only beSELECT... FOR SHARE
statements. So, in case of point updates, we don't have any risk any partial commits/inconsistent state. The write being successful will just be contingent on theCOMMIT
succeeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase. - Users should still try to design schema such that parent and child row lives in the same shard as it will be more optimized for enforcing FK constraints.
- Cyclic foreign key constraints: The design we described above will work fine even when two tables have foreign key references to each other as long as it is not cyclic on the column level. Having a cycle of foreign key constraints where a single column is both a parent and a child in the cycle, then that would mean that no data can be inserted into that column ever! Since to insert data into that column, the data should already exist in that column! We don't plan to worry about this initially, we'll see if we need to add support for this eventually. (It can be done on MySQL. Inserting data has to be done by setting
FOREIGN_KEY_CHECKS
to 0).
flowchart TD
subgraph This is acceptable
Table1
col1[[col1]]
col2[[col2]]
Table1 --- col1
Table1 --- col2
Table2
col3[[col3]]
col4[[col4]]
Table2 --- col3
Table2 --- col4
col1 -.-> col3
col4 -.-> col2
end
subgraph This won't work
Table3
col5[[col1]]
col6[[col2]]
Table3 --- col5
Table3 --- col6
Table4
col8[[col3]]
col7[[col4]]
Table4 --- col8
Table4 --- col7
col5 -.-> col8
col8 -.-> col5
end
Data structure to store FK constraints in VSchema
Schema tracking will give us a list of foreign key constraints as part of the SHOW CREATE TABLE
output. We want to store this
output in the VSchema
struct in a form that gives us the best performance while planning.
We'll need to answer queries of the following sorts -
- For a given table, find all the foreign key constraints where it is a child. (Needed for planning
INSERT
s) - For a given table, find all the foreign key constraints where it is a parent. (Needed for planning
DELETE's and
UPDATE`s)
The VSchema
struct stores a map of KeyspaceSchema
for each keyspace. Within a KeyspaceSchema
we have a map of Table
.
We'll store the foreign key constraints inside this Table
struct.
We'll add 2 more fields to the Table
struct -
type Table struct {
...
ParentFKs []*ForeignKeyConstraint
ChildFKs []*ForeingKeyConstraint
}
Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.
The ForeignKeyConstraint
struct would look something like this -
type ForeignKeyConstraint struct {
ParentTable TableName
ParentColumns []Columns
ChildTable TableName
ChildColumns []Columns
OnDeleteAction int // This will be an enum
OnUpdateAction int
}
Performance Improvements
- We can get MySQL to run the
INSERT
,UPDATE/DELETE
(with Restrict) checks for us by usingFOREIGN_KEY_CHECKS=1
on the connection for unsharded and single-sharded cases. - For CASCADE and SET NULL we want vtgate to split them up into separate queries so that we have binlog entries for them and vreplication operations like Reshard and MoveTables work.
Phases
- Minimal Viable Product
- Support for basic
INSERT
,UPDATE
andDELETE
statements for unsharded. RESTRICT
/NO ACTION
,CASCADE
,SET NULL
mode for foreign key constraints will be supported.- Support for
ON DUPLICATE KEY UPDATE
in INSERTs for unsharded. - Support for
REPLACE
for unsharded.
- Support for basic
- Phase two
INSERT/UPDATE/DELETE ... (SELECT)
(SELECT subquery in DMLs) for unsharded.- Support for single-shard foreign key constraints.
- Phase three
- Cross-shard support.
Prerequisites
-
INSERT
planning in Gen4. Gen4: move insert planner to gen4 #12934 - Table Schema to be provided through GetSchema vttablet RPC - Augmenting the
GetSchema
RPC to also work forTable
andAll
type of input #13197
Tasks
- Adding a keyspace configuration in the vschema for foreign keys and deprecate the flag on vtgate Add a keyspace configuration in the vschema for foreign keys #13552
- [ ] On the vttablet, we need FOREIGN_KEY_CHECKS=0. - Augment schema-tracking to get foreign keys information and build the data structure in vtgate (in VSchema) vtgate table schema tracking to use GetSchema rpc #13544 Build foreign key definition in schema tracker #13657
- Foreign key INSERT planning (shard scoped fk constraint). Foreign Keys:
INSERT
planning #13676 - Foreign key UPDATE planning. (shard scoped RESTRICT constraint). Foreign Keys:
UPDATE
planning #13762 - Foreign key DELETE planning. (shard scoped RESTRICT constraint). Foreign Keys:
DELETE
planning #13746 - Add end to end tests for foreign keys with online-ddl, move tables, vreplication and other components of Vitess.
- Benchmarking of writes, for getting the performance impact. Foreign Key Fuzzer Benchmark #14542
- Documentation + Release notes changes
- Foreign key UPDATE planning. (cascade constraint). Foreign key cascade planning for DELETE and UPDATE queries #13823
- Foreign key DELETE planning. (cascade constraint). Foreign key cascade planning for DELETE and UPDATE queries #13823
Metadata
Metadata
Assignees
Labels
Type
Projects
Status
In Progress
Activity