Skip to content

RFC: Foreign Key Support in Vitess #12967

Open
@GuptaManan100

Description

Introduction

This is an RFC for adding Foreign Key Support in Vitess.

Use Case

  • The use case is to support cross-shard foreign key constraints for a keyspace to establish relations between different tables.
  • Current restrictions require the schema to be structured such that rows for tables linked by foreign keys need to live on the same shard. This constrains schema design and sharding key options.
  • After the suggested changes, we would be able to support adding arbitrary foreign keys irrespective of the vschema / sharding key configuration.

Scope of the project

  • Foreign keys in a single keyspace.
  • Foreign keys will be created in MySQL. Online DDL with vanilla MySQL will not work if foreign keys are created.

Out of Scope

  • Cross-keyspace foreign key support.

Schema

The following is the schema we will use for providing examples as we discuss the design of the foreign key support.

MySQL Schema
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show tables;
+---------------------------+
| Tables_in_foreign_key_rfc |
+---------------------------+
| area                      |
| contact                   |
| customer                  |
| orders                    |
| product                   |
+---------------------------+
5 rows in set (0.00 sec)
mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table area;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                        |
  +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | area  | CREATE TABLE `area` (
                                  `id` int NOT NULL,
                                  `name` varchar(30) DEFAULT NULL,
                                  `zipcode` int DEFAULT NULL,
                                  PRIMARY KEY (`id`)
            ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table contact;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                                                                                                                                                                                                  |
  +---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | contact | CREATE TABLE `contact` (
                                       `id` int NOT NULL,
                                       `contactnum` varchar(10) DEFAULT NULL,
                                       `customer_id` int DEFAULT NULL,
                                       PRIMARY KEY (`id`),
                                       KEY `customer_id` (`customer_id`),
                                       CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table customer;
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table    | Create Table                                                                                                                                                                                                                                                                                                          |
  +----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | customer | CREATE TABLE `customer` (
                                         `id` int NOT NULL,
                                         `name` varchar(30) DEFAULT NULL,
                                         `area_id` int DEFAULT NULL,
                                         PRIMARY KEY (`id`),
                                         KEY `area_id` (`area_id`),
                                         CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`area_id`) REFERENCES `area` (`id`)
               ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table orders;
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                              |
  +--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | orders | CREATE TABLE `orders` (
                                     `id` int DEFAULT NULL,
                                     `product_id` int DEFAULT NULL,
                                     `customer_id` int DEFAULT NULL,
                                     KEY `product_id` (`product_id`),
                                     KEY `customer_id` (`customer_id`),
                                     CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`product_id`) REFERENCES `product` (`id`),
                                     CONSTRAINT `orders_ibfk_2` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`)
             ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql [localhost:8032] {msandbox} (foreign_key_rfc) > show create table product;
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table                                                                                                                                                             |
  +---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | product | CREATE TABLE `product` (
                                       `id` int NOT NULL,
                                       `name` varchar(30) DEFAULT NULL,
                                       PRIMARY KEY (`id`)
              ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
erDiagram
    Area {
        id int PK
        name varchar
        zipcode int
    }

    Product {
        id int PK
        name varchar
    }

    Order {
        id int PK
        produce_id int FK
        customer_id int FK
    }

    Customer {
        id int PK
        name varchar
        area_id int FK
    }

    Contact {
        id int PK
        contact varchar
        customer_id int FK
    }

    Product ||--}o Order :""
    Customer ||--}o Order :""
    Customer ||--}o Contact :""
    Area ||--}o Customer :""
Loading

The data we insert for the examples that follow are -

delete from area;
delete from product;
delete from customer;
delete from orders;
delete from contact;
insert into product values ('1', 'Fan'), ('2', 'Cooler'), ('3', 'AC');
insert into area values(1, 'US', '521763'), (2, 'India', '432143');
insert into customer values(1, 'Manan', 2), (2, 'Andres', 1), (3, 'Harshit', 2);
insert into contact values (1, '897876876', 3), (2, '234123453', 1), (3, '789975234', 2), (4, '343214314', 1);
insert into orders values (1, 1, 2), (2, 1, 1), (3, 1, 1), (4, 2, 1), (4, 2, 3);

Design

This section dives into more specific details of what we intend to do to support foreign keys.

Basic Design

  • Foreign keys will be passed down to MySQL and created there. This will give us inherent support for information schema queries since foreign key definitions will exist on MySQL level.
  • We will set FOREIGN_KEY_CHECKS=0 on all vttablet connections so that DML queries don't fail because of these constraints on MySQL. The constraint verification will be done in vtgates.
  • Vitess will be aware of the foreign key constraints and enforce them on all DMLs.
    • We’ll change Schema Tracking to start using GetSchema RPC call instead of execute Select statements directly. Schema tracking will then be used for getting the create definition calls from GetSchema.
    • On vtgate we’ll create a graph of foreign key constraints (table-relationship graph). We’ll also need to store default values for all columns. (Required for SET DEFAULT).
    • Description of the data structure to follow.
    • We’ll also need to store the uniqueness constraints for the table. (Not required immediately, but will be needed for ON DUPLICATE KEY UPDATE support).
    • Note - There is a delay between the DDL operations executing and between schema tracking updating vtgate. Needs discussion.
  • foreign_key_mode is a flag that already exists in VTGate and it controls whether VTGates allow passing the foreign key constraints to MySQL or erroring out. We’ll deprecate that flag and put ForeignKeyMode as a VSchema configuration.

Planning INSERTs

  • While planning inserts, we’ll consult the table-relationship graph and get the foreign key constraints where the column is a child and plan the SELECT validation queries to make sure the corresponding parent row exists. We’ll get a shared lock on the SELECT.
  • If we’re in AUTOCOMMIT mode, we still have to start a transaction.
  • Overall steps look like -
    • Step 1: If there is no transaction, we’ll start one.
    • Step 2: Execute all the validation queries parallel and get a lock in share mode
    • Step 3: If any of those validation queries return empty results, we fail and rollback (only if the transaction was started implicitly).
    • Step 4: Execute the Insert and vindex updates.
    • Step 5: commit the transaction (if we started one in step 1).
  • ON DUPLICATE KEY UPDATE
    • Plan the update and insert. Runtime defines which one you execute.
    • Run a SELECT statement to know which side to run.

Example

For example, if the user was to execute insert into orders (id, product_id, customer_id) values (4, :a, :b);, the set of steps that Vitess would take -

  1. Start a transaction. START TRANSACTION
  2. Validation queries - SELECT 1 FROM product WHERE ID = :a FOR SHARE, SELECT 1 FROM customer WHERE ID = :b FOR SHARE
  3. If both the queries return a row, then we execute the insert.
  4. COMMIT

Planning UPDATEs and DELETEs

  • Planning of UPDATE and DELETE depends on the referential actions that they are configured with. Let's dive into each one that MySQL allows

RESTRICT/NO ACTION (default)

  • In this configuration, MySQL rejects any update or delete to a row which has a column that is a parent in a foreign key relation.
  • Planning for this, is very similar to INSERTS, in the part that before we run the update or delete we need to verify if there exists a foreign key constraint that references the row.
  • There are subtle differences between the two though, the first being that here we care for foreign key constraints where the table being updated is the parent in the relation, the second is that here we fail if we find a single row matching our SELECT validation query, the third and the most interesting difference is that while planning INSERTs we have the full list of column values being inserted at plan time, but for updates and deletes, we'll only know the column values once we run the query!
  • Overall, the steps look like the following -
    • Step 1: If there is no transaction, we’ll start one.
    • Step 2: Convert the UPDATE/DELETE to a SELECT that returns the rows that are being updated/deleted.
    • Step 3: Run the SELECT query and use these results to generate the validation queries.
    • Step 4: Execute all the validation queries parallel and get a lock in share mode
    • Step 5: If any of those validation queries return any results, we fail and roll back (only if the transaction was started implicitly).
    • Step 6: Execute the Update/Delete and vindex updates.
    • Step 7: commit the transaction (if we started one in step 1).
  • Possible abuse from the user.
    • If the user runs a query like DELETE FROM customer which tries to bulk delete a lot of rows, then vtgate will end up reading a huge list of rows and the SELECT validation query it executes might be extremely large too. This could lead to OOMs. We can add a LIMIT clause to the equivalent SELECT statement of the DELETE and reject such mass updates/deletes if we get more results than the LIMIT allows.

Example

For example, if the user was to execute DELETE FROM customer WHERE area_id = 2;, then Vitess would need to take the following steps -

  1. Start a transaction. START TRANSACTION
  2. Convert the DELETE query into a SELECT with the same WHERE clause. So Vitess would execute SELECT id FROM customer WHERE area_id = 2;. We would get back the following result -
+----+
| id |
+----+
|  1 |
|  3 |
+----+
2 rows in set (0.00 sec)
  1. These results will drive the next validation queries we run. Since the customer table has 2 foreign key constraints where it is the parent, we'll need a validation query for both of them - SELECT 1 FROM contact WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE and SELECT 1 FROM orders WHERE (customer_id) in ((1), (3)) Limit 1 FOR SHARE.
  2. If any of the validation queries return any rows, we fail and rollback.
  3. Execute the delete and vindex updates (if required).
  4. COMMIT.

CASCADE

  • In this configuration, MySQL cascades any updates/deletes to the children in the foreign key constraints. So a deletion of the parent will trigger the deletion of the child rows that are referenced by it.
  • Planning for CASCADEs is a little complex since the rows that we need to cascade the delete too will only be available on run time. Those rows themselves could have foreign key constraints that could fail the operation, in which case, the entire transaction has to be rolled back. In the RESTRICT case, we didn't do any writes until we knew it was going to succeed, so we never had to rollback writes. In this case however, we might need to rollback writes if a cascaded delete fails down the line (because that could have a RESTRICT constraint on it).
  • Overall, the steps look like the following -
    • Step 1: If there is no transaction, we’ll start one.
    • Step 2: Convert the UPDATE/DELETE to a SELECT that returns the rows that are being updated/deleted.
    • Step 3: Run the SELECT query and use these results to find the rows that need to have DELETE/UPDATEs cascaded to.
    • Step 4: Execute the original Update/Delete and vindex updates.
    • Step 5: Repeat the UPDATE/DELETE for the children rows in the same transaction. Do this until no further cascades are required.
    • Step 5: If any of them fail, we'll need to ROLLBACK.
    • Step 6: commit the transaction (if we started one in step 1).
  • Possible abuse from the user.
    • Cascading deletes can end up deleting a lot of rows, especially since deleting a child row could trigger a cascade from a foreign key constraint for a different column other than the child column in the original foreign key constraint. We could add the LIMIT clause to the SELECTs, but in this case it won't be enough, since each row deletion would lead to another SELECT query. We would have to impose an overall limit on the vtgate to prevent OOMs.

SET NULL

  • The planning for SET NULL is very similar to CASCADE. After finding the children rows of the foreign key constraint, we would need to SET the children column to NULLs, so DELETE queries on the parents would trigger an UPDATE on the children rows.

SET DEFAULT

  • Not supported in InnoDb. We still need to decide if we need to support this in Vitess.
  • If we decide to support it, then it is very similar to SET NULL. Only difference being that instead of setting NULL, we'll set the default value after finding it from our schema tracking data.

Planning REPLACE

  • Currently REPLACE statements are only supported in unsharded mode.
  • To support REPLACE we'll plan the DELETE and INSERT and execute a SELECT query to decide if a DELETE is necessary.

Important Considerations

  • Checking for constraints will lead to cross-shard transactions. If the INSERT/UPDATE/DELETE only touch one row (including CASCADEs), then the cross-shard transaction will only be writing in one shard. All the queries executed in other shards will only be SELECT... FOR SHARE statements. So, in case of point updates, we don't have any risk any partial commits/inconsistent state. The write being successful will just be contingent on the COMMIT succeeding in the shard having the write. For DMLs that touch more than 1 row, this guarantee can't be provided and the cross-shard transaction will be best effort. It can leave the database in an inconsistent state in case of partial failure during commit phase.
  • Users should still try to design schema such that parent and child row lives in the same shard as it will be more optimized for enforcing FK constraints.
  • Cyclic foreign key constraints: The design we described above will work fine even when two tables have foreign key references to each other as long as it is not cyclic on the column level. Having a cycle of foreign key constraints where a single column is both a parent and a child in the cycle, then that would mean that no data can be inserted into that column ever! Since to insert data into that column, the data should already exist in that column! We don't plan to worry about this initially, we'll see if we need to add support for this eventually. (It can be done on MySQL. Inserting data has to be done by setting FOREIGN_KEY_CHECKS to 0).
flowchart TD
    subgraph This is acceptable
    Table1
    col1[[col1]]
    col2[[col2]]
    Table1 --- col1
    Table1 --- col2

    Table2
    col3[[col3]]
    col4[[col4]]
    Table2 --- col3
    Table2 --- col4

    col1 -.-> col3
    col4 -.-> col2
    end

    subgraph This won't work
    Table3
    col5[[col1]]
    col6[[col2]]
    Table3 --- col5
    Table3 --- col6

    Table4
    col8[[col3]]
    col7[[col4]]
    Table4 --- col8
    Table4 --- col7

    col5 -.-> col8
    col8 -.-> col5
    end
Loading

Data structure to store FK constraints in VSchema

Schema tracking will give us a list of foreign key constraints as part of the SHOW CREATE TABLE output. We want to store this
output in the VSchema struct in a form that gives us the best performance while planning.

We'll need to answer queries of the following sorts -

  1. For a given table, find all the foreign key constraints where it is a child. (Needed for planning INSERTs)
  2. For a given table, find all the foreign key constraints where it is a parent. (Needed for planning DELETE's and UPDATE`s)

The VSchema struct stores a map of KeyspaceSchema for each keyspace. Within a KeyspaceSchema we have a map of Table.
We'll store the foreign key constraints inside this Table struct.

We'll add 2 more fields to the Table struct -

type Table struct {
	...
	ParentFKs []*ForeignKeyConstraint
	ChildFKs []*ForeingKeyConstraint
}

Essentially, we'll store the list of foreign key constraints where the table is a parent and a list where it is a child.

The ForeignKeyConstraint struct would look something like this -

type ForeignKeyConstraint struct {
    ParentTable TableName
    ParentColumns []Columns
    ChildTable TableName
    ChildColumns []Columns
    OnDeleteAction int // This will be an enum
    OnUpdateAction int
}

Performance Improvements

  1. We can get MySQL to run the INSERT, UPDATE/DELETE (with Restrict) checks for us by using FOREIGN_KEY_CHECKS=1 on the connection for unsharded and single-sharded cases.
  2. For CASCADE and SET NULL we want vtgate to split them up into separate queries so that we have binlog entries for them and vreplication operations like Reshard and MoveTables work.

Phases

  1. Minimal Viable Product
    1. Support for basic INSERT, UPDATE and DELETE statements for unsharded.
    2. RESTRICT/NO ACTION, CASCADE, SET NULL mode for foreign key constraints will be supported.
    3. Support for ON DUPLICATE KEY UPDATE in INSERTs for unsharded.
    4. Support for REPLACE for unsharded.
  2. Phase two
    1. INSERT/UPDATE/DELETE ... (SELECT) (SELECT subquery in DMLs) for unsharded.
    2. Support for single-shard foreign key constraints.
  3. Phase three
    1. Cross-shard support.

Prerequisites

Tasks

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Type

No type

Projects

  • Status

    In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions