Document Table Constraint Enforcement Behavior in Custom Table Providers Guide #16340

Open: wants to merge 5 commits into base: main
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -135,6 +135,7 @@ To get started, see
library-user-guide/catalogs
library-user-guide/adding-udfs
library-user-guide/custom-table-providers
library-user-guide/table-constraints
library-user-guide/extending-operators
library-user-guide/profiling
library-user-guide/query-optimizer
3 changes: 3 additions & 0 deletions docs/source/library-user-guide/custom-table-providers.md
@@ -23,6 +23,9 @@ Like other areas of DataFusion, you extend DataFusion's functionality by impleme

This section describes how to create a [`TableProvider`] and how to configure DataFusion to use it for reading.

For details on how table constraints such as primary keys or unique
constraints are handled, see [Table Constraint Enforcement](table-constraints.md).

## Table Provider and Scan

The [`TableProvider::scan`] method reads data from the table and is likely the most important. It returns an [`ExecutionPlan`] that DataFusion will use to read the actual data during execution of the query. The [`TableProvider::insert_into`] method is used to `INSERT` data into the table.
46 changes: 46 additions & 0 deletions docs/source/library-user-guide/table-constraints.md
@@ -0,0 +1,46 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Table Constraint Enforcement

Table providers can describe table constraints using the
[`TableConstraint`] and [`Constraints`] APIs. These constraints include
primary keys, unique keys, foreign keys and check constraints.

DataFusion does **not** currently enforce these constraints at runtime.
They are provided for informational purposes and can be used by custom
`TableProvider` implementations or other parts of the system.

- **Nullability**: The only property enforced by DataFusion is the
nullability of each [`Field`] in a schema. Columns marked as not
nullable must not produce null values during execution. DataFusion
does not check this when data is ingested, so a table provider has to
ensure that the data it returns matches its declared schema.
- **Primary and unique keys**: DataFusion does not verify that the data
satisfies primary or unique key constraints. Table providers that
require this behaviour must implement their own checks.
- **Foreign keys and check constraints**: These constraints are parsed
but are not validated or used during query planning.
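
Since a provider that needs primary or unique key guarantees has to check
them itself, the following is a minimal sketch of what such a check could look
like. It uses only the Rust standard library and does not call any DataFusion
API: the `Row` alias, the `ConstraintViolation` enum and the
`check_primary_key` function are hypothetical names introduced here for
illustration, and a real provider would more likely run an equivalent check
over Arrow record batches.

```rust
use std::collections::HashSet;

/// Hypothetical row type for illustration only: each cell is an optional
/// integer, where `None` stands for SQL NULL.
type Row = Vec<Option<i64>>;

/// Hypothetical error type a provider might report when a declared
/// constraint is violated.
#[derive(Debug, PartialEq)]
enum ConstraintViolation {
    /// A key column held NULL at the given row index.
    NullInKey { row: usize, column: usize },
    /// The key tuple of the given row already appeared in an earlier row.
    DuplicateKey { row: usize },
}

/// Checks that `key_columns` form a valid primary key over `rows`:
/// no NULLs in key columns and no repeated key tuples.
///
/// Panics if an index in `key_columns` is out of bounds for a row.
fn check_primary_key(
    rows: &[Row],
    key_columns: &[usize],
) -> Result<(), ConstraintViolation> {
    let mut seen: HashSet<Vec<i64>> = HashSet::with_capacity(rows.len());
    for (row_idx, row) in rows.iter().enumerate() {
        // Build the key tuple for this row, rejecting NULL key values.
        let mut key = Vec::with_capacity(key_columns.len());
        for &col in key_columns {
            match row[col] {
                Some(value) => key.push(value),
                None => {
                    return Err(ConstraintViolation::NullInKey {
                        row: row_idx,
                        column: col,
                    })
                }
            }
        }
        // `insert` returns false when the key tuple was already present.
        if !seen.insert(key) {
            return Err(ConstraintViolation::DuplicateKey { row: row_idx });
        }
    }
    Ok(())
}
```

A provider could run a check of this kind before accepting inserted data and
reject the write on `Err`. For a unique (rather than primary) key, the NULL
handling would likely differ, because SQL unique constraints commonly permit
NULL values.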

The optimizer also does not assume that these constraints hold when
rewriting queries. For example, declaring a column as a primary key will
not allow the optimizer to skip a `DISTINCT` aggregation.

[`tableconstraint`]: https://docs.rs/datafusion/latest/datafusion/sql/planner/enum.TableConstraint.html
[`constraints`]: https://docs.rs/datafusion/latest/datafusion/common/functional_dependencies/struct.Constraints.html
[`field`]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html