
[SPARK-55716][SQL] Fix V1 file source NOT NULL constraint enforcement #54517

Open

yaooqinn wants to merge 1 commit into apache:master from yaooqinn:SPARK-55716

Conversation


@yaooqinn (Member) commented Feb 26, 2026

What changes were proposed in this pull request?

V1 file-based DataSource writes (parquet/orc/json) silently accept null values into NOT NULL columns. This PR fixes the issue by:

  1. CreateDataSourceTableCommand: Preserve user-specified nullability by recursively merging the nullability flags from the user-provided schema into the resolved dataSource.schema (which carries char/varchar normalization, metadata, etc.). Previously the command stored dataSource.schema directly, which is all-nullable because DataSource.resolveRelation() calls dataSchema.asNullable.

  2. PreprocessTableInsertion: Restore nullability flags from the catalog schema before null checks. This ensures AssertNotNull is injected when needed. Gated behind a legacy config flag.

  3. Legacy config: spark.sql.legacy.allowNullInsertForFileSourceTables (default false) for backward compatibility.
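The recursive nullability merge in step 1 can be sketched as follows. This is an illustrative model, not Spark's actual StructType/StructField classes: the resolved schema's fields are kept as-is (metadata, normalized types), and only the user's nullable=false flags are restored, recursing into nested structs.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Struct:
    fields: List["Field"]

@dataclass
class Field:
    name: str
    dtype: Union[str, Struct]  # leaf type name, or a nested Struct
    nullable: bool

def merge_nullability(resolved: Struct, user: Struct) -> Struct:
    """Keep the resolved schema but restore the user's nullable=False flags."""
    user_by_name = {f.name: f for f in user.fields}
    merged = []
    for rf in resolved.fields:
        uf = user_by_name.get(rf.name)
        if uf is None:
            merged.append(rf)
            continue
        dtype = rf.dtype
        # Recurse into nested structs so inner NOT NULL fields survive too.
        if isinstance(dtype, Struct) and isinstance(uf.dtype, Struct):
            dtype = merge_nullability(dtype, uf.dtype)
        # A field is non-nullable if either side declares it non-nullable.
        merged.append(Field(rf.name, dtype, rf.nullable and uf.nullable))
    return Struct(merged)
```

In the real fix the same idea applies to array elements and map values as well; the sketch shows only struct fields for brevity.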

Why are the changes needed?

The root cause has two parts:

  • DataSource.resolveRelation() calls dataSchema.asNullable (added in SPARK-13738 for read safety), stripping all NOT NULL constraints recursively.
  • CreateDataSourceTableCommand stores this all-nullable schema in the catalog, permanently losing NOT NULL information.
  • As a result, PreprocessTableInsertion never injects AssertNotNull for V1 file source tables.

Note: InsertableRelation (e.g., SimpleInsertSource) does NOT have this problem because it preserves the original schema (SPARK-24583).

Does this PR introduce any user-facing change?

Yes. V1 file source tables (parquet/orc/json) will now enforce NOT NULL constraints during INSERT operations, matching the behavior of V2 tables. A legacy config is provided for backward compatibility.
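As an illustration of the behavior change, a minimal SQL repro (table and column names are hypothetical; the exact error message may differ):

```sql
CREATE TABLE t (id INT NOT NULL) USING parquet;

-- Before this fix: succeeds silently, writing NULL into a NOT NULL column.
-- After this fix: the insert fails the null check, matching V2 table behavior.
INSERT INTO t VALUES (NULL);

-- The legacy (non-enforcing) behavior can be restored via:
SET spark.sql.legacy.allowNullInsertForFileSourceTables=true;
```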

How was this patch tested?

Added 7 new tests in InsertSuite, covering top-level and nested NOT NULL columns (array elements, struct fields, map values).

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.

V1 file-based DataSource writes (parquet/orc/json) silently accept null values into NOT NULL columns. The root cause:

1. `DataSource.resolveRelation()` calls `dataSchema.asNullable` (SPARK-13738) for read safety, stripping NOT NULL recursively.
2. `CreateDataSourceTableCommand` stores this all-nullable schema in the catalog, permanently losing NOT NULL info.
3. `PreprocessTableInsertion` never injects `AssertNotNull` because the schema is all-nullable.

Fix:
- `CreateDataSourceTableCommand`: preserve user-specified nullability via recursive merging into the resolved schema.
- `PreprocessTableInsertion`: restore nullability flags from catalog schema before null checks.
- Add legacy config `spark.sql.legacy.allowNullInsertForFileSourceTables` (default false) for backward compatibility.

Covers top-level and nested types (array elements, struct fields, map values).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@yaooqinn

cc @dongjoon-hyun @cloud-fan @gengliangwang. Is the fix direction correct? Is this a genuine bug or a design choice? I haven't found any public discussion in this area.


dongjoon-hyun commented Feb 26, 2026

Hi, @yaooqinn

  • Apache Spark doesn't claim to support SQL CONSTRAINT for V1 yet, does it? IIUC, null handling should be done explicitly by the user in the SELECT clause of the INSERT statement.
  • SPARK-51207 (SPIP: Constraints in DSv2) is a fairly new feature, introduced only in Apache Spark 4.1.0.

To me, this PR seems to introduce a new feature rather than fix a bug.

cc @aokolnychyi , @peter-toth , too.
