Skip to content

Add support for default column values in engine #25679

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Apr 27, 2025

Description

This PR adds support for the DEFAULT clause with literal values when creating a new table or adding a column.
The SHOW CREATE TABLE statement now includes the specified default values.
These defaults are respected in INSERT and MERGE operations.
Non-literal expressions such as <datetime value function>, USER, and similar are not supported in this PR.

<default clause> ::=
  DEFAULT <default option> 

<default option> ::=
    <literal> 
  | <datetime value function> 
  | USER
  | CURRENT_USER
  | CURRENT_ROLE
  | SESSION_USER
  | SYSTEM_USER
  | CURRENT_CATALOG
  | CURRENT_SCHEMA
  | CURRENT_PATH
  | <implicitly typed value specification> 

Release notes

## General, Memory
* Add support for default column values. ({issue}`25679`)

@cla-bot cla-bot bot added the cla-signed label Apr 27, 2025
@github-actions github-actions bot added iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector bigquery BigQuery connector mongodb MongoDB connector cassandra Cassandra connector ignite Ignite connector memory Memory connector labels Apr 27, 2025
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from 796b75e to 9b59d7c Compare May 5, 2025 12:16
@ebyhr ebyhr marked this pull request as ready for review May 5, 2025 12:31
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from 9b59d7c to 30ab353 Compare May 5, 2025 12:51
@github-actions github-actions bot added the docs label May 5, 2025
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 3 times, most recently from 3eae180 to 2e3b139 Compare May 5, 2025 23:31
@ebyhr ebyhr requested a review from martint May 5, 2025 23:34
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 2 times, most recently from b905a01 to d75264c Compare May 6, 2025 01:10
@ebyhr ebyhr requested review from Praveen2112 and kasiafi May 12, 2025 03:10
@@ -535,7 +535,7 @@ private RelationPlan getInsertPlan(
if (supportsMissingColumnsOnInsert) {
continue;
}
expression = new Constant(column.getType(), null);
expression = new Constant(column.getType(), column.getDefaultValue().orElse(null));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the connector supportsMissingColumnsOnInsert, then it is supposed to apply the default value on its own, right?
Do we have tests to cover both cases (connector applies the default value versus the engine applies it here)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We don't have such a test. I'm wondering if we can write a helpful test covering both cases in the engine.

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from d75264c to 995de6e Compare May 12, 2025 23:36
@ebyhr
Copy link
Member Author

ebyhr commented May 12, 2025

Addressed comments.

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 3 times, most recently from cecbfec to 62525ca Compare May 14, 2025 04:50
@kasiafi
Copy link
Member

kasiafi commented May 26, 2025

Do we plan to support other variants of the default value in the future? For values like CURRENT_USER or <datetime value function>, we cannot store them statically. I think it's fine to special-case for constants for now, but if we want to support the other options, we will need another api to store expressions.

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 2 times, most recently from 4a4d211 to 7550565 Compare May 27, 2025 23:03
@ebyhr
Copy link
Member Author

ebyhr commented May 28, 2025

I'm a little wondering whether fallback to existing implementation is really good idea. The behavior between default-null and missing-columns might be different in remote connectors. For instance, ClickHouse has default values in each type (e.g. 1 in integer). DEFAULT null might be used to enforce null in such connectors.

I changed NullableValue to ConnectorExpression so we can support non-constants in the future.

Copy link
Member

@kasiafi kasiafi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment about type coercion.

@martint could you please check if it makes sense to you? Other than that, I think it's ready for you to review.

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from 7550565 to 9507eb0 Compare May 29, 2025 05:02
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from 9507eb0 to b077120 Compare June 6, 2025 22:00
@ebyhr
Copy link
Member Author

ebyhr commented Jun 6, 2025

Rebased on master (no change).

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 3 times, most recently from 9983fbd to fad0d2d Compare June 8, 2025 23:28
@ebyhr ebyhr requested a review from martint June 9, 2025 23:30
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from fad0d2d to 4d85f1f Compare June 11, 2025 00:13
@@ -659,6 +701,84 @@ private Query showCreateTable(ShowCreate node)
return singleValueQuery("Create Table", formatSql(createTable).trim());
}

private static Literal toLiteral(Type type, ConnectorExpression expression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leads me to think that the SPI should take and return a SQL expression (in text form, like we do with row filters and column masks) to represent the default values. Let me ponder this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about translating to IR and building literals with Type.getObjectValue? It will simplify this method:

io.trino.sql.ir.Constant constant = (io.trino.sql.ir.Constant) ConnectorExpressionTranslator.translate(session, expression, plannerContext, Map.of());
Object value = constant.value();
if (value == null) {
    return new NullLiteral();
}
...

case TimestampWithTimeZoneType _ -> new GenericLiteral("TIMESTAMP", type.getObjectValue(null, constant.getValueAsBlock(), 0).toString());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed another commit to show the above idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the issue with this approach is that it requires explicit hardcoded handling of all the types. It doesn't work with extended types (e.g., Geometry / Geography, and others)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this more, I've come to the conclusion that the default values should be described as SQL expressions, similar to what we do for column masks, row filters and views. Some reasons for this:

  • ConnectorExpression is an IR concept, and its type system won't necessarily match the SQL type system over time (especially as we keep evolving and unifying the IR). This will make it harder to describe and recover the SQL-level syntax for operationrs such as DESCRIBE.
  • Some of the constructs such as CURRENT_USER, CURRENT_TIMESTAMP, etc. have no correspondence in the IR -- they are desugared during initial planning. It would make it harder to represent them without introducing artificial functions that the connectors need to understand.

Copy link
Member Author

@ebyhr ebyhr Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faced some problems during string migration:

  • CREAT TABLE and ADD COLUMN: How can we convert evaluated objects to string when passing to SPI? One approach seems casting it as varchar/char type and using the string value. I can't find a better way.
  • SHOW CREATE TABLE: We need to convert string to Expression for ColumnDefinition in ShowQueriesRewrite class. Formatting the expression without hard-coded type handling looks still difficult. For instance, I suppose we want to only show true or false without BOOLEAN '...' on boolean type. Converting to string in AstBuilder resolves this, but it looks incorrect approach to me.

Pushed new commits with 2 TODO comments in ColumnDefaultOptions and AstBuilder to showcase the above problems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we convert evaluated objects to string when passing to SPI

There shouldn't be a need to convert evaluated objects. The expressions should be analyzed during analysis/planning of CREATE TABLE and ADD COLUMN, but then the original string should be passed to the connector for storage.

SHOW CREATE TABLE: We need to convert string to Expression for ColumnDefinition

It should do the same thing SHOW CREATE VIEW does -- parse the expression to create the corresponding AST node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your explanation. Addressed comments.

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 3 times, most recently from 00de80b to d086c12 Compare June 13, 2025 23:53
@ebyhr
Copy link
Member Author

ebyhr commented Jun 13, 2025

Rebased on master to resolve conflicts.

@@ -1874,6 +1874,7 @@ public static class MergeAnalysis
private final List<List<ColumnHandle>> mergeCaseColumnHandles;
// Case number map to columns
private final Multimap<Integer, ColumnHandle> updateCaseColumnHandles;
private final Map<ColumnHandle, io.trino.sql.ir.Expression> defaultColumnValues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question not relates to this : I see that the value is assigned to ir.Expression here. Could you clarify the boundary between when we should use ir.Expression vs. Expression? I’m trying to understand the intended separation or convention

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And does it now actually already to be the Constant ?

@@ -132,6 +137,9 @@ public ListenableFuture<Void> execute(
}
return immediateVoidFuture();
}
if (element.getDefaultValue().isPresent() && !plannerContext.getMetadata().getConnectorCapabilities(session, catalogHandle).contains(DEFAULT_COLUMN_VALUE)) {
throw semanticException(NOT_SUPPORTED, element, "Catalog '%s' does not support default value for column name '%s'", catalogHandle, columnName);
}
if (!element.isNullable() && !plannerContext.getMetadata().getConnectorCapabilities(session, catalogHandle).contains(NOT_NULL_COLUMN_CONSTRAINT)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the !element.isNullable(), what if added defaultValue always is NULL?

@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch 2 times, most recently from 453db51 to 5f36963 Compare June 20, 2025 04:38
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from 5f36963 to 1c31f22 Compare June 23, 2025 03:28
@ebyhr ebyhr force-pushed the ebi/engine-default-column-values branch from 1c31f22 to a7ae149 Compare June 23, 2025 03:41
@ebyhr ebyhr requested a review from martint June 23, 2025 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery BigQuery connector cassandra Cassandra connector cla-signed delta-lake Delta Lake connector docs hive Hive connector iceberg Iceberg connector ignite Ignite connector memory Memory connector mongodb MongoDB connector syntax-needs-review
Development

Successfully merging this pull request may close these issues.

6 participants