[Spark] Identity Columns APIs in DeltaColumnBuilder #2857
Conversation
I split the table feature part into a separate PR, which will need to be merged first: #2859
LGTM
spark/src/main/scala/org/apache/spark/sql/delta/TableFeature.scala
 *
 * @param start the start value of the identity column
 * @param step the increment step of the identity column
 * @since 3.2.0
Wow, glad to see this PR so quickly. Are you planning to release it in 3.2.0?
Thanks.
It's a stretch goal at the moment, since the release for Delta 3.2 is quite soon. I should know by end of week whether I can realistically make 3.2. I plan to leave this PR up for at least one week anyways. :)
It's safe to start merging the code even if we can only make 4.0 since the DeltaSQLConf guards creating identity columns.
For context, this PR is on the critical path for the identity columns implementation in Delta 3.2. We need an interface to specify identity columns. The SQL syntax can only be introduced in the next Spark release (Spark 4), which means we can have it in Delta 4.0 at the earliest.
Got it. Just for my knowledge: are the required changes in Spark 4 only for the SQL syntax, meaning the Scala API could be supported earlier?
That's part of what I'm assessing this week in my prototype. Keeping the UDF that generates identity column values only in the Delta repo requires workarounds so that it works on Spark 3.5. Once I have enough confidence in the idea, I'll post a PR.
Everything else doesn't seem to require Spark changes.
I'm not going to make Delta 3.2. I've discussed with @tdas offline and we're aiming for the next Delta version, Delta 3.3. The API doc comment in this PR has been adjusted accordingly.
spark/src/main/scala/org/apache/spark/sql/delta/DeltaErrors.scala
}

def identityColumnDataTypeNotSupported(): Throwable = {
  new AnalysisException("Identity column does not support this data type.")
}
It would be nice to tell the user which data type was unsupported and which types are supported :)
I checked, and it seems the other existing error messages don't provide the list of supported types. It would be more consistent not to report the supported types here either.
That's OK.
spark/src/main/scala/org/apache/spark/sql/delta/sources/DeltaSQLConf.scala
spark/src/test/scala/org/apache/spark/sql/delta/DDLTestUtils.scala
Force-pushed from e1fafec to 8ab57a0
LGTM.
Thank you!
Which Delta project/connector is this regarding?
Description
This PR is part of #1959. It adds generatedAlwaysAsIdentity and generatedByDefaultAsIdentity APIs to DeltaColumnBuilder so that users can create Delta tables with identity columns.

How was this patch tested?
New tests.
Does this PR introduce any user-facing changes?
Yes, we introduce generatedAlwaysAsIdentity and generatedByDefaultAsIdentity interfaces in DeltaColumnBuilder for creating identity columns.

Interfaces
When the start and step parameters are not specified, they default to 1L. generatedByDefaultAsIdentity allows users to insert values into the column, while a column specified with generatedAlwaysAsIdentity can only ever have system-generated values.

Example Usage
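A sketch of how the new builder methods might be used, based on the interface description above. This assumes a running SparkSession `spark` and the Delta Lake Scala API (io.delta.tables.DeltaTable); the exact parameter names for start and step are taken from the doc comment in this PR and may differ in the final merged API.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.types.LongType

// Sketch only: assumes a SparkSession `spark` and the APIs added in this PR.
DeltaTable.create(spark)
  .tableName("events")
  // System always generates the value; user-supplied inserts into this
  // column are rejected.
  .addColumn(
    DeltaTable.columnBuilder(spark, "id")
      .dataType(LongType)
      .generatedAlwaysAsIdentity(start = 100L, step = 10L)
      .build())
  // System generates a value only when the user does not supply one;
  // start and step default to 1L when omitted.
  .addColumn(
    DeltaTable.columnBuilder(spark, "key")
      .dataType(LongType)
      .generatedByDefaultAsIdentity()
      .build())
  .addColumn("data", "STRING")
  .execute()
```

Note that identity columns require LongType; the DeltaSQLConf flag mentioned earlier in this thread guards whether such tables can be created at all.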