[SPARK-28612][SQL] Add DataFrameWriterV2 API #25681
Conversation
Force-pushed from f7ecd17 to 8d5b7fe.
Test build #110143 has finished for PR 25681.

Test build #110144 has finished for PR 25681.
Force-pushed from 8d5b7fe to b36c156.
Test build #110147 has finished for PR 25681.
Force-pushed from b36c156 to 9472474.
Test build #110262 has finished for PR 25681.
@brkyvz, this is passing tests. Anything else we need to change?
```scala
 * @group partition_transforms
 * @since 3.0.0
 */
def bucket(numBuckets: Column, e: Column): Column = withExpr {
```
nit: if we're only going to support Literals, I feel it may be better to leave this out initially
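One way to address this nit, sketched here as an assumption about the shape of the fix rather than the merged code, is an overload that takes the bucket count as a plain `Int`, so only literal counts can be expressed at all (`withExpr`, `Bucket`, and `Literal` are assumed from the surrounding `functions.scala` context):

```scala
// Hypothetical sketch: restricting the bucket count to a literal by taking an
// Int, instead of exposing a Column-typed numBuckets parameter.
def bucket(numBuckets: Int, e: Column): Column = withExpr {
  Bucket(Literal(numBuckets), e.expr)
}
```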
```scala
/**
 * Expression for the v2 partition transform years.
 */
case class Years(child: Expression) extends PartitionTransformExpression {
```
Do you want to implement `ExpectsInputTypes` for these classes, for automatic analysis checks?
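As a sketch of what that suggestion would look like, assuming the transform expects date or timestamp input (`PartitionTransformExpression` comes from this PR; `ExpectsInputTypes` and `TypeCollection` are existing catalyst types):

```scala
// Sketch: mixing in ExpectsInputTypes lets the analyzer reject wrongly typed
// children automatically, instead of failing later during planning.
case class Years(child: Expression)
  extends PartitionTransformExpression with ExpectsInputTypes {

  // Accept only date/timestamp columns as input (assumed input types).
  override def inputTypes: Seq[AbstractDataType] =
    Seq(TypeCollection(DateType, TimestampType))
}
```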
This LGTM like last time. Do we have JIRAs for Python API support, as well as Java API tests?
@rdblue, can you please rebase this?
Force-pushed from 9472474 to b45593d.
@brkyvz, I rebased, added Java API tests, and opened SPARK-29157 to track the PySpark API.
Test build #110933 has finished for PR 25681.

Test build #110936 has finished for PR 25681.
+1. Merging to master!

Thanks, @brkyvz!
```scala
  this
}

override def tableProperty(property: String, value: String): DataFrameWriterV2[T] = {
```
This should return CreateTableWriter. It doesn't make sense to specify table properties when inserting to an existing table.
I agree.
Opened SPARK-29249 for this. Should have a PR posted soon.
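The follow-up would be roughly along these lines (a sketch of the signature change tracked by SPARK-29249, not the actual patch; the `properties` buffer is assumed from the surrounding class):

```scala
// Sketch: narrowing the return type so tableProperty chains only on the
// create/replace path, and is unavailable when appending to an existing table.
override def tableProperty(property: String, value: String): CreateTableWriter[T] = {
  this.properties.put(property, value)
  this
}
```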
What changes were proposed in this pull request?
This adds a new write API as proposed in the SPIP to standardize logical plans. This new API:
- Adds `append`, `overwrite`, `create`, and `replace` methods that correspond to the new logical plans.
- Allows `partitionedBy` to be called only when the writer executes `create` or `replace`.

Here are a few example uses of the new API:
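The example snippets did not survive extraction; what follows is a sketch of typical calls against the API this PR adds (the catalog, table, and column names are made up):

```scala
// Create a new partitioned table from a DataFrame (partitionedBy is only
// valid together with create/replace).
df.writeTo("catalog.db.table")
  .partitionedBy($"ts")
  .create()

// Append to, or conditionally overwrite, an existing table using the new
// logical plans.
df.writeTo("catalog.db.table").append()
df.writeTo("catalog.db.table").overwrite($"ts" === "2019-09-01")

// Replace the table's contents and metadata.
df.writeTo("catalog.db.table").replace()
```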
How was this patch tested?
Added `DataFrameWriterV2Suite`, which tests the new write API. Existing tests cover the v2 plans.