Skip to content

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented Sep 4, 2019

What changes were proposed in this pull request?

This adds a new write API as proposed in the SPIP to standardize logical plans. This new API:

  • Uses clear verbs to execute writes, like append, overwrite, create, and replace that correspond to the new logical plans.
  • Only creates v2 logical plans so the behavior is always consistent.
  • Does not allow table configuration options for operations that cannot change table configuration. For example, partitionedBy can only be called when the writer executes create or replace.

Here are a few example uses of the new API:

df.writeTo("catalog.db.table").append()
df.writeTo("catalog.db.table").overwrite($"date" === "2019-06-01")
df.writeTo("catalog.db.table").overwritePartitions()
df.writeTo("catalog.db.table").asParquet.create()
df.writeTo("catalog.db.table").partitionedBy(days($"ts")).createOrReplace()
df.writeTo("catalog.db.table").using("abc").replace()

How was this patch tested?

Added DataFrameWriterV2Suite that tests the new write API. Existing tests for v2 plans.

@rdblue rdblue force-pushed the SPARK-28612-add-data-frame-writer-v2 branch from f7ecd17 to 8d5b7fe Compare September 4, 2019 19:44
@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110143 has finished for PR 25681 at commit f7ecd17.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 4, 2019

Test build #110144 has finished for PR 25681 at commit 8d5b7fe.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-28612-add-data-frame-writer-v2 branch from 8d5b7fe to b36c156 Compare September 4, 2019 21:22
@SparkQA
Copy link

SparkQA commented Sep 5, 2019

Test build #110147 has finished for PR 25681 at commit b36c156.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue rdblue force-pushed the SPARK-28612-add-data-frame-writer-v2 branch from b36c156 to 9472474 Compare September 6, 2019 21:29
@SparkQA
Copy link

SparkQA commented Sep 7, 2019

Test build #110262 has finished for PR 25681 at commit 9472474.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Copy link
Contributor Author

rdblue commented Sep 11, 2019

@brkyvz. this is passing tests. Anything else we need to change?

* @group partition_transforms
* @since 3.0.0
*/
def bucket(numBuckets: Column, e: Column): Column = withExpr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if we're only going to support Literals, I feel it may be better to leave this out initially

/**
* Expression for the v2 partition transform years.
*/
case class Years(child: Expression) extends PartitionTransformExpression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to implement ExpectsInputTypes for these classes for auto analysis checks?

@brkyvz
Copy link
Contributor

brkyvz commented Sep 11, 2019

This LGTM like last time, do we have JIRAs for Python API support as well as Java API tests?

@brkyvz
Copy link
Contributor

brkyvz commented Sep 17, 2019

@rdblue Can you please rebase this?

@rdblue rdblue force-pushed the SPARK-28612-add-data-frame-writer-v2 branch from 9472474 to b45593d Compare September 18, 2019 20:49
@rdblue
Copy link
Contributor Author

rdblue commented Sep 18, 2019

@brkyvz, I rebased, added Java API tests, and opened SPARK-29157 to track the PySpark API.

@SparkQA
Copy link

SparkQA commented Sep 19, 2019

Test build #110933 has finished for PR 25681 at commit b45593d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class PartitionTransformExpression extends Expression with Unevaluable
  • case class Years(child: Expression) extends PartitionTransformExpression
  • case class Months(child: Expression) extends PartitionTransformExpression
  • case class Days(child: Expression) extends PartitionTransformExpression
  • case class Hours(child: Expression) extends PartitionTransformExpression
  • case class Bucket(numBuckets: Literal, child: Expression) extends PartitionTransformExpression
  • implicit class OptionsHelper(options: Map[String, String])
  • trait WriteConfigMethods[R]
  • trait CreateTableWriter[T] extends WriteConfigMethods[CreateTableWriter[T]]

@SparkQA
Copy link

SparkQA commented Sep 19, 2019

Test build #110936 has finished for PR 25681 at commit ab7c3e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class JavaDataFrameWriterV2Suite

@brkyvz
Copy link
Contributor

brkyvz commented Sep 19, 2019

+1. Merging to master!

@asfgit asfgit closed this in 2c775f4 Sep 19, 2019
@rdblue
Copy link
Contributor Author

rdblue commented Sep 19, 2019

Thanks, @brkyvz!

this
}

override def tableProperty(property: String, value: String): DataFrameWriterV2[T] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return CreateTableWriter. It doesn't make sense to specify table properties when inserting to an existing table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened SPARK-29249 for this. Should have a PR posted soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants