[SPARK-19140][SS]Allow update mode for non-aggregation streaming queries #16520

Closed
wants to merge 4 commits into from

zsxwing (Member) commented Jan 9, 2017

What changes were proposed in this pull request?

This PR allows update mode for non-aggregation streaming queries. Update mode behaves the same as append mode if a query has no aggregations.
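
To illustrate why the two modes coincide, here is a toy model in plain Python. It is only a sketch of the semantics, not Spark's implementation: `map_only_query`, `emit_append`, `emit_update`, and `emit_update_counts` are hypothetical names invented for this example, and micro-batches are modeled as plain lists. The assumption is that, without aggregation, a result row never changes once produced, so "rows that are new" (append) and "rows that are new or changed" (update) are the same set for every batch.

```python
from collections import Counter


def map_only_query(batch):
    """A non-aggregating query (hypothetical): keep even numbers and double them."""
    return [x * 2 for x in batch if x % 2 == 0]


def emit_append(batches, query):
    """Append mode (toy model): emit only the rows newly added to the result."""
    return [query(batch) for batch in batches]


def emit_update(batches, query):
    """Update mode (toy model): emit rows that are new or changed since the last trigger.

    Without aggregation, existing result rows never change, so only the rows
    derived from the current batch qualify.
    """
    emitted = []
    result = []
    seen = 0  # number of result rows that existed before the current batch
    for batch in batches:
        result.extend(query(batch))
        emitted.append(result[seen:])  # rows added since the last trigger
        seen = len(result)
    return emitted


def emit_update_counts(batches):
    """Update mode with an aggregation (running count per key), for contrast."""
    counts = Counter()
    emitted = []
    for batch in batches:
        counts.update(batch)
        # Only keys touched by this batch have a new or changed count.
        emitted.append({key: counts[key] for key in sorted(set(batch))})
    return emitted


batches = [[1, 2, 3, 4], [5, 6], [7]]
append_out = emit_append(batches, map_only_query)
update_out = emit_update(batches, map_only_query)
print(append_out == update_out)  # the two modes emit identical rows per batch
print(emit_update_counts([["a", "b"], ["a"]]))  # with aggregation, old keys are re-emitted
```

With an aggregation, update mode re-emits a key whenever its count changes, which append mode cannot express; without one, nothing is ever re-emitted. In Spark itself the mode is selected through the `outputMode` method of the stream writer, as shown in the diffs in this thread.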

How was this patch tested?

Jenkins

SparkQA commented Jan 10, 2017

Test build #71096 has finished for PR 16520 at commit 7d62de4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

brkyvz (Contributor) commented Jan 10, 2017

LGTM! Ultra nit: "it will be equivalent to Append mode" for some reason sounds better to me when reading it than "it will be same as the Append mode".

@@ -58,7 +62,9 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
* the sink
* - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink
* every time these is some updates
*
* - `update`: only the rows that were updated in the streaming DataFrame/Dataset will
Contributor

Could you also please update pyspark docs?

brkyvz (Contributor) commented Jan 10, 2017

Also could you please update pyspark docs?

SparkQA commented Jan 10, 2017

Test build #71157 has finished for PR 16520 at commit 9f2d877.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -665,6 +665,9 @@ def outputMode(self, outputMode):
the sink
* `complete`:All the rows in the streaming DataFrame/Dataset will be written to the sink
every time these is some updates
* `update`:only the rows that were updated in the streaming DataFrame/Dataset will be
written to the sink every time there are some updates. If the query doesn't contain
aggregations, it will be equivalent to the `append` mode.
Contributor

nit: please remove `the` before `append`.

every time these is some updates
* `update`:only the rows that were updated in the streaming DataFrame/Dataset will be
written to the sink every time there are some updates. If the query doesn't contain
aggregations, it will be equivalent to the `append` mode.
Contributor

ditto

@@ -57,7 +57,8 @@ public static OutputMode Complete() {

/**
* OutputMode in which only the rows that were updated in the streaming DataFrame/Dataset will
- * be written to the sink every time there are some updates.
+ * be written to the sink every time there are some updates. If the query doesn't contain
+ * aggregations, it will be equivalent to the `Append` mode.
Contributor

ditto

- * written to the sink every time these is some updates. This output mode can only be used in
- * queries that contain aggregations.
+ * written to the sink every time these is some updates. If the query doesn't contain
+ * aggregations, it will be equivalent to the `Append` mode.
Contributor

ditto

brkyvz (Contributor) commented Jan 10, 2017

Left a nit (occurring multiple times), otherwise LGTM!

brkyvz (Contributor) commented Jan 10, 2017

thanks LGTM!

SparkQA commented Jan 10, 2017

Test build #71159 has finished for PR 16520 at commit e4f2403.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 11, 2017

Test build #71168 has finished for PR 16520 at commit 889315d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zsxwing (Member, Author) commented Jan 11, 2017

Thanks. Merging to master and 2.1.

asfgit pushed a commit that referenced this pull request Jan 11, 2017
…ries

## What changes were proposed in this pull request?

This PR allows update mode for non-aggregation streaming queries. Update mode behaves the same as append mode if a query has no aggregations.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #16520 from zsxwing/update-without-agg.

(cherry picked from commit bc6c56e)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
asfgit closed this in bc6c56e Jan 11, 2017
zsxwing deleted the update-without-agg branch January 11, 2017 06:12
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
3 participants