[SPARK-14287] isStreaming method for Dataset #12080

brkyvz · 2016-03-31T04:09:47Z

With the addition of StreamExecution (ContinuousQuery) to Datasets, the data in a Dataset can be unbounded. On unbounded data, some methods and operations do not make sense, e.g. Dataset.count().

A simple API is required to check whether the data in a Dataset is bounded or unbounded. This will allow users to check whether their Dataset is in streaming mode or not. For example, ML algorithms can check whether the data is unbounded and throw an exception.
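A minimal sketch of the kind of guard described here, assuming Spark 2.x or later on the classpath. `BoundedOnly`, `requireBounded`, and `safeCount` are hypothetical names used for illustration and are not part of Spark; only `Dataset.isStreaming` and `Dataset.count()` are real API calls.

```scala
import org.apache.spark.sql.Dataset

object BoundedOnly {
  // Hypothetical helper (not a Spark API): fail fast when the input is unbounded.
  def requireBounded[T](ds: Dataset[T], caller: String): Unit =
    if (ds.isStreaming) {
      throw new IllegalArgumentException(
        s"$caller needs bounded input, but the Dataset has a streaming source")
    }

  // Example caller: count() only makes sense on bounded data,
  // so check before running it.
  def safeCount[T](ds: Dataset[T]): Long = {
    requireBounded(ds, "safeCount")
    ds.count()
  }
}
```

An ML algorithm could call `requireBounded` at the top of its training method in the same way, so the failure carries a message about the algorithm rather than surfacing later from an internal action.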

The implementation of this method is simple; naming it is the challenge. Some possible names for this method are:

isStreaming
isContinuous
isBounded
isUnbounded

I've gone with isStreaming for now. We can change it before Spark 2.0 if we decide on a different name. For that reason I've marked it as @Experimental.

SparkQA · 2016-03-31T04:13:42Z

Test build #54585 has finished for PR 12080 at commit 7459a3c.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-03-31T05:42:07Z

Test build #54589 has finished for PR 12080 at commit 7dd88a3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

DeepSparkBot · 2016-04-01T00:07:20Z

LGTM

DeepSparkBot · 2016-04-01T17:51:04Z

@marmbrus @tdas Please advise.

marmbrus · 2016-04-04T18:32:46Z

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

@@ -449,6 +450,17 @@ class Dataset[T] private[sql](
  def isLocal: Boolean = logicalPlan.isInstanceOf[LocalRelation]

  /**
+   * Returns true if the underlying query will be executed continuously as new data comes in.
+   * Methods that return bounded values, e.g. [[count()]], [[collect()]] will throw
+   * an exception if a Dataset is streaming.


How about?

Returns true if this [[Dataset]] contains one or more sources that continuously return data as it arrives. A [[Dataset]] that reads data from a streaming source must be executed as a [[ContinuousQuery]] using the `startStream()` method in [[DataFrameWriter]]. Methods that return a single answer (e.g., `count()` or `collect()`) will throw an [[AnalysisException]] when there is a streaming source present.
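To illustrate the "one or more sources" wording, here is a toy model of the check, not the code in this patch. All types below (`Plan`, `BatchSource`, `StreamSource`, `Filter`, `Join`, `PlanCheck`) are hypothetical stand-ins and are not Spark classes; the sketch only assumes that a query is a tree whose leaves are sources.

```scala
// Toy stand-ins for a logical plan tree; none of these are Spark classes.
sealed trait Plan { def children: Seq[Plan] }
case class BatchSource(name: String) extends Plan { val children: Seq[Plan] = Nil }
case class StreamSource(name: String) extends Plan { val children: Seq[Plan] = Nil }
case class Filter(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
case class Join(left: Plan, right: Plan) extends Plan { val children: Seq[Plan] = Seq(left, right) }

object PlanCheck {
  // True if any node in the tree is a streaming source, however deeply nested.
  def isStreaming(plan: Plan): Boolean = plan match {
    case _: StreamSource => true
    case other           => other.children.exists(isStreaming)
  }
}
```

Under this model a join of a batch source with a filtered streaming source counts as streaming, because a single streaming leaf is enough to make the whole query unbounded.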

marmbrus · 2016-04-04T18:33:05Z

Implementation LGTM.

brkyvz · 2016-04-04T18:53:40Z

@marmbrus Addressed your comment

SparkQA · 2016-04-04T18:59:00Z

Test build #54875 has finished for PR 12080 at commit f4302bc.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-04T23:05:22Z

Test build #54896 has finished for PR 12080 at commit f4debd0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2016-04-04T23:10:51Z

test this please

marmbrus · 2016-04-04T23:11:03Z

@tdas here is another failure

SparkQA · 2016-04-05T00:42:47Z

Test build #54907 has finished for PR 12080 at commit f4debd0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2016-04-05T02:03:35Z

Thanks, merging to master!

added isStreaming method to Dataset

7459a3c

fix ss

7dd88a3

marmbrus reviewed Apr 4, 2016

address comments

f4302bc

remove ws

f4debd0

asfgit closed this in ba24d1e Apr 5, 2016

brkyvz deleted the is-streaming branch February 3, 2019 20:59
