
[SPARK-14287] isStreaming method for Dataset #12080


Closed
wants to merge 4 commits from the is-streaming branch

Conversation

brkyvz
Contributor

@brkyvz brkyvz commented Mar 31, 2016

With the addition of StreamExecution (ContinuousQuery) to Datasets, data can become unbounded. With unbounded data, executing some methods and operations no longer makes sense, e.g. Dataset.count().

A simple API is needed to check whether the data in a Dataset is bounded or unbounded, i.e. whether the Dataset is in streaming mode. ML algorithms, for example, could check whether the data is unbounded and throw an exception.
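A consumer-side guard like the one described above might look like the following sketch. This is not Spark code: `Data` and `fit` are hypothetical stand-ins for a Dataset and an ML training method, illustrating only the check-and-throw pattern.

```scala
// Hypothetical stand-in for a Dataset carrying the proposed flag.
case class Data(isStreaming: Boolean)

// Sketch of how an ML algorithm could reject unbounded input.
def fit(data: Data): String = {
  if (data.isStreaming)
    throw new IllegalArgumentException(
      "This algorithm requires bounded data; got a streaming Dataset")
  "model" // placeholder for actual training
}
```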

The implementation of this method is simple; naming it is the challenge. Some candidate names are:

  • isStreaming
  • isContinuous
  • isBounded
  • isUnbounded

I've gone with isStreaming for now. We can change it before Spark 2.0 if we come up with a better name; for that reason I've marked it as @Experimental.
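The "simple implementation" amounts to walking the logical plan and checking whether any leaf is a streaming source. The following is a self-contained sketch of that technique, not actual Spark code: `LogicalNode`, `StreamingSource`, `BatchSource`, and `Project` are hypothetical stand-ins for Spark's `LogicalPlan` and `StreamingRelation` classes.

```scala
// Minimal model of a logical plan tree.
sealed trait LogicalNode {
  def children: Seq[LogicalNode]
  // Recursively test whether any node in this subtree satisfies p.
  def exists(p: LogicalNode => Boolean): Boolean =
    p(this) || children.exists(_.exists(p))
}
case class StreamingSource(name: String) extends LogicalNode {
  def children: Seq[LogicalNode] = Nil
}
case class BatchSource(name: String) extends LogicalNode {
  def children: Seq[LogicalNode] = Nil
}
case class Project(child: LogicalNode) extends LogicalNode {
  def children: Seq[LogicalNode] = Seq(child)
}

// The proposed check: true iff the plan contains a streaming source.
def isStreaming(plan: LogicalNode): Boolean =
  plan.exists(_.isInstanceOf[StreamingSource])
```

Because the check is structural (a predicate over the plan tree), it stays correct as operators are layered on top of the source.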

@SparkQA

SparkQA commented Mar 31, 2016

Test build #54585 has finished for PR 12080 at commit 7459a3c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 31, 2016

Test build #54589 has finished for PR 12080 at commit 7dd88a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@DeepSparkBot

LGTM

@DeepSparkBot

@marmbrus @tdas Please advise.

@@ -449,6 +450,17 @@ class Dataset[T] private[sql](
def isLocal: Boolean = logicalPlan.isInstanceOf[LocalRelation]

/**
* Returns true if the underlying query will be executed continuously as new data comes in.
* Methods that return bounded values, e.g. [[count()]], [[collect()]] will throw
* an exception if a Dataset is streaming.
marmbrus (Contributor) commented on this diff:


How about?

Returns true if this [[Dataset]] contains one or more sources that continuously return data as it arrives.
A [[Dataset]] that reads data from a streaming source must be executed as a [[ContinuousQuery]] using
the `startStream()` method in [[DataFrameWriter]]. Methods that return a single answer (e.g., `count()` or
`collect()`) will throw an [[AnalysisException]] when there is a streaming source present.

@marmbrus
Contributor

marmbrus commented Apr 4, 2016

Implementation LGTM.

@brkyvz
Contributor Author

brkyvz commented Apr 4, 2016

@marmbrus Addressed your comment

@SparkQA

SparkQA commented Apr 4, 2016

Test build #54875 has finished for PR 12080 at commit f4302bc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 4, 2016

Test build #54896 has finished for PR 12080 at commit f4debd0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

marmbrus commented Apr 4, 2016

test this please

@marmbrus
Contributor

marmbrus commented Apr 4, 2016

@tdas here is another failure

@SparkQA

SparkQA commented Apr 5, 2016

Test build #54907 has finished for PR 12080 at commit f4debd0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor

marmbrus commented Apr 5, 2016

Thanks, merging to master!

@asfgit asfgit closed this in ba24d1e Apr 5, 2016
@brkyvz brkyvz deleted the is-streaming branch February 3, 2019 20:59
4 participants