
Commit 156bcd7

Aaron Kimball authored and tdas committed
SPARK-1173. Improve scala streaming docs.
Clarify imports to add implicit conversions to DStream and fix other small typos in the streaming intro documentation.

Tested by inspecting output via a local jekyll server, c&p'ing the scala commands into a spark terminal.

Author: Aaron Kimball <aaron@magnify.io>

Closes #64 from kimballa/spark-1173-streaming-docs and squashes the following commits:

6fbff0e [Aaron Kimball] SPARK-1173. Improve scala streaming docs.
1 parent 8849a96 commit 156bcd7

File tree

1 file changed: +33 −5 lines changed


docs/streaming-programming-guide.md

Lines changed: 33 additions & 5 deletions
@@ -58,11 +58,21 @@ do is as follows.
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1" >
+First, we import the names of the Spark Streaming classes, and some implicit
+conversions from StreamingContext into our environment, to add useful methods to
+other classes we need (like DStream).
 
-First, we create a
-[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object,
-which is the main entry point for all streaming
-functionality. Besides Spark's configuration, we specify that any DStream will be processed
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) is the
+main entry point for all streaming functionality.
+
+{% highlight scala %}
+import org.apache.spark.streaming._
+import org.apache.spark.streaming.StreamingContext._
+{% endhighlight %}
+
+Then we create a
+[StreamingContext](api/streaming/index.html#org.apache.spark.streaming.StreamingContext) object.
+Besides Spark's configuration, we specify that any DStream will be processed
 in 1 second batches.
 
 {% highlight scala %}
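Taken together, the two snippets added in this hunk set up the streaming environment end to end. A minimal sketch of how they combine, assuming the Spark 0.9-era StreamingContext constructor that takes a master URL, an application name, and a batch duration ("local[2]" and "NetworkWordCount" are illustrative placeholders, not values from the commit):

{% highlight scala %}
// Import the streaming classes plus the implicit conversions that add
// pair-DStream methods such as reduceByKey to DStream[(K, V)].
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

// The main entry point for all streaming functionality, configured to
// process incoming data in 1-second batches.
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
{% endhighlight %}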
@@ -98,7 +108,7 @@ val pairs = words.map(word => (word, 1))
 val wordCounts = pairs.reduceByKey(_ + _)
 
 // Print a few of the counts to the console
-wordCount.print()
+wordCounts.print()
 {% endhighlight %}
 
 The `words` DStream is further mapped (one-to-one transformation) to a DStream of `(word,
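For context on the one-character fix above: the corrected call is the final step of the guide's word-count pipeline, whose DStream is named `wordCounts`. A sketch of the surrounding code, using the `ssc` context created earlier and assuming the netcat text source on localhost port 9999 from the guide's quick example:

{% highlight scala %}
// Create a DStream of text lines received over a TCP socket.
val lines = ssc.socketTextStream("localhost", 9999)

// Split lines into words, pair each word with a count of one, and
// sum the counts for each word within every 1-second batch.
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)

// Print a few of the counts to the console (the corrected line).
wordCounts.print()
{% endhighlight %}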
@@ -262,6 +272,24 @@ Time: 1357008430000 ms
 </td>
 </table>
 
+If you plan to run the Scala code for Spark Streaming-based use cases in the Spark
+shell, you should start the shell with the SparkConfiguration pre-configured to
+discard old batches periodically:
+
+{% highlight bash %}
+$ SPARK_JAVA_OPTS=-Dspark.cleaner.ttl=10000 bin/spark-shell
+{% endhighlight %}
+
+... and create your StreamingContext by wrapping the existing interactive shell
+SparkContext object, `sc`:
+
+{% highlight scala %}
+val ssc = new StreamingContext(sc, Seconds(1))
+{% endhighlight %}
+
+When working with the shell, you may also need to send a `^D` to your netcat session
+to force the pipeline to print the word counts to the console at the sink.
+
 ***************************************************************************************************
 
 # Basics
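As a usage note on the new shell instructions: once spark-shell has been started with the cleaner TTL shown in the hunk, the wrapped context can drive the guide's word-count example interactively. A minimal sketch of what to paste at the scala> prompt, assuming a netcat server is already listening on port 9999 (`nc -lk 9999`, as in the guide's quick example):

{% highlight scala %}
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

// Wrap the interactive shell's existing SparkContext, sc.
val ssc = new StreamingContext(sc, Seconds(1))

// The guide's pipeline against the netcat source; after ssc.start(),
// type into the netcat session and send ^D to force the counts to print.
val lines = ssc.socketTextStream("localhost", 9999)
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
{% endhighlight %}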
