This repository was archived by the owner on May 9, 2024. It is now read-only.

[WIP] Adds back-pressure based congestion handling and a Reactive Streams endpoint to Spark Streaming #13

Conversation

huitseeker

Follows and supersedes #11, #9.

Review on Reviewable

import java.lang.IllegalStateException

@DeveloperApi
abstract class ReactiveReceiver[T](storageLevel: StorageLevel)

The code looks good, but since this is a Developer API you should add some docs, especially on the methods that are overridable (the def when* family).
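For instance, each overridable hook could carry a scaladoc comment stating when it fires and what overriders may assume. A minimal sketch (the hook names `whenCongested` / `whenRelieved` are illustrative, not this PR's actual API):

```scala
// Illustrative sketch only: `whenCongested` / `whenRelieved` are
// hypothetical names, not methods from this PR.
abstract class ReactiveReceiverDocSketch[T] {
  /** Called when the buffered element count exceeds the current bound.
   *  Implementations must not block the receiving thread. */
  def whenCongested(bufferedCount: Int): Unit = ()

  /** Called once the buffer drops back under the bound; a safe place
   *  to resume pulling from the source. */
  def whenRelieved(): Unit = ()
}
```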

@huitseeker huitseeker force-pushed the ReactiveStreamingBackPressureControl branch from 1b963c2 to 8baec36 Compare June 12, 2015 12:25
nextBuffer: ArrayBuffer[Any]): Unit = {
val bound = latestBound.get()
val f = bound.toDouble / currentBuffer.size
val samplees = currentBuffer.to

I'm a bit worried about the number of copies we're making. According to scaladoc:

Converts this array buffer into another by copying all elements.

Also, it'd be good to be explicit about the destination type.

@huitseeker
Author

Note to self : log dropping rate in destructive strategies.

val bound = latestBound.get()
val f = bound.toDouble / currentBuffer.size
if (f > 0 && f < 1){
val samplees = currentBuffer.to

Same concern about copying (and it would be good to use an explicit destination type, e.g. .to[IndexedSeq]).

Author


'Same concern' << where was the other instance?


Exactly at this point. GitHub doesn't show my old comment anymore, maybe due to some git commit reshuffling over the last 3 days.


Here it is. It's collapsed because GitHub thinks it's outdated.

Author


OK! So, currentBuffer.toIterator is lazy, and so is BernoulliSampler(...).sample(...). As a result, if I don't make a copy of the buffer once (in samplees), then by the time I call currentBuffer.clear() I've lost all the elements in sampled.

I refactored this part to make things clearer (and copy after sampling rather than before) in 92767f5 (the copy is made in the toArray).
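The pitfall described above can be reproduced in isolation. A minimal sketch, using a plain filtered iterator in place of BernoulliSampler (which behaves the same way with respect to laziness):

```scala
import scala.collection.mutable.ArrayBuffer

// The filtered iterator is lazy, like sample(...): it reads from the
// buffer only when forced. Forcing it into an Array *before* clearing
// the buffer (the toArray copy mentioned above) keeps the elements.
val currentBuffer = ArrayBuffer(1, 2, 3, 4, 5)
val sampled = currentBuffer.iterator.filter(_ % 2 == 1) // still lazy
val kept = sampled.toArray                              // copy happens here
currentBuffer.clear()
// `kept` survives the clear; a lazy iterator forced only at this
// point would see an empty buffer instead.
```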

@dragos

dragos commented Jun 17, 2015

Could you please fix the style check failures?

Scalastyle checks failed at following occurrences:
[error] /home/ubuntu/workspace/ghprb-spark-multi-conf/label/Spark-Ora-JDK7-PV/scala_version/2.10/streaming/src/test/scala/org/apache/spark/streaming/ReceiverSuite.scala:233: File line length exceeds 100 characters
[error] /home/ubuntu/workspace/ghprb-spark-multi-conf/label/Spark-Ora-JDK7-PV/scala_version/2.10/streaming/src/test/scala/org/apache/spark/streaming/ReceiverSuite.scala:235: File line length exceeds 100 characters
[error] /home/ubuntu/workspace/ghprb-spark-multi-conf/label/Spark-Ora-JDK7-PV/scala_version/2.10/streaming/src/test/scala/org/apache/spark/streaming/ReceiverSuite.scala:233:27: No space after token :
[error] (streaming/test:scalastyle) errors exist
[error] Total time: 11 s, completed Jun 17, 2015 5:03:22 AM
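Both failure kinds above are mechanical style issues. A hypothetical before/after for the "No space after token :" case (the actual line 233 of ReceiverSuite.scala is not shown in this thread; the function below is made up for illustration):

```scala
// Before (fails scalastyle): no space after ':', line over 100 chars
// def recordedBlocks(xs:Seq[Int]): Seq[Int] = xs.filter(_ >= 0) // ... plus trailing content past column 100
// After: space added after ':', long line wrapped under 100 characters
def recordedBlocks(xs: Seq[Int]): Seq[Int] =
  xs.filter(_ >= 0)
```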

@huitseeker huitseeker changed the title Adds back-pressure based congestion handling and a Reactive Streams endpoint to Spark Streaming [WIP] Adds back-pressure based congestion handling and a Reactive Streams endpoint to Spark Streaming Jun 17, 2015
@huitseeker huitseeker force-pushed the ReactiveStreamingBackPressureControl branch from bc3de77 to 54dab47 Compare June 17, 2015 22:12
@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/30/

Build Log
last 10 lines

[...truncated 20 lines...]
First time build. Skipping changelog.
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10
Triggering ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.11
Configuration ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 is still in the queue: Waiting for next available executor on Spark JDK-7 PV (i-bdc6216e)
Configuration ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 is still in the queue: Waiting for next available executor on Spark-Ora-JDK7-PV
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.10 completed with result FAILURE
ghprb-spark-multi-conf » Spark-Ora-JDK7-PV,2.11 completed with result FAILURE
Notifying upstream projects of job completion
Setting status of 54dab47ff422fbdcbe24796702f49bf103d909df to FAILURE with url http://ci.typesafe.com/job/ghprb-spark-multi-conf/30/ and message: Merged build finished.

Test FAILed.

@huitseeker huitseeker force-pushed the ReactiveStreamingBackPressureControl branch from 54dab47 to 1587e42 Compare June 18, 2015 13:29
@huitseeker
Author

The bot is really smoking something, whether on 2.11 or 2.10.

@huitseeker huitseeker force-pushed the ReactiveStreamingBackPressureControl branch from 1587e42 to e38edfb Compare June 18, 2015 19:31
@typesafe-tools
Collaborator

Refer to this link for build results (access rights to CI server needed):
https://ci.typesafe.com/job/ghprb-spark-multi-conf/32/
Test PASSed.

@dragos

dragos commented Jun 19, 2015

I've seen the SQL tests fail before; there seems to be some flakiness in the code or our setup. It finally passed. (I disabled the 2.11 builds because the MiMa step in their build is 2.10-specific. I should get around to fixing that at some point, but the priority is low...)

}

def getSpeedForStreamId(streamId: Int): Option[Long] = {
streamIdToElemsPerBatch.flatMap(_.get(streamId))


What provides the happens-before guarantee on reading streamIdToElemsPerBatch?


I don't see any threading guarantees, so if this code is guaranteed to run in a single thread, this is not a problem. Otherwise, it's not thread safe and may miss some updates, so a concurrent map or synchronization would be needed.


If this code only runs on one thread then the synchronized on line 97 in this file & changeset is misleading.
If it doesn't, then this class needs, as you say, a more cohesive thread safety story.


Agreed. After reading the code around it, it's definitely called from multiple threads (which is also what that synchronized call implies). Thanks for catching this.


Glad I could help!

Author


Crap! Thanks, Viktor.
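One way to give readers of streamIdToElemsPerBatch the missing happens-before edge is to publish the map through an AtomicReference. A minimal sketch (the enclosing class and its update method are hypothetical; only getSpeedForStreamId mirrors the snippet under review):

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical sketch: publishing the map via an AtomicReference makes
// every write visible to subsequent reads on other threads, so lookups
// cannot miss updates the way a plain mutable field can.
class SpeedTracker {
  private val streamIdToElemsPerBatch =
    new AtomicReference[Option[Map[Int, Long]]](None)

  def update(m: Map[Int, Long]): Unit =
    streamIdToElemsPerBatch.set(Some(m))

  def getSpeedForStreamId(streamId: Int): Option[Long] =
    streamIdToElemsPerBatch.get().flatMap(_.get(streamId))
}
```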

dragos pushed a commit that referenced this pull request Jul 2, 2015
…into a single batch.

SQL
```
select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
```
Plan before modify
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```
Plan after modify
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```

CombineLimits produces Limit(If(LessThan(ne, le), ne, le), grandChild), and that LessThan is handled by BooleanSimplification, so CombineLimits must run before BooleanSimplification, and BooleanSimplification must run before PushPredicateThroughJoin.
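The factoring that enables the pushed-down Filter in the plan above is the boolean identity (a && b) || (a && c) == a && (b || c). A quick sanity check on plain booleans (illustrative only, not Catalyst's implementation):

```scala
// Factor the common conjunct `a` out of the disjunction, mirroring how
// `a > 3` is extracted from both branches of the join condition above.
def factorCommon(a: Boolean, b: Boolean, c: Boolean): Boolean =
  a && (b || c)
```

An exhaustive check over all eight input combinations confirms the two forms are equivalent.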

Author: Zhongshuai Pei <799203320@qq.com>
Author: DoingDone9 <799203320@qq.com>

Closes apache#6351 from DoingDone9/master and squashes the following commits:

20de7be [Zhongshuai Pei] Update Optimizer.scala
7bc7d28 [Zhongshuai Pei] Merge pull request #17 from apache/master
0ba5f42 [Zhongshuai Pei] Update Optimizer.scala
f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala
c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala
ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala
a04ffae [Zhongshuai Pei] Update Optimizer.scala
11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala
f2ee5fe [Zhongshuai Pei] Update Optimizer.scala
be6b1d5 [Zhongshuai Pei] Update Optimizer.scala
b01e622 [Zhongshuai Pei] Merge pull request #15 from apache/master
8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala
d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala
fa65718 [Zhongshuai Pei] Update Optimizer.scala
ab8e9a6 [Zhongshuai Pei] Merge pull request #14 from apache/master
14952e2 [Zhongshuai Pei] Merge pull request #13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request #12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request #10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request #9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request #8 from apache/master
802261c [DoingDone9] Merge pull request #7 from apache/master
d00303b [DoingDone9] Merge pull request #6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
}
}

private case class CancelException(s: Subscription) extends SpecificationViolation {


Textual logs are harder to analyze with messages on multiple lines. Can we make this and the subsequent messages a single line?


I think they all are (there's a replaceAll("\n", "") at the end of the line)
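The flattening being discussed can be sketched as follows (an illustrative helper, not the PR's exact code):

```scala
// Collapse a potentially multi-line violation message onto one line,
// as the replaceAll("\n", "") mentioned above does, so that textual
// log analyzers see one record per message.
def singleLineMessage(msg: String): String = msg.replaceAll("\n", "")
```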

6 participants