Commit 4dd148f

Merge pull request alteryx#5 from apache/master

merge upstream updates

2 parents: f7e4581 + 98ab411

332 files changed: +5995 -2312 lines


README.md

Lines changed: 15 additions & 9 deletions
@@ -1,6 +1,13 @@
 # Apache Spark
 
-Lightning-Fast Cluster Computing - <http://spark.apache.org/>
+Spark is a fast and general cluster computing system for Big Data. It provides
+high-level APIs in Scala, Java, and Python, and an optimized engine that
+supports general computation graphs for data analysis. It also supports a
+rich set of higher-level tools including Spark SQL for SQL and structured
+data processing, MLLib for machine learning, GraphX for graph processing,
+and Spark Streaming.
+
+<http://spark.apache.org/>
 
 
 ## Online Documentation
@@ -69,29 +76,28 @@ can be run using:
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting the `SPARK_HADOOP_VERSION` environment
-when building Spark.
+You can change the version by setting `-Dhadoop.version` when building Spark.
 
 For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
 versions without YARN, use:
 
     # Apache Hadoop 1.2.1
-    $ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
 
     # Cloudera CDH 4.2.0 with MapReduce v1
-    $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
 
 For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `SPARK_YARN=true`:
+with YARN, also set `-Pyarn`:
 
     # Apache Hadoop 2.0.5-alpha
-    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
 
     # Cloudera CDH 4.2.0 with MapReduce v2
-    $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
 
     # Apache Hadoop 2.2.X and newer
-    $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
 
 When developing a Spark application, specify the Hadoop version by adding the
 "hadoop-client" artifact to your project's dependencies. For example, if you're

assembly/pom.xml

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@
   <packaging>pom</packaging>
 
   <properties>
+    <sbt.project.name>assembly</sbt.project.name>
     <spark.jar.dir>scala-${scala.binary.version}</spark.jar.dir>
     <spark.jar.basename>spark-assembly-${project.version}-hadoop${hadoop.version}.jar</spark.jar.basename>
     <spark.jar>${project.build.directory}/${spark.jar.dir}/${spark.jar.basename}</spark.jar>

bagel/pom.xml

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-bagel_2.10</artifactId>
+  <properties>
+    <sbt.project.name>bagel</sbt.project.name>
+  </properties>
   <packaging>jar</packaging>
   <name>Spark Project Bagel</name>
   <url>http://spark.apache.org/</url>

bin/spark-class

Lines changed: 2 additions & 2 deletions
@@ -110,9 +110,9 @@ export JAVA_OPTS
 
 TOOLS_DIR="$FWDIR"/tools
 SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar ]; then
+if [ -e "$TOOLS_DIR"/target/scala-$SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
   # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar`
+  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SCALA_VERSION/spark-tools*[0-9Tg].jar`
 fi
 if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
   # Use the JAR from the Maven build

core/pom.xml

Lines changed: 7 additions & 0 deletions
@@ -27,6 +27,9 @@
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.10</artifactId>
+  <properties>
+    <sbt.project.name>core</sbt.project.name>
+  </properties>
   <packaging>jar</packaging>
   <name>Spark Project Core</name>
   <url>http://spark.apache.org/</url>
@@ -111,6 +114,10 @@
       <groupId>org.xerial.snappy</groupId>
       <artifactId>snappy-java</artifactId>
     </dependency>
+    <dependency>
+      <groupId>net.jpountz.lz4</groupId>
+      <artifactId>lz4</artifactId>
+    </dependency>
     <dependency>
       <groupId>com.twitter</groupId>
       <artifactId>chill_${scala.binary.version}</artifactId>
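
The new net.jpountz.lz4 dependency is only declared here; this diff does not show how core uses it. A minimal sketch of the lz4-java block API, assuming the library's LZ4Factory entry point, would be:

    // Hedged sketch of lz4-java usage; not code from this commit.
    import net.jpountz.lz4.LZ4Factory

    val factory = LZ4Factory.fastestInstance()
    val data = "some bytes worth compressing".getBytes("UTF-8")
    val compressed = factory.fastCompressor().compress(data)
    // Decompression needs the original (uncompressed) length.
    val restored = factory.fastDecompressor().decompress(compressed, data.length)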

core/src/main/resources/org/apache/spark/ui/static/webui.css

Lines changed: 2 additions & 0 deletions
@@ -81,7 +81,9 @@ table.sortable thead {
 
 span.kill-link {
   margin-right: 2px;
+  margin-left: 20px;
   color: gray;
+  float: right;
 }
 
 span.kill-link a {

core/src/main/scala/org/apache/spark/Aggregator.scala

Lines changed: 4 additions & 4 deletions
@@ -56,8 +56,8 @@ case class Aggregator[K, V, C] (
     } else {
       val combiners = new ExternalAppendOnlyMap[K, V, C](createCombiner, mergeValue, mergeCombiners)
       while (iter.hasNext) {
-        val (k, v) = iter.next()
-        combiners.insert(k, v)
+        val pair = iter.next()
+        combiners.insert(pair._1, pair._2)
       }
       // TODO: Make this non optional in a future release
       Option(context).foreach(c => c.taskMetrics.memoryBytesSpilled = combiners.memoryBytesSpilled)
@@ -85,8 +85,8 @@ case class Aggregator[K, V, C] (
     } else {
       val combiners = new ExternalAppendOnlyMap[K, C, C](identity, mergeCombiners, mergeCombiners)
       while (iter.hasNext) {
-        val (k, c) = iter.next()
-        combiners.insert(k, c)
+        val pair = iter.next()
+        combiners.insert(pair._1, pair._2)
       }
       // TODO: Make this non optional in a future release
       Option(context).foreach(c => c.taskMetrics.memoryBytesSpilled = combiners.memoryBytesSpilled)
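
Both hunks make the same change: the tuple from iter.next() is bound to a single val and read through _1/_2 instead of being destructured, presumably to trim per-element pattern-match overhead in these inner loops. A small illustrative sketch of the two access styles (not code from this commit):

    // Illustrative only: two ways of reading a key/value pair from an iterator.
    val iter = Iterator("a" -> 1, "b" -> 2)
    while (iter.hasNext) {
      val pair = iter.next()
      // Old style: val (k, v) = pair   -- desugars to a pattern match per element.
      // New style: read the fields directly, no extractor involved.
      println(pair._1 + " -> " + pair._2)
    }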

core/src/main/scala/org/apache/spark/SparkContext.scala

Lines changed: 10 additions & 1 deletion
@@ -1531,7 +1531,16 @@ object SparkContext extends Logging {
           throw new SparkException("YARN mode not available ?", e)
         }
       }
-      val backend = new CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
+      val backend = try {
+        val clazz =
+          Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
+        val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
+        cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
+      } catch {
+        case e: Exception => {
+          throw new SparkException("YARN mode not available ?", e)
+        }
+      }
       scheduler.initialize(backend)
       scheduler
 
core/src/main/scala/org/apache/spark/TaskEndReason.scala

Lines changed: 2 additions & 4 deletions
@@ -20,6 +20,7 @@ package org.apache.spark
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.executor.TaskMetrics
 import org.apache.spark.storage.BlockManagerId
+import org.apache.spark.util.Utils
 
 /**
  * :: DeveloperApi ::
@@ -88,10 +89,7 @@ case class ExceptionFailure(
     stackTrace: Array[StackTraceElement],
     metrics: Option[TaskMetrics])
   extends TaskFailedReason {
-  override def toErrorString: String = {
-    val stackTraceString = if (stackTrace == null) "null" else stackTrace.mkString("\n")
-    s"$className ($description}\n$stackTraceString"
-  }
+  override def toErrorString: String = Utils.exceptionString(className, description, stackTrace)
 }
 
 /**
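
The inline formatting logic moves into Utils.exceptionString, whose body is not part of this diff. A hedged sketch consistent with the removed code (null-safe stack trace joined under the class name and description) might read:

    // Assumed shape only; the helper's real body is not shown in this commit.
    def exceptionString(
        className: String,
        description: String,
        stackTrace: Array[StackTraceElement]): String = {
      val stackTraceString =
        if (stackTrace == null) "null" else stackTrace.mkString("\n")
      s"$className ($description)\n$stackTraceString"
    }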

core/src/main/scala/org/apache/spark/TestUtils.scala

Lines changed: 2 additions & 2 deletions
@@ -92,8 +92,8 @@ private[spark] object TestUtils {
   def createCompiledClass(className: String, destDir: File, value: String = ""): File = {
     val compiler = ToolProvider.getSystemJavaCompiler
     val sourceFile = new JavaSourceFromString(className,
-      "public class " + className + " { @Override public String toString() { " +
-        "return \"" + value + "\";}}")
+      "public class " + className + " implements java.io.Serializable {" +
+      " @Override public String toString() { return \"" + value + "\"; }}")
 
     // Calling this outputs a class file in pwd. It's easier to just rename the file than
     // build a custom FileManager that controls the output location.

core/src/main/scala/org/apache/spark/broadcast/Broadcast.scala

Lines changed: 4 additions & 4 deletions
@@ -106,23 +106,23 @@ abstract class Broadcast[T: ClassTag](val id: Long) extends Serializable {
    * Actually get the broadcasted value. Concrete implementations of Broadcast class must
    * define their own way to get the value.
    */
-  private[spark] def getValue(): T
+  protected def getValue(): T
 
   /**
    * Actually unpersist the broadcasted value on the executors. Concrete implementations of
    * Broadcast class must define their own logic to unpersist their own data.
    */
-  private[spark] def doUnpersist(blocking: Boolean)
+  protected def doUnpersist(blocking: Boolean)
 
   /**
    * Actually destroy all data and metadata related to this broadcast variable.
    * Implementation of Broadcast class must define their own logic to destroy their own
    * state.
    */
-  private[spark] def doDestroy(blocking: Boolean)
+  protected def doDestroy(blocking: Boolean)
 
   /** Check if this broadcast is valid. If not valid, exception is thrown. */
-  private[spark] def assertValid() {
+  protected def assertValid() {
     if (!_isValid) {
       throw new SparkException("Attempted to use %s after it has been destroyed!".format(toString))
     }
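
With getValue, doUnpersist, doDestroy, and assertValid now protected rather than private[spark], broadcast implementations outside the spark package can override them. A skeletal subclass showing only the override points (the class and its storage are hypothetical; the overridden signatures come from this diff):

    // Hypothetical subclass sketch, not part of this commit.
    import scala.reflect.ClassTag
    import org.apache.spark.broadcast.Broadcast

    class InMemoryBroadcast[T: ClassTag](initial: T, id: Long) extends Broadcast[T](id) {
      @volatile private var stored: Option[T] = Some(initial)

      override protected def getValue(): T = stored.get
      override protected def doUnpersist(blocking: Boolean): Unit = { stored = None }
      override protected def doDestroy(blocking: Boolean): Unit = { stored = None }
    }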

core/src/main/scala/org/apache/spark/broadcast/BroadcastManager.scala

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ private[spark] class BroadcastManager(
     synchronized {
       if (!initialized) {
         val broadcastFactoryClass =
-          conf.get("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")
+          conf.get("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
 
         broadcastFactory =
           Class.forName(broadcastFactoryClass).newInstance.asInstanceOf[BroadcastFactory]
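
The default for spark.broadcast.factory flips from HttpBroadcastFactory to TorrentBroadcastFactory. Applications that want to keep the old HTTP behaviour can presumably set the key explicitly; a minimal sketch (the config key and class name come from this diff, the rest is illustrative):

    // Sketch of overriding the new default broadcast factory.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("broadcast-factory-example")
      .set("spark.broadcast.factory", "org.apache.spark.broadcast.HttpBroadcastFactory")
    val sc = new SparkContext(conf)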

core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala

Lines changed: 7 additions & 7 deletions
@@ -40,9 +40,9 @@ private[spark] class HttpBroadcast[T: ClassTag](
     @transient var value_ : T, isLocal: Boolean, id: Long)
   extends Broadcast[T](id) with Logging with Serializable {
 
-  def getValue = value_
+  override protected def getValue() = value_
 
-  val blockId = BroadcastBlockId(id)
+  private val blockId = BroadcastBlockId(id)
 
   /*
    * Broadcasted data is also stored in the BlockManager of the driver. The BlockManagerMaster
@@ -60,14 +60,14 @@ private[spark] class HttpBroadcast[T: ClassTag](
   /**
    * Remove all persisted state associated with this HTTP broadcast on the executors.
    */
-  def doUnpersist(blocking: Boolean) {
+  override protected def doUnpersist(blocking: Boolean) {
     HttpBroadcast.unpersist(id, removeFromDriver = false, blocking)
   }
 
   /**
    * Remove all persisted state associated with this HTTP broadcast on the executors and driver.
    */
-  def doDestroy(blocking: Boolean) {
+  override protected def doDestroy(blocking: Boolean) {
     HttpBroadcast.unpersist(id, removeFromDriver = true, blocking)
   }
 
@@ -102,7 +102,7 @@ private[spark] class HttpBroadcast[T: ClassTag](
   }
 }
 
-private[spark] object HttpBroadcast extends Logging {
+private[broadcast] object HttpBroadcast extends Logging {
   private var initialized = false
   private var broadcastDir: File = null
   private var compress: Boolean = false
@@ -160,7 +160,7 @@ private[spark] object HttpBroadcast extends Logging {
 
   def getFile(id: Long) = new File(broadcastDir, BroadcastBlockId(id).name)
 
-  def write(id: Long, value: Any) {
+  private def write(id: Long, value: Any) {
     val file = getFile(id)
     val out: OutputStream = {
       if (compress) {
@@ -176,7 +176,7 @@ private[spark] object HttpBroadcast extends Logging {
     files += file
   }
 
-  def read[T: ClassTag](id: Long): T = {
+  private def read[T: ClassTag](id: Long): T = {
     logDebug("broadcast read server: " + serverUri + " id: broadcast-" + id)
     val url = serverUri + "/" + BroadcastBlockId(id).name
 
core/src/main/scala/org/apache/spark/broadcast/HttpBroadcastFactory.scala

Lines changed: 4 additions & 4 deletions
@@ -27,21 +27,21 @@ import org.apache.spark.{SecurityManager, SparkConf}
  * [[org.apache.spark.broadcast.HttpBroadcast]] for more details about this mechanism.
  */
 class HttpBroadcastFactory extends BroadcastFactory {
-  def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) {
+  override def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager) {
     HttpBroadcast.initialize(isDriver, conf, securityMgr)
   }
 
-  def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
+  override def newBroadcast[T: ClassTag](value_ : T, isLocal: Boolean, id: Long) =
     new HttpBroadcast[T](value_, isLocal, id)
 
-  def stop() { HttpBroadcast.stop() }
+  override def stop() { HttpBroadcast.stop() }
 
   /**
    * Remove all persisted state associated with the HTTP broadcast with the given ID.
    * @param removeFromDriver Whether to remove state from the driver
    * @param blocking Whether to block until unbroadcasted
    */
-  def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
+  override def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean) {
     HttpBroadcast.unpersist(id, removeFromDriver, blocking)
   }
 }
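
Every method here now carries an explicit override, which only compiles if BroadcastFactory declares matching members. The trait itself is not in this diff; inferred from the overridden signatures, it presumably looks roughly like:

    // Assumed declaration, inferred from the overrides above; not shown in this commit.
    import scala.reflect.ClassTag
    import org.apache.spark.{SecurityManager, SparkConf}

    trait BroadcastFactory {
      def initialize(isDriver: Boolean, conf: SparkConf, securityMgr: SecurityManager): Unit
      def newBroadcast[T: ClassTag](value: T, isLocal: Boolean, id: Long): Broadcast[T]
      def stop(): Unit
      def unbroadcast(id: Long, removeFromDriver: Boolean, blocking: Boolean): Unit
    }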
