
Commit 4dd148f

Merge pull request #5 from apache/master
merge upstream updates
2 parents: f7e4581 + 98ab411

332 files changed (+5995, -2312 lines)


README.md

Lines changed: 15 additions & 9 deletions
@@ -1,6 +1,13 @@
 # Apache Spark
 
-Lightning-Fast Cluster Computing - <http://spark.apache.org/>
+Spark is a fast and general cluster computing system for Big Data. It provides
+high-level APIs in Scala, Java, and Python, and an optimized engine that
+supports general computation graphs for data analysis. It also supports a
+rich set of higher-level tools including Spark SQL for SQL and structured
+data processing, MLLib for machine learning, GraphX for graph processing,
+and Spark Streaming.
+
+<http://spark.apache.org/>
 
 
 ## Online Documentation
@@ -69,29 +76,28 @@ can be run using:
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting the `SPARK_HADOOP_VERSION` environment
-when building Spark.
+You can change the version by setting `-Dhadoop.version` when building Spark.
 
 For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
 versions without YARN, use:
 
     # Apache Hadoop 1.2.1
-    $ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
 
     # Cloudera CDH 4.2.0 with MapReduce v1
-    $ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
 
 For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `SPARK_YARN=true`:
+with YARN, also set `-Pyarn`:
 
     # Apache Hadoop 2.0.5-alpha
-    $ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
 
     # Cloudera CDH 4.2.0 with MapReduce v2
-    $ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
 
     # Apache Hadoop 2.2.X and newer
-    $ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
+    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
 
 When developing a Spark application, specify the Hadoop version by adding the
 "hadoop-client" artifact to your project's dependencies. For example, if you're

assembly/pom.xml

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@
   <packaging>pom</packaging>
 
   <properties>
+    <sbt.project.name>assembly</sbt.project.name>
     <spark.jar.dir>scala-${scala.binary.version}</spark.jar.dir>
     <spark.jar.basename>spark-assembly-${project.version}-hadoop${hadoop.version}.jar</spark.jar.basename>
     <spark.jar>${project.build.directory}/${spark.jar.dir}/${spark.jar.basename}</spark.jar>

bagel/pom.xml

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-bagel_2.10</artifactId>
+  <properties>
+    <sbt.project.name>bagel</sbt.project.name>
+  </properties>
   <packaging>jar</packaging>
   <name>Spark Project Bagel</name>
   <url>http://spark.apache.org/</url>

bin/spark-class

Lines changed: 2 additions & 2 deletions
@@ -110,9 +110,9 @@ export JAVA_OPTS
 
 TOOLS_DIR="$FWDIR"/tools
 SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar ]; then
+if [ -e "$TOOLS_DIR"/target/scala-$SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
   # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar`
+  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SCALA_VERSION/spark-tools*[0-9Tg].jar`
 fi
 if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
   # Use the JAR from the Maven build

core/pom.xml

Lines changed: 7 additions & 0 deletions
@@ -27,6 +27,9 @@
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-core_2.10</artifactId>
+  <properties>
+    <sbt.project.name>core</sbt.project.name>
+  </properties>
   <packaging>jar</packaging>
   <name>Spark Project Core</name>
   <url>http://spark.apache.org/</url>
@@ -111,6 +114,10 @@
       <groupId>org.xerial.snappy</groupId>
       <artifactId>snappy-java</artifactId>
     </dependency>
+    <dependency>
+      <groupId>net.jpountz.lz4</groupId>
+      <artifactId>lz4</artifactId>
+    </dependency>
     <dependency>
       <groupId>com.twitter</groupId>
       <artifactId>chill_${scala.binary.version}</artifactId>

core/src/main/resources/org/apache/spark/ui/static/webui.css

Lines changed: 2 additions & 0 deletions
@@ -81,7 +81,9 @@ table.sortable thead {
 
 span.kill-link {
   margin-right: 2px;
+  margin-left: 20px;
   color: gray;
+  float: right;
 }
 
 span.kill-link a {

core/src/main/scala/org/apache/spark/Aggregator.scala

Lines changed: 4 additions & 4 deletions
@@ -56,8 +56,8 @@ case class Aggregator[K, V, C] (
     } else {
       val combiners = new ExternalAppendOnlyMap[K, V, C](createCombiner, mergeValue, mergeCombiners)
       while (iter.hasNext) {
-        val (k, v) = iter.next()
-        combiners.insert(k, v)
+        val pair = iter.next()
+        combiners.insert(pair._1, pair._2)
       }
       // TODO: Make this non optional in a future release
       Option(context).foreach(c => c.taskMetrics.memoryBytesSpilled = combiners.memoryBytesSpilled)
@@ -85,8 +85,8 @@ case class Aggregator[K, V, C] (
     } else {
       val combiners = new ExternalAppendOnlyMap[K, C, C](identity, mergeCombiners, mergeCombiners)
       while (iter.hasNext) {
-        val (k, c) = iter.next()
-        combiners.insert(k, c)
+        val pair = iter.next()
+        combiners.insert(pair._1, pair._2)
       }
       // TODO: Make this non optional in a future release
       Option(context).foreach(c => c.taskMetrics.memoryBytesSpilled = combiners.memoryBytesSpilled)
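The Aggregator change swaps tuple destructuring for direct `_1`/`_2` access. A standalone micro-example of the difference (illustrative only, not Spark code): on Scala 2 compilers of this era, `val (k, v) = ...` desugars to a pattern match that builds a temporary tuple per element, which the second loop avoids on a hot inner path.

    def sumDestructured(iter: Iterator[(String, Int)]): Int = {
      var total = 0
      while (iter.hasNext) {
        val (k, v) = iter.next() // pattern match; allocates a temporary tuple
        total += v
      }
      total
    }

    def sumDirect(iter: Iterator[(String, Int)]): Int = {
      var total = 0
      while (iter.hasNext) {
        val pair = iter.next()   // plain field access, no pattern match
        total += pair._2
      }
      total
    }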

core/src/main/scala/org/apache/spark/SparkContext.scala

Lines changed: 10 additions & 1 deletion
@@ -1531,7 +1531,16 @@ object SparkContext extends Logging {
           throw new SparkException("YARN mode not available ?", e)
         }
       }
-      val backend = new CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
+      val backend = try {
+        val clazz =
+          Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
+        val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
+        cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
+      } catch {
+        case e: Exception => {
+          throw new SparkException("YARN mode not available ?", e)
+        }
+      }
       scheduler.initialize(backend)
       scheduler
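This SparkContext change instantiates the YARN cluster backend by name instead of constructing `CoarseGrainedSchedulerBackend` directly, so core compiles without the yarn module on the classpath and fails with a clear `SparkException` at runtime if it is missing. A self-contained sketch of the same look-up-by-name pattern, with `java.util.ArrayList` standing in for the optional backend class:

    object ReflectiveLoadDemo {
      def main(args: Array[String]): Unit = {
        val instance =
          try {
            // No compile-time reference to the target class: only its name.
            val clazz = Class.forName("java.util.ArrayList")
            val cons = clazz.getConstructor(classOf[Int])
            cons.newInstance(Integer.valueOf(16))
          } catch {
            case e: Exception =>
              throw new RuntimeException("optional class not available", e)
          }
        println(instance.getClass.getName) // prints java.util.ArrayList
      }
    }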

core/src/main/scala/org/apache/spark/TaskEndReason.scala

Lines changed: 2 additions & 4 deletions
@@ -20,6 +20,7 @@ package org.apache.spark
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.executor.TaskMetrics
 import org.apache.spark.storage.BlockManagerId
+import org.apache.spark.util.Utils
 
 /**
  * :: DeveloperApi ::
@@ -88,10 +89,7 @@ case class ExceptionFailure(
     stackTrace: Array[StackTraceElement],
     metrics: Option[TaskMetrics])
   extends TaskFailedReason {
-  override def toErrorString: String = {
-    val stackTraceString = if (stackTrace == null) "null" else stackTrace.mkString("\n")
-    s"$className ($description}\n$stackTraceString"
-  }
+  override def toErrorString: String = Utils.exceptionString(className, description, stackTrace)
 }
 
 /**
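The inline formatting in `toErrorString` moves into `Utils.exceptionString`, which this commit does not show. A rough sketch of what a helper with this signature might do, inferred from the code it replaces; the real Utils.exceptionString may format differently:

    // Assumed shape only; not the actual Spark implementation.
    def exceptionString(
        className: String,
        description: String,
        stackTrace: Array[StackTraceElement]): String = {
      val trace =
        if (stackTrace == null) "" else stackTrace.map("        at " + _).mkString("\n")
      s"$className: $description\n$trace"
    }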

core/src/main/scala/org/apache/spark/TestUtils.scala

Lines changed: 2 additions & 2 deletions
@@ -92,8 +92,8 @@ private[spark] object TestUtils {
   def createCompiledClass(className: String, destDir: File, value: String = ""): File = {
     val compiler = ToolProvider.getSystemJavaCompiler
     val sourceFile = new JavaSourceFromString(className,
-      "public class " + className + " { @Override public String toString() { " +
-        "return \"" + value + "\";}}")
+      "public class " + className + " implements java.io.Serializable {" +
+      " @Override public String toString() { return \"" + value + "\"; }}")
 
     // Calling this outputs a class file in pwd. It's easier to just rename the file than
     // build a custom FileManager that controls the output location.
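TestUtils now marks the generated class as `java.io.Serializable`, presumably so instances of these test classes can survive the Java serialization Spark uses when shipping objects to executors. A standalone illustration of what the marker interface buys (not Spark code):

    import java.io._

    class Plain  { override def toString = "plain" }
    class Tagged extends Serializable { override def toString = "tagged" }

    object SerializeDemo {
      private def roundTrip(o: AnyRef): Unit = {
        val out = new ObjectOutputStream(new ByteArrayOutputStream())
        out.writeObject(o) // throws NotSerializableException for Plain
        out.close()
      }
      def main(args: Array[String]): Unit = {
        roundTrip(new Tagged)                 // succeeds
        try roundTrip(new Plain) catch {
          case e: NotSerializableException => println(s"as expected: $e")
        }
      }
    }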
