Commit 2061cf5

merged from master
2 parents 06b1690 + 40a8fef

615 files changed: +25363 additions, -8946 deletions


.gitignore

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 sbt/*.jar
 .settings
 .cache
-.mima-excludes
+.generated-mima*
 /build/
 work/
 out/

.rat-excludes

Lines changed: 6 additions & 0 deletions
@@ -3,6 +3,9 @@ target
 .project
 .classpath
 .mima-excludes
+.generated-mima-excludes
+.generated-mima-class-excludes
+.generated-mima-member-excludes
 .rat-excludes
 .*md
 derby.log
@@ -19,8 +22,11 @@ slaves
 spark-env.sh
 spark-env.sh.template
 log4j-defaults.properties
+bootstrap-tooltip.js
+jquery-1.11.1.min.js
 sorttable.js
 .*txt
+.*json
 .*data
 .*log
 cloudpickle.py

README.md

Lines changed: 6 additions & 7 deletions
@@ -69,29 +69,28 @@ can be run using:
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting the `SPARK_HADOOP_VERSION` environment
-when building Spark.
+You can change the version by setting `-Dhadoop.version` when building Spark.
 
 For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
 versions without YARN, use:
 
 # Apache Hadoop 1.2.1
-$ SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+$ sbt/sbt -Dhadoop.version=1.2.1 assembly
 
 # Cloudera CDH 4.2.0 with MapReduce v1
-$ SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
+$ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
 
 For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
 with YARN, also set `SPARK_YARN=true`:
 
 # Apache Hadoop 2.0.5-alpha
-$ SPARK_HADOOP_VERSION=2.0.5-alpha SPARK_YARN=true sbt/sbt assembly
+$ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
 
 # Cloudera CDH 4.2.0 with MapReduce v2
-$ SPARK_HADOOP_VERSION=2.0.0-cdh4.2.0 SPARK_YARN=true sbt/sbt assembly
+$ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
 
 # Apache Hadoop 2.2.X and newer
-$ SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
+$ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
 
 When developing a Spark application, specify the Hadoop version by adding the
 "hadoop-client" artifact to your project's dependencies. For example, if you're

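As a quick sanity check of the new flag, one can build against a specific Hadoop version and confirm that the resulting assembly jar name embeds it (a sketch; the output path assumes the default Scala 2.10 build):

    # Build with Hadoop 2.2.0 and YARN support, then verify the assembly jar name.
    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
    $ ls assembly/target/scala-2.10/spark-assembly-*-hadoop2.2.0.jar
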
assembly/pom.xml

Lines changed: 2 additions & 1 deletion
@@ -21,7 +21,7 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.0.0-SNAPSHOT</version>
+    <version>1.1.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
@@ -32,6 +32,7 @@
   <packaging>pom</packaging>
 
   <properties>
+    <sbt.project.name>assembly</sbt.project.name>
     <spark.jar.dir>scala-${scala.binary.version}</spark.jar.dir>
     <spark.jar.basename>spark-assembly-${project.version}-hadoop${hadoop.version}.jar</spark.jar.basename>
     <spark.jar>${project.build.directory}/${spark.jar.dir}/${spark.jar.basename}</spark.jar>

bagel/pom.xml

Lines changed: 4 additions & 1 deletion
@@ -21,12 +21,15 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>1.0.0-SNAPSHOT</version>
+    <version>1.1.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-bagel_2.10</artifactId>
+  <properties>
+    <sbt.project.name>bagel</sbt.project.name>
+  </properties>
   <packaging>jar</packaging>
   <name>Spark Project Bagel</name>
   <url>http://spark.apache.org/</url>
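
The new `sbt.project.name` property gives each Maven module an explicit name on the sbt side of the build. Assuming the sbt build exposes these names as project ids (an assumption, not shown in this diff), a single module could then be targeted like:

    # Hypothetical: run only the bagel module's tests through the sbt launcher.
    $ sbt/sbt bagel/test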

bagel/src/test/scala/org/apache/spark/bagel/BagelSuite.scala

Lines changed: 2 additions & 4 deletions
@@ -38,8 +38,6 @@ class BagelSuite extends FunSuite with Assertions with BeforeAndAfter with Timeouts
       sc.stop()
       sc = null
     }
-    // To avoid Akka rebinding to the same port, since it doesn't unbind immediately on shutdown
-    System.clearProperty("spark.driver.port")
   }
 
   test("halting by voting") {
@@ -82,7 +80,7 @@ class BagelSuite extends FunSuite with Assertions with BeforeAndAfter with Timeouts
   test("large number of iterations") {
     // This tests whether jobs with a large number of iterations finish in a reasonable time,
     // because non-memoized recursion in RDD or DAGScheduler used to cause them to hang
-    failAfter(10 seconds) {
+    failAfter(30 seconds) {
       sc = new SparkContext("local", "test")
       val verts = sc.parallelize((1 to 4).map(id => (id.toString, new TestVertex(true, 0))))
       val msgs = sc.parallelize(Array[(String, TestMessage)]())
@@ -103,7 +101,7 @@ class BagelSuite extends FunSuite with Assertions with BeforeAndAfter with Timeouts
       sc = new SparkContext("local", "test")
       val verts = sc.parallelize((1 to 4).map(id => (id.toString, new TestVertex(true, 0))))
       val msgs = sc.parallelize(Array[(String, TestMessage)]())
-      val numSupersteps = 50
+      val numSupersteps = 20
       val result =
         Bagel.run(sc, verts, msgs, sc.defaultParallelism, StorageLevel.DISK_ONLY) {
           (self: TestVertex, msgs: Option[Array[TestMessage]], superstep: Int) =>

bin/compute-classpath.sh

Lines changed: 29 additions & 13 deletions
@@ -38,8 +38,10 @@ else
   JAR_CMD="jar"
 fi
 
-# First check if we have a dependencies jar. If so, include binary classes with the deps jar
-if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
+# A developer option to prepend more recently compiled Spark classes
+if [ -n "$SPARK_PREPEND_CLASSES" ]; then
+  echo "NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark"\
+    "classes ahead of assembly." >&2
   CLASSPATH="$CLASSPATH:$FWDIR/core/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/repl/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/mllib/target/scala-$SCALA_VERSION/classes"
@@ -51,24 +53,38 @@ if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
   CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"
   CLASSPATH="$CLASSPATH:$FWDIR/yarn/stable/target/scala-$SCALA_VERSION/classes"
+fi
 
-  ASSEMBLY_JAR=$(ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar 2>/dev/null)
+# Use spark-assembly jar from either RELEASE or assembly directory
+if [ -f "$FWDIR/RELEASE" ]; then
+  assembly_folder="$FWDIR"/lib
 else
-  # Else use spark-assembly jar from either RELEASE or assembly directory
-  if [ -f "$FWDIR/RELEASE" ]; then
-    ASSEMBLY_JAR=$(ls "$FWDIR"/lib/spark-assembly*hadoop*.jar 2>/dev/null)
-  else
-    ASSEMBLY_JAR=$(ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar 2>/dev/null)
-  fi
+  assembly_folder="$ASSEMBLY_DIR"
 fi
 
+num_jars=$(ls "$assembly_folder" | grep "spark-assembly.*hadoop.*\.jar" | wc -l)
+if [ "$num_jars" -eq "0" ]; then
+  echo "Failed to find Spark assembly in $assembly_folder"
+  echo "You need to build Spark before running this program."
+  exit 1
+fi
+if [ "$num_jars" -gt "1" ]; then
+  jars_list=$(ls "$assembly_folder" | grep "spark-assembly.*hadoop.*.jar")
+  echo "Found multiple Spark assembly jars in $assembly_folder:"
+  echo "$jars_list"
+  echo "Please remove all but one jar."
+  exit 1
+fi
+
+ASSEMBLY_JAR=$(ls "$assembly_folder"/spark-assembly*hadoop*.jar 2>/dev/null)
+
 # Verify that versions of java used to build the jars and run Spark are compatible
 jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
 if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
-  echo "Loading Spark jar with '$JAR_CMD' failed. "
-  echo "This is likely because Spark was compiled with Java 7 and run "
-  echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark "
-  echo "or build Spark with Java 6."
+  echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
+  echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
+  echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
+  echo "or build Spark with Java 6." 1>&2
   exit 1
 fi
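
The new SPARK_PREPEND_CLASSES switch is a developer convenience: freshly compiled per-module classes shadow the older copies baked into the assembly jar, so small changes do not require a full reassembly. A sketch of that loop, assuming `core/compile` is a valid sbt task for the core module:

    # Build the assembly once, then iterate without rebuilding it.
    $ sbt/sbt assembly
    $ sbt/sbt core/compile        # assumed task name; recompiles only the changed module
    $ SPARK_PREPEND_CLASSES=1 ./bin/spark-shell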

bin/pyspark

Lines changed: 15 additions & 5 deletions
@@ -26,7 +26,7 @@ export SPARK_HOME="$FWDIR"
 SCALA_VERSION=2.10
 
 if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
-  echo "Usage: ./bin/pyspark [options]"
+  echo "Usage: ./bin/pyspark [options]" 1>&2
   $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
   exit 0
 fi
@@ -36,16 +36,16 @@ if [ ! -f "$FWDIR/RELEASE" ]; then
   # Exit if the user hasn't compiled Spark
   ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/spark-assembly*hadoop*.jar >& /dev/null
   if [[ $? != 0 ]]; then
-    echo "Failed to find Spark assembly in $FWDIR/assembly/target" >&2
-    echo "You need to build Spark before running this program" >&2
+    echo "Failed to find Spark assembly in $FWDIR/assembly/target" 1>&2
+    echo "You need to build Spark before running this program" 1>&2
     exit 1
   fi
 fi
 
 . $FWDIR/bin/load-spark-env.sh
 
 # Figure out which Python executable to use
-if [ -z "$PYSPARK_PYTHON" ] ; then
+if [[ -z "$PYSPARK_PYTHON" ]]; then
   PYSPARK_PYTHON="python"
 fi
 export PYSPARK_PYTHON
@@ -59,7 +59,7 @@ export OLD_PYTHONSTARTUP=$PYTHONSTARTUP
 export PYTHONSTARTUP=$FWDIR/python/pyspark/shell.py
 
 # If IPython options are specified, assume user wants to run IPython
-if [ -n "$IPYTHON_OPTS" ]; then
+if [[ -n "$IPYTHON_OPTS" ]]; then
   IPYTHON=1
 fi
 
@@ -76,6 +76,16 @@ for i in "$@"; do
 done
 export PYSPARK_SUBMIT_ARGS
 
+# For pyspark tests
+if [[ -n "$SPARK_TESTING" ]]; then
+  if [[ -n "$PYSPARK_DOC_TEST" ]]; then
+    exec "$PYSPARK_PYTHON" -m doctest $1
+  else
+    exec "$PYSPARK_PYTHON" $1
+  fi
+  exit
+fi
+
 # If a python file is provided, directly run spark-submit.
 if [[ "$1" =~ \.py$ ]]; then
   echo -e "\nWARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0." 1>&2
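
The SPARK_TESTING branch lets a test harness hand a single Python module to $PYSPARK_PYTHON, optionally under doctest. Invoked by hand it might look like this (the module path is only an illustration):

    # Run one PySpark module's doctests through the new hook.
    $ SPARK_TESTING=1 PYSPARK_DOC_TEST=1 ./bin/pyspark python/pyspark/serializers.py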

bin/run-example

Lines changed: 6 additions & 6 deletions
@@ -27,9 +27,9 @@ if [ -n "$1" ]; then
   EXAMPLE_CLASS="$1"
   shift
 else
-  echo "Usage: ./bin/run-example <example-class> [example-args]"
-  echo "  - set MASTER=XX to use a specific master"
-  echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)"
+  echo "Usage: ./bin/run-example <example-class> [example-args]" 1>&2
+  echo "  - set MASTER=XX to use a specific master" 1>&2
+  echo "  - can use abbreviated example class name (e.g. SparkPi, mllib.LinearRegression)" 1>&2
   exit 1
 fi
 
@@ -40,8 +40,8 @@ elif [ -e "$EXAMPLES_DIR"/target/scala-$SCALA_VERSION/spark-examples-*hadoop*.jar ]; then
 fi
 
 if [[ -z $SPARK_EXAMPLES_JAR ]]; then
-  echo "Failed to find Spark examples assembly in $FWDIR/lib or $FWDIR/examples/target" >&2
-  echo "You need to build Spark before running this program" >&2
+  echo "Failed to find Spark examples assembly in $FWDIR/lib or $FWDIR/examples/target" 1>&2
+  echo "You need to build Spark before running this program" 1>&2
   exit 1
 fi
 
@@ -51,7 +51,7 @@ if [[ ! $EXAMPLE_CLASS == org.apache.spark.examples* ]]; then
   EXAMPLE_CLASS="org.apache.spark.examples.$EXAMPLE_CLASS"
 fi
 
-./bin/spark-submit \
+"$FWDIR"/bin/spark-submit \
   --master $EXAMPLE_MASTER \
   --class $EXAMPLE_CLASS \
   "$SPARK_EXAMPLES_JAR" \

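Because spark-submit is now resolved through "$FWDIR", run-example no longer depends on the caller's working directory. For example (the install prefix is illustrative):

    # Works even when invoked from outside the Spark root.
    $ cd /tmp
    $ /opt/spark/bin/run-example SparkPi 10
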
bin/spark-class

Lines changed: 13 additions & 26 deletions
@@ -33,13 +33,13 @@ export SPARK_HOME="$FWDIR"
 . $FWDIR/bin/load-spark-env.sh
 
 if [ -z "$1" ]; then
-  echo "Usage: spark-class <class> [<args>]" >&2
+  echo "Usage: spark-class <class> [<args>]" 1>&2
   exit 1
 fi
 
 if [ -n "$SPARK_MEM" ]; then
-  echo "Warning: SPARK_MEM is deprecated, please use a more specific config option"
-  echo "(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY)."
+  echo -e "Warning: SPARK_MEM is deprecated, please use a more specific config option" 1>&2
+  echo -e "(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY)." 1>&2
 fi
 
 # Use SPARK_MEM or 512m as the default memory, to be overridden by specific options
@@ -108,28 +108,11 @@ fi
 export JAVA_OPTS
 # Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
 
-if [ ! -f "$FWDIR/RELEASE" ]; then
-  # Exit if the user hasn't compiled Spark
-  num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
-  jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
-  if [ "$num_jars" -eq "0" ]; then
-    echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
-    echo "You need to build Spark before running this program." >&2
-    exit 1
-  fi
-  if [ "$num_jars" -gt "1" ]; then
-    echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
-    echo "$jars_list"
-    echo "Please remove all but one jar."
-    exit 1
-  fi
-fi
-
 TOOLS_DIR="$FWDIR"/tools
 SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar ]; then
+if [ -e "$TOOLS_DIR"/target/scala-$SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
   # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SCALA_VERSION/*assembly*[0-9Tg].jar`
+  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SCALA_VERSION/spark-tools*[0-9Tg].jar`
 fi
 if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
   # Use the JAR from the Maven build
@@ -147,6 +130,11 @@ else
 fi
 
 if [[ "$1" =~ org.apache.spark.tools.* ]]; then
+  if test -z "$SPARK_TOOLS_JAR"; then
+    echo "Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SCALA_VERSION/" 1>&2
+    echo "You need to build spark before running $1." 1>&2
+    exit 1
+  fi
   CLASSPATH="$CLASSPATH:$SPARK_TOOLS_JAR"
 fi
 
@@ -159,10 +147,9 @@ fi
 export CLASSPATH
 
 if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
-  echo -n "Spark Command: "
-  echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
-  echo "========================================"
-  echo
+  echo -n "Spark Command: " 1>&2
+  echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@" 1>&2
+  echo -e "========================================\n" 1>&2
 fi
 
 exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
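
Routing the launch banner to stderr keeps stdout reserved for the program itself, so redirections stay clean. A quick illustration (the output file name is arbitrary):

    # The "Spark Command:" banner shows up on the terminal via stderr;
    # only the job's own output lands in the file.
    $ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/run-example SparkPi 10 > pi-output.txt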

bin/spark-submit

Lines changed: 1 addition & 1 deletion
@@ -41,5 +41,5 @@ if [ -n "$DRIVER_MEMORY" ] && [ $DEPLOY_MODE == "client" ]; then
   export SPARK_DRIVER_MEMORY=$DRIVER_MEMORY
 fi
 
-$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
+exec $SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
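
Using exec replaces the wrapper shell with the spark-class/JVM process, so the PID of spark-submit is the process that actually runs SparkSubmit and signals are not left with a lingering shell. A rough illustration (class and jar path follow the examples layout referenced above):

    # With exec, $! is effectively the launched Spark process, so signals reach it directly.
    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master local[2] examples/target/scala-2.10/spark-examples-*hadoop*.jar 10 &
    $ kill -TERM $!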

conf/log4j.properties.template

Lines changed: 1 addition & 0 deletions
@@ -7,5 +7,6 @@ log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}:
 
 # Settings to quiet third party logs that are too verbose
 log4j.logger.org.eclipse.jetty=WARN
+log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
 log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
 log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
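
The template only takes effect once it is copied into place; Spark picks up conf/log4j.properties when it exists:

    # Activate the template, then adjust levels as needed.
    $ cp conf/log4j.properties.template conf/log4j.properties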
