
Commit e33b978

Merge pull request #2 from apache/master: merge upstream changes
2 parents: 0b47ca4 + 5044e49

55 files changed: 1,549 additions and 302 deletions

Only a subset of the 55 changed files is shown below; GitHub hides some content by default for large commits.

.gitignore

Lines changed: 4 additions & 6 deletions
@@ -15,11 +15,10 @@ out/
 third_party/libmesos.so
 third_party/libmesos.dylib
 conf/java-opts
-conf/spark-env.sh
-conf/streaming-env.sh
-conf/log4j.properties
-conf/spark-defaults.conf
-conf/hive-site.xml
+conf/*.sh
+conf/*.properties
+conf/*.conf
+conf/*.xml
 docs/_site
 docs/api
 target/
@@ -50,7 +49,6 @@ unit-tests.log
 /lib/
 rat-results.txt
 scalastyle.txt
-conf/*.conf
 scalastyle-output.xml
 
 # For Hive

CONTRIBUTING.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+## Contributing to Spark
+
+Contributions via GitHub pull requests are gladly accepted from their original
+author. Along with any pull requests, please state that the contribution is
+your original work and that you license the work to the project under the
+project's open source license. Whether or not you state this explicitly, by
+submitting any copyrighted material via pull request, email, or other means
+you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
+
+Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
+for more information.

README.md

Lines changed: 16 additions & 62 deletions
@@ -13,16 +13,19 @@ and Spark Streaming for stream processing.
 ## Online Documentation
 
 You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.apache.org/documentation.html>.
+guide, on the [project web page](http://spark.apache.org/documentation.html).
 This README file only contains basic setup instructions.
 
 ## Building Spark
 
-Spark is built on Scala 2.10. To build Spark and its example programs, run:
+Spark is built using [Apache Maven](http://maven.apache.org/).
+To build Spark and its example programs, run:
 
-    ./sbt/sbt assembly
+    mvn -DskipTests clean package
 
 (You do not need to do this if you downloaded a pre-built package.)
+More detailed documentation is available from the project site, at
+["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
 
 ## Interactive Scala Shell
 
@@ -71,73 +74,24 @@ can be run using:
 
     ./dev/run-tests
 
+Please see the guidance on how to
+[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting).
+
 ## A Note About Hadoop Versions
 
 Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
 storage systems. Because the protocols have changed in different versions of
 Hadoop, you must build Spark against the same version that your cluster runs.
-You can change the version by setting `-Dhadoop.version` when building Spark.
-
-For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
-versions without YARN, use:
-
-    # Apache Hadoop 1.2.1
-    $ sbt/sbt -Dhadoop.version=1.2.1 assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v1
-    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly
-
-For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
-with YARN, also set `-Pyarn`:
-
-    # Apache Hadoop 2.0.5-alpha
-    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly
-
-    # Cloudera CDH 4.2.0 with MapReduce v2
-    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly
-
-    # Apache Hadoop 2.2.X and newer
-    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
-
-When developing a Spark application, specify the Hadoop version by adding the
-"hadoop-client" artifact to your project's dependencies. For example, if you're
-using Hadoop 1.2.1 and build your application using SBT, add this entry to
-`libraryDependencies`:
-
-    "org.apache.hadoop" % "hadoop-client" % "1.2.1"
 
-If your project is built with Maven, add this to your POM file's `<dependencies>` section:
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-client</artifactId>
-      <version>1.2.1</version>
-    </dependency>
-
-
-## A Note About Thrift JDBC server and CLI for Spark SQL
-
-Spark SQL supports Thrift JDBC server and CLI.
-See sql-programming-guide.md for more information about using the JDBC server and CLI.
-You can use those features by setting `-Phive` when building Spark as follows.
-
-    $ sbt/sbt -Phive assembly
+Please refer to the build documentation at
+["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
+for detailed guidance on building for a particular distribution of Hadoop, including
+building for particular Hive and Hive Thriftserver distributions. See also
+["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
+for guidance on building a Spark application that works with a particular
+distribution.
 
 ## Configuration
 
 Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
-
-
-## Contributing to Spark
-
-Contributions via GitHub pull requests are gladly accepted from their original
-author. Along with any pull requests, please state that the contribution is
-your original work and that you license the work to the project under the
-project's open source license. Whether or not you state this explicitly, by
-submitting any copyrighted material via pull request, email, or other means
-you agree to license the material under the project's open source license and
-warrant that you have the legal authority to do so.
-
-Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
-for more information.

core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala

Lines changed: 1 addition & 1 deletion
@@ -776,7 +776,7 @@ private[spark] object PythonRDD extends Logging {
   }
 
   /**
-   * Convert and RDD of Java objects to and RDD of serialized Python objects, that is usable by
+   * Convert an RDD of Java objects to an RDD of serialized Python objects, that is usable by
    * PySpark.
    */
   def javaToPython(jRDD: JavaRDD[Any]): JavaRDD[Array[Byte]] = {

core/src/main/scala/org/apache/spark/network/ManagedBuffer.scala

Lines changed: 10 additions & 2 deletions
@@ -19,6 +19,7 @@ package org.apache.spark.network
 
 import java.io.{FileInputStream, RandomAccessFile, File, InputStream}
 import java.nio.ByteBuffer
+import java.nio.channels.FileChannel
 import java.nio.channels.FileChannel.MapMode
 
 import com.google.common.io.ByteStreams
@@ -66,8 +67,15 @@ final class FileSegmentManagedBuffer(val file: File, val offset: Long, val lengt
   override def size: Long = length
 
   override def nioByteBuffer(): ByteBuffer = {
-    val channel = new RandomAccessFile(file, "r").getChannel
-    channel.map(MapMode.READ_ONLY, offset, length)
+    var channel: FileChannel = null
+    try {
+      channel = new RandomAccessFile(file, "r").getChannel
+      channel.map(MapMode.READ_ONLY, offset, length)
+    } finally {
+      if (channel != null) {
+        channel.close()
+      }
+    }
   }
 
   override def inputStream(): InputStream = {
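
The change above closes the RandomAccessFile's channel once the region has been memory-mapped; a java.nio mapped buffer stays valid after its backing channel is closed, so the file handle no longer leaks. Below is a minimal standalone sketch of the same pattern; the object and helper names are illustrative, not Spark APIs:

    import java.io.{File, RandomAccessFile}
    import java.nio.ByteBuffer
    import java.nio.channels.FileChannel
    import java.nio.channels.FileChannel.MapMode

    // Hypothetical example mirroring the patched nioByteBuffer logic.
    object MappedBufferSketch {
      def mapReadOnly(file: File, offset: Long, length: Long): ByteBuffer = {
        var channel: FileChannel = null
        try {
          channel = new RandomAccessFile(file, "r").getChannel
          // The mapped buffer remains usable after the channel is closed,
          // so the channel can be released immediately instead of leaking.
          channel.map(MapMode.READ_ONLY, offset, length)
        } finally {
          if (channel != null) {
            channel.close()
          }
        }
      }
    }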

core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ import scala.collection.mutable.ArrayBuffer
 import scala.collection.mutable.HashSet
 import scala.collection.mutable.Queue
 
-import org.apache.spark.{TaskContext, Logging, SparkException}
+import org.apache.spark.{TaskContext, Logging}
 import org.apache.spark.network.{ManagedBuffer, BlockFetchingListener, BlockTransferService}
 import org.apache.spark.serializer.Serializer
 import org.apache.spark.util.Utils

core/src/main/scala/org/apache/spark/util/Utils.scala

Lines changed: 6 additions & 1 deletion
@@ -530,7 +530,12 @@ private[spark] object Utils extends Logging {
     if (address.isLoopbackAddress) {
       // Address resolves to something like 127.0.1.1, which happens on Debian; try to find
       // a better address using the local network interfaces
-      for (ni <- NetworkInterface.getNetworkInterfaces) {
+      // getNetworkInterfaces returns ifs in reverse order compared to ifconfig output order
+      // on unix-like system. On windows, it returns in index order.
+      // It's more proper to pick ip address following system output order.
+      val activeNetworkIFs = NetworkInterface.getNetworkInterfaces.toList
+      val reOrderedNetworkIFs = if (isWindows) activeNetworkIFs else activeNetworkIFs.reverse
+      for (ni <- reOrderedNetworkIFs) {
        for (addr <- ni.getInetAddresses if !addr.isLinkLocalAddress &&
          !addr.isLoopbackAddress && addr.isInstanceOf[Inet4Address]) {
          // We've found an address that looks reasonable!
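
To see the reordering in isolation, here is a hedged standalone sketch that enumerates local interfaces the same way, reversing them on unix-like systems so the scan follows the system's output order. The object name and the local isWindows check are illustrative stand-ins for Spark's own helpers, and an explicit asScala conversion is used so the snippet compiles on its own:

    import java.net.{Inet4Address, NetworkInterface}
    import scala.collection.JavaConverters._

    object LocalIpSketch {
      def main(args: Array[String]): Unit = {
        // Stand-in for the isWindows flag used in the patch.
        val isWindows = System.getProperty("os.name").startsWith("Windows")
        val ifs = NetworkInterface.getNetworkInterfaces.asScala.toList
        // getNetworkInterfaces reports interfaces in reverse order relative to
        // ifconfig on unix-like systems, so reverse there; Windows already
        // returns them in index order.
        val ordered = if (isWindows) ifs else ifs.reverse
        val candidate = ordered.iterator
          .flatMap(_.getInetAddresses.asScala)
          .find(a => !a.isLinkLocalAddress && !a.isLoopbackAddress && a.isInstanceOf[Inet4Address])
        println(candidate.map(_.getHostAddress).getOrElse("no non-loopback IPv4 address found"))
      }
    }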

core/src/test/scala/org/apache/spark/ui/UISuite.scala

Lines changed: 5 additions & 9 deletions
@@ -23,7 +23,6 @@ import javax.servlet.http.HttpServletRequest
 import scala.io.Source
 import scala.util.{Failure, Success, Try}
 
-import org.eclipse.jetty.server.Server
 import org.eclipse.jetty.servlet.ServletContextHandler
 import org.scalatest.FunSuite
 import org.scalatest.concurrent.Eventually._
@@ -108,14 +107,8 @@ class UISuite extends FunSuite {
   }
 
   test("jetty selects different port under contention") {
-    val startPort = 4040
-    val server = new Server(startPort)
-
-    Try { server.start() } match {
-      case Success(s) =>
-      case Failure(e) =>
-      // Either case server port is busy hence setup for test complete
-    }
+    val server = new ServerSocket(0)
+    val startPort = server.getLocalPort
     val serverInfo1 = JettyUtils.startJettyServer(
       "0.0.0.0", startPort, Seq[ServletContextHandler](), new SparkConf)
     val serverInfo2 = JettyUtils.startJettyServer(
@@ -126,6 +119,9 @@
     assert(boundPort1 != startPort)
     assert(boundPort2 != startPort)
     assert(boundPort1 != boundPort2)
+    serverInfo1.server.stop()
+    serverInfo2.server.stop()
+    server.close()
   }
 
   test("jetty binds to port 0 correctly") {

dev/mima

Lines changed: 8 additions & 0 deletions
@@ -25,11 +25,19 @@ FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
 cd "$FWDIR"
 
 echo -e "q\n" | sbt/sbt oldDeps/update
+rm -f .generated-mima*
+
+# Generate Mima Ignore is called twice, first with latest built jars
+# on the classpath and then again with previous version jars on the classpath.
+# Because of a bug in GenerateMIMAIgnore that when old jars are ahead on classpath
+# it did not process the new classes (which are in assembly jar).
+./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore
 
 export SPARK_CLASSPATH="`find lib_managed \( -name '*spark*jar' -a -type f \) | tr "\\n" ":"`"
 echo "SPARK_CLASSPATH=$SPARK_CLASSPATH"
 
 ./bin/spark-class org.apache.spark.tools.GenerateMIMAIgnore
+
 echo -e "q\n" | sbt/sbt mima-report-binary-issues | grep -v -e "info.*Resolving"
 ret_val=$?
