Fix org.scala-lang: * inconsistent versions #234

Closed
wants to merge 160 commits into from
Changes from all commits
160 commits
841721e
SPARK-1352: Improve robustness of spark-submit script
pwendell Mar 31, 2014
5731af5
[SQL] Rewrite join implementation to allow streaming of one relation.
marmbrus Mar 31, 2014
33b3c2a
SPARK-1365 [HOTFIX] Fix RateLimitedOutputStream test
pwendell Mar 31, 2014
564f1c1
SPARK-1376. In the yarn-cluster submitter, rename "args" option to "arg"
sryza Apr 1, 2014
94fe7fd
[SPARK-1377] Upgrade Jetty to 8.1.14v20131031
andrewor14 Apr 1, 2014
ada310a
[Hot Fix #42] Persisted RDD disappears on storage page if re-used
andrewor14 Apr 1, 2014
f5c418d
[SQL] SPARK-1372 Support for caching and uncaching tables in a SQLCon…
marmbrus Apr 1, 2014
764353d
[SPARK-1342] Scala 2.10.4
markhamstra Apr 2, 2014
afb5ea6
[Spark-1134] only call ipython if no arguments are given; remove IPYT…
Apr 2, 2014
45df912
Revert "[Spark-1134] only call ipython if no arguments are given; rem…
mateiz Apr 2, 2014
8b3045c
MLI-1 Decision Trees
manishamde Apr 2, 2014
ea9de65
Remove * from test case golden filename.
marmbrus Apr 2, 2014
11973a7
Renamed stageIdToActiveJob to jobIdToActiveJob.
kayousterhout Apr 2, 2014
de8eefa
[SPARK-1385] Use existing code for JSON de/serialization of BlockId
andrewor14 Apr 2, 2014
7823633
Do not re-use objects in the EdgePartition/EdgeTriplet iterators.
darabos Apr 2, 2014
1faa579
[SPARK-1371][WIP] Compression support for Spark SQL in-memory columna…
liancheng Apr 2, 2014
ed730c9
StopAfter / TopK related changes
rxin Apr 2, 2014
9c65fa7
[SPARK-1212, Part II] Support sparse data in MLlib
mengxr Apr 2, 2014
47ebea5
[SQL] SPARK-1364 Improve datatype and test coverage for ScalaReflecti…
marmbrus Apr 3, 2014
92a86b2
[SPARK-1398] Removed findbugs jsr305 dependency
markhamstra Apr 3, 2014
fbebaed
Spark parquet improvements
AndreSchumacher Apr 3, 2014
5d1feda
[SPARK-1360] Add Timestamp Support for SQL
chenghao-intel Apr 3, 2014
c1ea3af
Spark 1162 Implemented takeOrdered in pyspark.
ScrapCodes Apr 3, 2014
b8f5341
[SQL] SPARK-1333 First draft of java API
marmbrus Apr 3, 2014
a599e43
[SPARK-1134] Fix and document passing of arguments to IPython
Apr 3, 2014
d94826b
[BUILD FIX] Fix compilation of Spark SQL Java API.
marmbrus Apr 3, 2014
9231b01
Fix jenkins from giving the green light to builds that don't compile.
marmbrus Apr 3, 2014
33e6361
Revert "[SPARK-1398] Removed findbugs jsr305 dependency"
pwendell Apr 4, 2014
ee6e9e7
SPARK-1337: Application web UI garbage collects newest stages
pwendell Apr 4, 2014
7f32fd4
SPARK-1350. Always use JAVA_HOME to run executor container JVMs.
sryza Apr 4, 2014
01cf4c4
SPARK-1404: Always upgrade spark-env.sh vars to environment vars
aarondav Apr 4, 2014
f1fa617
[SPARK-1133] Add whole text files reader in MLlib
yinxusen Apr 4, 2014
16b8308
SPARK-1375. Additional spark-submit cleanup
sryza Apr 4, 2014
a02b535
Don't create SparkContext in JobProgressListenerSuite.
pwendell Apr 4, 2014
198892f
[SPARK-1198] Allow pipes tasks to run in different sub-directories
tgravescs Apr 5, 2014
d956cc2
[SQL] Minor fixes.
marmbrus Apr 5, 2014
60e18ce
SPARK-1414. Python API for SparkContext.wholeTextFiles
mateiz Apr 5, 2014
5f3c1bb
Add test utility for generating Jar files with compiled classes.
pwendell Apr 5, 2014
1347ebd
[SPARK-1419] Bumped parent POM to apache 14
markhamstra Apr 5, 2014
b50ddfd
SPARK-1305: Support persisting RDD's directly to Tachyon
haoyuan Apr 5, 2014
8de038e
[SQL] SPARK-1366 Consistent sql function across different types of SQ…
marmbrus Apr 5, 2014
0acc7a0
small fix ( proogram -> program )
prabeesh Apr 5, 2014
7c18428
HOTFIX for broken CI, by SPARK-1336
ScrapCodes Apr 5, 2014
2d0150c
Remove the getStageInfo() method from SparkContext.
kayousterhout Apr 5, 2014
6e88583
[SPARK-1371] fix computePreferredLocations signature to not depend on…
Apr 5, 2014
890d63b
Fix for PR #195 for Java 6
srowen Apr 6, 2014
0b85516
SPARK-1421. Make MLlib work on Python 2.6
mateiz Apr 6, 2014
7012ffa
Fix SPARK-1420 The maven build error for Spark Catalyst
witgo Apr 6, 2014
e258e50
[SPARK-1259] Make RDD locally iterable
epahomov Apr 6, 2014
856c50f
SPARK-1387. Update build plugins, avoid plugin version warning, centr…
srowen Apr 7, 2014
7ce52c4
SPARK-1349: spark-shell gets its own command history
aarondav Apr 7, 2014
4106558
SPARK-1314: Use SPARK_HIVE to determine if we include Hive in packaging
aarondav Apr 7, 2014
1440154
SPARK-1154: Clean up app folders in worker nodes
Apr 7, 2014
87d0928
SPARK-1431: Allow merging conflicting pull requests
pwendell Apr 7, 2014
accd099
[SQL] SPARK-1371 Hash Aggregation Improvements
marmbrus Apr 7, 2014
b5bae84
[SQL] SPARK-1427 Fix toString for SchemaRDD NativeCommands.
marmbrus Apr 7, 2014
a3c51c6
SPARK-1432: Make sure that all metadata fields are properly cleaned
Apr 7, 2014
83f2a2f
[sql] Rename Expression.apply to eval for better readability.
rxin Apr 7, 2014
9dd8b91
SPARK-1252. On YARN, use container-log4j.properties for executors
sryza Apr 7, 2014
2a2ca48
HOTFIX: Disable actor input stream test.
pwendell Apr 7, 2014
0307db0
SPARK-1099: Introduce local[*] mode to infer number of cores
aarondav Apr 7, 2014
14c9238
[sql] Rename execution/aggregates.scala Aggregate.scala, and added a …
rxin Apr 8, 2014
55dfd5d
Removed the default eval implementation from Expression, and added a …
rxin Apr 8, 2014
31e6fff
Added eval for Rand (without any support for user-defined seed).
rxin Apr 8, 2014
f27e56a
Change timestamp cast semantics. When cast to numeric types, return t…
rxin Apr 8, 2014
0d0493f
[SPARK-1402] Added 3 more compression schemes
liancheng Apr 8, 2014
11eabbe
[SPARK-1103] Automatic garbage collection of RDD, shuffle and broadca…
tdas Apr 8, 2014
83ac9a4
[SPARK-1331] Added graceful shutdown to Spark Streaming
tdas Apr 8, 2014
6dc5f58
[SPARK-1396] Properly cleanup DAGScheduler on job cancellation.
kayousterhout Apr 8, 2014
3bc0548
Remove extra semicolon in import statement and unused import in Appli…
hsaputra Apr 8, 2014
a8d86b0
SPARK-1348 binding Master, Worker, and App Web UI to all interfaces
kanzhang Apr 8, 2014
e25b593
SPARK-1445: compute-classpath should not print error if lib_managed n…
aarondav Apr 8, 2014
fac6085
[SPARK-1397] Notify SparkListeners when stages fail or are cancelled.
kayousterhout Apr 8, 2014
12c077d
SPARK-1433: Upgrade Mesos dependency to 0.17.0
techaddict Apr 8, 2014
ce8ec54
Spark 1271: Co-Group and Group-By should pass Iterable[X]
holdenk Apr 9, 2014
b9e0c93
[SPARK-1434] [MLLIB] change labelParser from anonymous function to trait
mengxr Apr 9, 2014
fa0524f
Spark-939: allow user jars to take precedence over spark jars
holdenk Apr 9, 2014
9689b66
[SPARK-1390] Refactoring of matrices backed by RDDs
mengxr Apr 9, 2014
87bd1f9
SPARK-1093: Annotate developer and experimental API's
pwendell Apr 9, 2014
bde9cc1
[SPARK-1357] [MLLIB] Annotate developer and experimental APIs
mengxr Apr 9, 2014
eb5f2b6
SPARK-1407 drain event queue before stopping event logger
kanzhang Apr 9, 2014
0adc932
[SPARK-1357 (fix)] remove empty line after :: DeveloperApi/Experiment…
mengxr Apr 10, 2014
8ca3b2b
SPARK-729: Closures not always serialized at capture time
willb Apr 10, 2014
e55cc4b
SPARK-1446: Spark examples should not do a System.exit
techaddict Apr 10, 2014
e6d4a74
Revert "SPARK-729: Closures not always serialized at capture time"
pwendell Apr 10, 2014
a74fbbb
Fix SPARK-1413: Parquet messes up stdout and stdin when used in Spark…
witgo Apr 10, 2014
79820fe
[SPARK-1276] Add a HistoryServer to render persisted UI
andrewor14 Apr 10, 2014
3bd3129
SPARK-1428: MLlib should convert non-float64 NumPy arrays to float64 …
techaddict Apr 10, 2014
7b52b66
Revert "SPARK-1433: Upgrade Mesos dependency to 0.17.0"
pwendell Apr 10, 2014
f046662
Update tuning.md
ash211 Apr 10, 2014
930b70f
Remove Unnecessary Whitespace's
techaddict Apr 10, 2014
f99401a
[SQL] Improve column pruning in the optimizer.
marmbrus Apr 10, 2014
2c55783
SPARK-1202 - Add a "cancel" button in the UI for stages
Apr 11, 2014
5cd11d5
Set spark.executor.uri from environment variable (needed by Mesos)
ivanwick Apr 11, 2014
7b4203a
Add Spark v0.9.1 to ec2 launch script and use it as the default
harveyfeng Apr 11, 2014
44f654e
SPARK-1202: Improvements to task killing in the UI.
pwendell Apr 11, 2014
446bb34
SPARK-1417: Spark on Yarn - spark UI link from resourcemanager is broken
tgravescs Apr 11, 2014
98225a6
Some clean up in build/docs
pwendell Apr 11, 2014
f5ace8d
[SPARK-1225, 1241] [MLLIB] Add AreaUnderCurve and BinaryClassificatio…
mengxr Apr 11, 2014
6a0f8e3
HOTFIX: Ignore python metastore files in RAT checks.
pwendell Apr 11, 2014
7038b00
[FIX] make coalesce test deterministic in RDDSuite
mengxr Apr 12, 2014
fdfb45e
[WIP] [SPARK-1328] Add vector statistics
yinxusen Apr 12, 2014
aa8bb11
Update WindowedDStream.scala
baishuo Apr 12, 2014
165e06a
SPARK-1057 (alternative) Remove fastutil
srowen Apr 12, 2014
6aa08c3
[SPARK-1386] Web UI for Spark Streaming
tdas Apr 12, 2014
c2d160f
[Fix #204] Update out-dated comments
andrewor14 Apr 12, 2014
ca11919
[SPARK-1403] Move the class loader creation back to where it was in 0…
Apr 13, 2014
4bc07ee
SPARK-1480: Clean up use of classloaders
pwendell Apr 13, 2014
037fe4d
[SPARK-1415] Hadoop min split for wholeTextFiles()
yinxusen Apr 13, 2014
7dbca68
[BUGFIX] In-memory columnar storage bug fixes
liancheng Apr 14, 2014
268b535
HOTFIX: Use file name and not paths for excludes
pwendell Apr 14, 2014
0247b5c
SPARK-1488. Resolve scalac feature warnings during build
srowen Apr 15, 2014
c99bcb7
SPARK-1374: PySpark API for SparkSQL
ahirreddy Apr 15, 2014
df36091
SPARK-1426: Make MLlib work with NumPy versions older than 1.7
techaddict Apr 15, 2014
2580a3b
SPARK-1501: Ensure assertions in Graph.apply are asserted.
willb Apr 15, 2014
6843d63
[SPARK-1157][MLlib] L-BFGS Optimizer based on Breeze's implementation.
Apr 15, 2014
07d72fe
Decision Tree documentation for MLlib programming guide
manishamde Apr 15, 2014
5aaf983
SPARK-1455: Better isolation for unit tests.
pwendell Apr 16, 2014
8517911
[FIX] update sbt-idea to version 1.6.0
mengxr Apr 16, 2014
63ca581
[WIP] SPARK-1430: Support sparse data in Python MLlib
mateiz Apr 16, 2014
273c2fd
[SQL] SPARK-1424 Generalize insertIntoTable functions on SchemaRDDs
marmbrus Apr 16, 2014
6a10d80
[SPARK-959] Updated SBT from 0.13.1 to 0.13.2
liancheng Apr 16, 2014
c0273d8
Make "spark logo" link refer to "/".
Apr 16, 2014
fec462c
Loads test tables when running "sbt hive/console" without HIVE_DEV_HOME
liancheng Apr 16, 2014
9edd887
update spark.default.parallelism
CrazyJvm Apr 16, 2014
c3527a3
SPARK-1310: Start adding k-fold cross validation to MLLib [adds kFold…
holdenk Apr 16, 2014
77f8367
SPARK-1497. Fix scalastyle warnings in YARN, Hive code
srowen Apr 16, 2014
82349fb
Minor addition to SPARK-1497
pwendell Apr 16, 2014
e269c24
SPARK-1469: Scheduler mode should accept lower-case definitions and h…
techaddict Apr 16, 2014
725925c
SPARK-1465: Spark compilation is broken with the latest hadoop-2.4.0 …
Apr 16, 2014
10b1c59
[SPARK-1511] use Files.move instead of renameTo in TestUtils.scala
advancedxy Apr 16, 2014
987760e
Add clean to build
pwendell Apr 16, 2014
235a47c
Rebuild routing table after Graph.reverse
ankurdave Apr 17, 2014
17d3234
SPARK-1329: Create pid2vid with correct number of partitions
ankurdave Apr 17, 2014
016a877
remove unnecessary brace and semicolon in 'putBlockInfo.synchronize' …
CrazyJvm Apr 17, 2014
38877cc
Fixing a race condition in event listener unit test
kanzhang Apr 17, 2014
9c40b9e
misleading task number of groupByKey
CrazyJvm Apr 17, 2014
07b7ad3
Update ReducedWindowedDStream.scala
baishuo Apr 17, 2014
d4916a8
Include stack trace for exceptions thrown by user code.
marmbrus Apr 17, 2014
6ad4c54
SPARK-1462: Examples of ML algorithms are using deprecated APIs
techaddict Apr 17, 2014
bb76eae
[python alternative] pyspark require Python2, failing if system defau…
abhishekkr Apr 17, 2014
6904750
[SPARK-1395] Allow "local:" URIs to work on Yarn.
Apr 17, 2014
0058b5d
SPARK-1408 Modify Spark on Yarn to point to the history server when a…
tgravescs Apr 17, 2014
6c746ba
FIX: Don't build Hive in assembly unless running Hive tests.
pwendell Apr 18, 2014
7863ecc
HOTFIX: Ignore streaming UI test
pwendell Apr 18, 2014
e31c8ff
SPARK-1483: Rename minSplits to minPartitions in public APIs
CodingCat Apr 18, 2014
89f4743
Reuses Row object in ExistingRdd.productToRowRdd()
liancheng Apr 18, 2014
aa17f02
[SPARK-1520] remove fastutil from dependencies
mengxr Apr 18, 2014
8aa1f4c
SPARK-1357 (addendum). More Experimental items in MLlib
srowen Apr 18, 2014
3c7a9ba
SPARK-1523: improve the readability of code in AkkaUtil
CodingCat Apr 18, 2014
81a152c
Fixed broken pyspark shell.
rxin Apr 18, 2014
c399baa
SPARK-1456 Remove view bounds on Ordered in favor of a context bound …
marmbrus Apr 18, 2014
2089e0e
SPARK-1482: Fix potential resource leaks in saveAsHadoopDataset and s…
zsxwing Apr 19, 2014
28238c8
README update
rxin Apr 19, 2014
5d0f58b
Use scala deprecation instead of java.
marmbrus Apr 19, 2014
10d0421
Add insertInto and saveAsTable to Python API.
marmbrus Apr 19, 2014
25fc318
[SPARK-1535] ALS: Avoid the garbage-creating ctor of DoubleMatrix
tmyklebu Apr 19, 2014
3a390bf
REPL cleanup.
marmbrus Apr 20, 2014
42238b6
Fix org.scala-lang: * inconsistent versions for maven
witgo Apr 21, 2014
b434ec0
remove exclusion scalap
witgo Apr 21, 2014
3 changes: 3 additions & 0 deletions .rat-excludes
@@ -39,3 +39,6 @@ work
.*\.q
golden
test.out/*
.*iml
service.properties
db.lck
35 changes: 24 additions & 11 deletions README.md
@@ -10,20 +10,33 @@ guide, on the project webpage at <http://spark.apache.org/documentation.html>.
This README file only contains basic setup instructions.


## Building
## Building Spark

Spark requires Scala 2.10. The project is built using Simple Build Tool (SBT),
which can be obtained [here](http://www.scala-sbt.org). If SBT is installed we
will use the system version of sbt otherwise we will attempt to download it
automatically. To build Spark and its example programs, run:
Spark is built on Scala 2.10. To build Spark and its example programs, run:

./sbt/sbt assembly

Once you've built Spark, the easiest way to start using it is the shell:
## Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

./bin/spark-shell

Or, for the Python API, the Python shell (`./bin/pyspark`).
Try the following command, which should return 1000:

scala> sc.parallelize(1 to 1000).count()

## Interactive Python Shell

Alternatively, if you prefer Python, you can use the Python shell:

./bin/pyspark

And run the following command, which should also return 1000:

>>> sc.parallelize(range(1000)).count()

## Example Programs

Spark also comes with several sample programs in the `examples` directory.
To run one of them, use `./bin/run-example <class> <params>`. For example:
@@ -38,13 +51,13 @@ All of the Spark samples take a `<master>` parameter that is the cluster URL
to connect to. This can be a mesos:// or spark:// URL, or "local" to run
locally with one thread, or "local[N]" to run locally with N threads.

## Running tests
## Running Tests

Testing first requires [Building](#building) Spark. Once Spark is built, tests
Testing first requires [building Spark](#building-spark). Once Spark is built, tests
can be run using:

`./sbt/sbt test`
./sbt/sbt test

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
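
The README excerpt above says each sample program takes a `<master>` parameter. A minimal usage sketch, assuming a built assembly; the `SparkPi` class name is an illustration and not taken from this diff:

    # Run an example locally on two threads, then against a (hypothetical) standalone master
    ./bin/run-example org.apache.spark.examples.SparkPi local[2]
    ./bin/run-example org.apache.spark.examples.SparkPi spark://localhost:7077
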
12 changes: 11 additions & 1 deletion assembly/pom.xml
@@ -163,6 +163,16 @@
</dependency>
</dependencies>
</profile>
<profile>
<id>hive</id>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>spark-ganglia-lgpl</id>
<dependencies>
@@ -208,7 +218,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>buildnumber-maven-plugin</artifactId>
<version>1.1</version>
<version>1.2</version>
<executions>
<execution>
<phase>validate</phase>
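
The new `hive` profile above pulls `spark-hive` into the assembly. A hedged sketch of opting in at build time; the sbt variant relies on `SPARK_HIVE` per SPARK-1314 in the commit list and is an assumption rather than something shown in this diff:

    # Maven: activate the new 'hive' profile when packaging the assembly
    mvn -Phive -DskipTests clean package

    # sbt: SPARK_HIVE toggles Hive inclusion (assumption, per SPARK-1314)
    SPARK_HIVE=true ./sbt/sbt assembly
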
20 changes: 12 additions & 8 deletions bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala
@@ -220,27 +220,31 @@ object Bagel extends Logging {
*/
private def comp[K: Manifest, V <: Vertex, M <: Message[K], C](
sc: SparkContext,
grouped: RDD[(K, (Seq[C], Seq[V]))],
grouped: RDD[(K, (Iterable[C], Iterable[V]))],
compute: (V, Option[C]) => (V, Array[M]),
storageLevel: StorageLevel
): (RDD[(K, (V, Array[M]))], Int, Int) = {
var numMsgs = sc.accumulator(0)
var numActiveVerts = sc.accumulator(0)
val processed = grouped.flatMapValues {
case (_, vs) if vs.size == 0 => None
case (c, vs) =>
val processed = grouped.mapValues(x => (x._1.iterator, x._2.iterator))
.flatMapValues {
case (_, vs) if !vs.hasNext => None
case (c, vs) => {
val (newVert, newMsgs) =
compute(vs(0), c match {
case Seq(comb) => Some(comb)
case Seq() => None
})
compute(vs.next,
c.hasNext match {
case true => Some(c.next)
case false => None
}
)

numMsgs += newMsgs.size
if (newVert.active) {
numActiveVerts += 1
}

Some((newVert, newMsgs))
}
}.persist(storageLevel)

// Force evaluation of processed RDD for accurate performance measurements
6 changes: 4 additions & 2 deletions bagel/src/test/scala/org/apache/spark/bagel/BagelSuite.scala
@@ -24,13 +24,15 @@ import org.scalatest.time.SpanSugar._
import org.apache.spark._
import org.apache.spark.storage.StorageLevel

import scala.language.postfixOps

class TestVertex(val active: Boolean, val age: Int) extends Vertex with Serializable
class TestMessage(val targetId: String) extends Message[String] with Serializable

class BagelSuite extends FunSuite with Assertions with BeforeAndAfter with Timeouts {

var sc: SparkContext = _

after {
if (sc != null) {
sc.stop()
35 changes: 19 additions & 16 deletions bin/compute-classpath.sh
@@ -30,21 +30,7 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
# Build up classpath
CLASSPATH="$SPARK_CLASSPATH:$FWDIR/conf"

# Support for interacting with Hive. Since hive pulls in a lot of dependencies that might break
# existing Spark applications, it is not included in the standard spark assembly. Instead, we only
# include it in the classpath if the user has explicitly requested it by running "sbt hive/assembly"
# Hopefully we will find a way to avoid uber-jars entirely and deploy only the needed packages in
# the future.
if [ -f "$FWDIR"/sql/hive/target/scala-$SCALA_VERSION/spark-hive-assembly-*.jar ]; then

# Datanucleus jars do not work if only included in the uberjar as plugin.xml metadata is lost.
DATANUCLEUSJARS=$(JARS=("$FWDIR/lib_managed/jars"/datanucleus-*.jar); IFS=:; echo "${JARS[*]}")
CLASSPATH=$CLASSPATH:$DATANUCLEUSJARS

ASSEMBLY_DIR="$FWDIR/sql/hive/target/scala-$SCALA_VERSION/"
else
ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION/"
fi
ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"

# First check if we have a dependencies jar. If so, include binary classes with the deps jar
if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
@@ -59,7 +45,7 @@ if [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar ]; then
CLASSPATH="$CLASSPATH:$FWDIR/sql/core/target/scala-$SCALA_VERSION/classes"
CLASSPATH="$CLASSPATH:$FWDIR/sql/hive/target/scala-$SCALA_VERSION/classes"

DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark*-assembly*hadoop*-deps.jar`
DEPS_ASSEMBLY_JAR=`ls "$ASSEMBLY_DIR"/spark-assembly*hadoop*-deps.jar`
CLASSPATH="$CLASSPATH:$DEPS_ASSEMBLY_JAR"
else
# Else use spark-assembly jar from either RELEASE or assembly directory
@@ -71,6 +57,23 @@ else
CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
fi

# When Hive support is needed, Datanucleus jars must be included on the classpath.
# Datanucleus jars do not work if only included in the uber jar as plugin.xml metadata is lost.
# Both sbt and maven will populate "lib_managed/jars/" with the datanucleus jars when Spark is
# built with Hive, so first check if the datanucleus jars exist, and then ensure the current Spark
# assembly is built for Hive, before actually populating the CLASSPATH with the jars.
# Note that this check order is faster (by up to half a second) in the case where Hive is not used.
num_datanucleus_jars=$(ls "$FWDIR"/lib_managed/jars/ 2>/dev/null | grep "datanucleus-.*\\.jar" | wc -l)
if [ $num_datanucleus_jars -gt 0 ]; then
AN_ASSEMBLY_JAR=${ASSEMBLY_JAR:-$DEPS_ASSEMBLY_JAR}
num_hive_files=$(jar tvf "$AN_ASSEMBLY_JAR" org/apache/hadoop/hive/ql/exec 2>/dev/null | wc -l)
if [ $num_hive_files -gt 0 ]; then
echo "Spark assembly has been built with Hive, including Datanucleus jars on classpath" 1>&2
DATANUCLEUSJARS=$(echo "$FWDIR/lib_managed/jars"/datanucleus-*.jar | tr " " :)
CLASSPATH=$CLASSPATH:$DATANUCLEUSJARS
fi
fi

# Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1
if [[ $SPARK_TESTING == 1 ]]; then
CLASSPATH="$CLASSPATH:$FWDIR/core/target/scala-$SCALA_VERSION/test-classes"
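
The block above adds the Datanucleus jars only when the assembly itself was built with Hive. A minimal sketch of the same check run by hand, assuming Scala 2.10 paths and a single assembly jar under `assembly/target`:

    # Does the built assembly contain Hive classes?
    ASSEMBLY_JAR=$(ls assembly/target/scala-2.10/spark-assembly-*.jar 2>/dev/null | head -1)
    if jar tf "$ASSEMBLY_JAR" | grep -q '^org/apache/hadoop/hive/ql/exec'; then
        echo "Assembly built with Hive; Datanucleus jars belong on the classpath"
    fi
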
3 changes: 3 additions & 0 deletions bin/load-spark-env.sh
@@ -30,6 +30,9 @@ if [ -z "$SPARK_ENV_LOADED" ]; then
use_conf_dir=${SPARK_CONF_DIR:-"$parent_dir/conf"}

if [ -f "${use_conf_dir}/spark-env.sh" ]; then
# Promote all variable declarations to environment (exported) variables
set -a
. "${use_conf_dir}/spark-env.sh"
set +a
fi
fi
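
`set -a` marks every variable assigned while it is active for export, so plain `KEY=value` lines in `spark-env.sh` reach child processes without an explicit `export`. A small illustration using a hypothetical `spark-env.sh` entry:

    set -a                                 # auto-export all subsequent assignments
    . ./conf/spark-env.sh                  # hypothetical file containing: SPARK_WORKER_CORES=4
    set +a                                 # stop auto-exporting
    bash -c 'echo $SPARK_WORKER_CORES'     # child process sees the value and prints 4
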
3 changes: 2 additions & 1 deletion bin/pyspark
@@ -55,7 +55,8 @@ if [ -n "$IPYTHON_OPTS" ]; then
IPYTHON=1
fi

if [[ "$IPYTHON" = "1" ]] ; then
# Only use ipython if no command line arguments were provided [SPARK-1134]
if [[ "$IPYTHON" = "1" && $# = 0 ]] ; then
exec ipython $IPYTHON_OPTS
else
exec "$PYSPARK_PYTHON" "$@"
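
The `$# = 0` guard (SPARK-1134) means IPython is used only for an interactive session; as soon as a script or arguments are passed, the plain interpreter runs them. A usage sketch (`my_script.py` is a hypothetical file):

    IPYTHON=1 ./bin/pyspark                # no arguments: launches IPython
    IPYTHON=1 ./bin/pyspark my_script.py   # arguments given: runs under $PYSPARK_PYTHON instead
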
10 changes: 6 additions & 4 deletions bin/spark-class
@@ -47,9 +47,9 @@ DEFAULT_MEM=${SPARK_MEM:-512m}

SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.akka.logLifecycleEvents=true"

# Add java opts and memory settings for master, worker, executors, and repl.
# Add java opts and memory settings for master, worker, history server, executors, and repl.
case "$1" in
# Master and Worker use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
# Master, Worker, and HistoryServer use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
'org.apache.spark.deploy.master.Master')
OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_MASTER_OPTS"
OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
@@ -58,6 +58,10 @@ case "$1" in
OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_WORKER_OPTS"
OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
;;
'org.apache.spark.deploy.history.HistoryServer')
OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_HISTORY_OPTS"
OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
;;

# Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
'org.apache.spark.executor.CoarseGrainedExecutorBackend')
@@ -154,5 +158,3 @@ if [ "$SPARK_PRINT_LAUNCH_COMMAND" == "1" ]; then
fi

exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"


7 changes: 5 additions & 2 deletions bin/spark-class2.cmd
@@ -45,14 +45,17 @@ if "x%OUR_JAVA_MEM%"=="x" set OUR_JAVA_MEM=512m

set SPARK_DAEMON_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% -Dspark.akka.logLifecycleEvents=true

rem Add java opts and memory settings for master, worker, executors, and repl.
rem Master and Worker use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
rem Add java opts and memory settings for master, worker, history server, executors, and repl.
rem Master, Worker and HistoryServer use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
if "%1"=="org.apache.spark.deploy.master.Master" (
set OUR_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% %SPARK_MASTER_OPTS%
if not "x%SPARK_DAEMON_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DAEMON_MEMORY%
) else if "%1"=="org.apache.spark.deploy.worker.Worker" (
set OUR_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% %SPARK_WORKER_OPTS%
if not "x%SPARK_DAEMON_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DAEMON_MEMORY%
) else if "%1"=="org.apache.spark.deploy.history.HistoryServer" (
set OUR_JAVA_OPTS=%SPARK_DAEMON_JAVA_OPTS% %SPARK_HISTORY_OPTS%
if not "x%SPARK_DAEMON_MEMORY%"=="x" set OUR_JAVA_MEM=%SPARK_DAEMON_MEMORY%

rem Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
) else if "%1"=="org.apache.spark.executor.CoarseGrainedExecutorBackend" (
8 changes: 4 additions & 4 deletions bin/spark-shell
@@ -34,7 +34,7 @@ set -o posix
FWDIR="$(cd `dirname $0`/..; pwd)"

SPARK_REPL_OPTS="${SPARK_REPL_OPTS:-""}"
DEFAULT_MASTER="local"
DEFAULT_MASTER="local[*]"
MASTER=${MASTER:-""}

info_log=0
@@ -64,7 +64,7 @@ ${txtbld}OPTIONS${txtrst}:
is followed by m for megabytes or g for gigabytes, e.g. "1g".
-dm --driver-memory : The memory used by the Spark Shell, the number is followed
by m for megabytes or g for gigabytes, e.g. "1g".
-m --master : A full string that describes the Spark Master, defaults to "local"
-m --master : A full string that describes the Spark Master, defaults to "local[*]"
e.g. "spark://localhost:7077".
--log-conf : Enables logging of the supplied SparkConf as INFO at start of the
Spark Context.
Expand Down Expand Up @@ -127,7 +127,7 @@ function set_spark_log_conf(){

function set_spark_master(){
if ! [[ "$1" =~ $ARG_FLAG_PATTERN ]]; then
MASTER="$1"
export MASTER="$1"
else
out_error "wrong format for $2"
fi
@@ -145,7 +145,7 @@ function resolve_spark_master(){
fi

if [ -z "$MASTER" ]; then
MASTER="$DEFAULT_MASTER"
export MASTER="$DEFAULT_MASTER"
fi

}
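
After this change the shell defaults to `local[*]` (one worker thread per core, per SPARK-1099) and exports whichever master it resolves, so child processes see the same value. A short usage sketch:

    ./bin/spark-shell                                   # no MASTER set: defaults to local[*]
    MASTER=spark://localhost:7077 ./bin/spark-shell     # environment variable takes effect
    ./bin/spark-shell -m local[4]                       # or pass it with -m / --master
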