[SPARK-2873] [SQL] using ExternalAppendOnlyMap to resolve OOM when aggregating #1822

Status: Closed (wants to merge 157 commits)

Commits (157)
749b632
SITUATION: ShuffledDStream run tasks whether dstream has partition it…
Jul 3, 2014
e1f9978
DStream run tasks only when dstream has partition items
Jul 10, 2014
b03ad14
DStream run tasks only when dstream has partition items
Jul 10, 2014
290b1a1
DStream run tasks only when dstream has partition items
Jul 11, 2014
87627e7
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
f889700
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
21b5735
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
d2be832
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
e3a88b1
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
2a4786a
Merge branch 'sql-memory-patch' of https://github.com/guowei2/spark i…
Aug 6, 2014
475da9d
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
ffd1f59
[SPARK-2887] fix bug of countApproxDistinct() when have more than one…
davies Aug 7, 2014
47ccd5e
[SPARK-2851] [mllib] DecisionTree Python consistency update
jkbradley Aug 7, 2014
75993a6
SPARK-2879 part 2 [BUILD] Use HTTPS to access Maven Central and other…
srowen Aug 7, 2014
8d1dec4
[mllib] DecisionTree Strategy parameter checks
jkbradley Aug 7, 2014
b9e9e53
[SPARK-2852][MLLIB] Separate model from IDF/StandardScaler algorithms
mengxr Aug 7, 2014
80ec5ba
SPARK-2905 Fixed path sbin => bin
dosoft Aug 7, 2014
32096c2
SPARK-2899 Doc generation is back to working in new SBT Build.
ScrapCodes Aug 7, 2014
6906b69
SPARK-2787: Make sort-based shuffle write files directly when there's…
mateiz Aug 8, 2014
4c51098
SPARK-2565. Update ShuffleReadMetrics as blocks are fetched
sryza Aug 8, 2014
9de6a42
[SPARK-2904] Remove non-used local variable in SparkSubmitArguments
sarutak Aug 8, 2014
9a54de1
[SPARK-2911]: provide rdd.parent[T](j) to obtain jth parent RDD
erikerlandson Aug 8, 2014
9016af3
[SPARK-2888] [SQL] Fix addColumnMetadataToConf in HiveTableScan
yhuai Aug 8, 2014
0489cee
[SPARK-2908] [SQL] JsonRDD.nullTypeToStringType does not convert all …
yhuai Aug 8, 2014
c874723
[SPARK-2877] [SQL] MetastoreRelation should use SparkClassLoader when…
yhuai Aug 8, 2014
45d8f4d
[SPARK-2919] [SQL] Basic support for analyze command in HiveQl
yhuai Aug 8, 2014
b7c89a7
[SPARK-2700] [SQL] Hidden files (such as .impala_insert_staging) shou…
chutium Aug 8, 2014
74d6f62
[SPARK-1997][MLLIB] update breeze to 0.9
mengxr Aug 8, 2014
ec79063
[SPARK-2897][SPARK-2920]TorrentBroadcast does use the serializer clas…
witgo Aug 8, 2014
1c84dba
[Web UI]Make decision order of Worker's WebUI port consistent with Ma…
WangTaoTheTonic Aug 9, 2014
43af281
[SPARK-2911] apply parent[T](j) to clarify UnionRDD code
erikerlandson Aug 9, 2014
28dbae8
[SPARK-2635] Fix race condition at SchedulerBackend.isReady in standa…
li-zhihui Aug 9, 2014
b431e67
[SPARK-2861] Fix Doc comment of histogram method
Aug 9, 2014
e45daf2
[SPARK-1766] sorted functions to meet pedantic requirements
Aug 10, 2014
4f4a988
[SPARK-2894] spark-shell doesn't accept flags
sarutak Aug 10, 2014
5b6585d
Updated Spark SQL README to include the hive-thriftserver module
rxin Aug 10, 2014
482c5af
Turn UpdateBlockInfo into case class.
rxin Aug 10, 2014
3570119
Remove extra semicolon in Task.scala
witgo Aug 10, 2014
1d03a26
[SPARK-2950] Add gc time and shuffle write time to JobLogger
shivaram Aug 10, 2014
28dcbb5
[SPARK-2898] [PySpark] fix bugs in deamon.py
davies Aug 10, 2014
b715aa0
[SPARK-2937] Separate out samplyByKeyExact as its own API in PairRDDF…
dorx Aug 10, 2014
ba28a8f
[SPARK-2936] Migrate Netty network module from Java to Scala
rxin Aug 11, 2014
db06a81
[PySpark] [SPARK-2954] [SPARK-2948] [SPARK-2910] [SPARK-2101] Python …
JoshRosen Aug 11, 2014
3733866
[SPARK-2952] Enable logging actor messages at DEBUG level
rxin Aug 11, 2014
7712e72
[SPARK-2931] In TaskSetManager, reset currentLocalityIndex after reco…
JoshRosen Aug 12, 2014
32638b5
[SPARK-2515][mllib] Chi Squared test
dorx Aug 12, 2014
6fab941
[SPARK-2934][MLlib] Adding LogisticRegressionWithLBFGS Interface
Aug 12, 2014
490ecfa
[SPARK-2844][SQL] Correctly set JVM HiveContext if it is passed into …
ahirreddy Aug 12, 2014
21a95ef
[SPARK-2590][SQL] Added option to handle incremental collection, disa…
liancheng Aug 12, 2014
e83fdcd
[sql]use SparkSQLEnv.stop() in ShutdownHook
scwf Aug 12, 2014
647aeba
[SQL] A tiny refactoring in HiveContext#analyze
yhuai Aug 12, 2014
c9c89c3
[SPARK-2965][SQL] Fix HashOuterJoin output nullabilities.
ueshin Aug 12, 2014
c686b7d
[SPARK-2968][SQL] Fix nullabilities of Explode.
ueshin Aug 12, 2014
bad21ed
[SPARK-2650][SQL] Build column buffers in smaller batches
marmbrus Aug 12, 2014
5d54d71
[SQL] [SPARK-2826] Reduce the memory copy while building the hashmap …
chenghao-intel Aug 12, 2014
9038d94
[SPARK-2923][MLLIB] Implement some basic BLAS routines
mengxr Aug 12, 2014
f0060b7
[MLlib] Correctly set vectorSize and alpha
Ishiihara Aug 12, 2014
882da57
fix flaky tests
davies Aug 12, 2014
c235b83
SPARK-2830 [MLlib]: re-organize mllib documentation
atalwalkar Aug 13, 2014
3967215
Choose OnHeapAggregation or ExternalAggregation base on spark.sql.agg…
guowei2 Aug 13, 2014
34abbab
Choose OnHeapAggregation or ExternalAggregation base on spark.sql.agg…
guowei2 Aug 13, 2014
676f982
[SPARK-2953] Allow using short names for io compression codecs
rxin Aug 13, 2014
246cb3f
Use transferTo when copy merge files in ExternalSorter
colorant Aug 13, 2014
2bd8126
[SPARK-1777 (partial)] bugfix: make size of requested memory correctly
liyezhang556520 Aug 13, 2014
fe47359
[SPARK-2993] [MLLib] colStats (wrapper around MultivariateStatistical…
dorx Aug 13, 2014
869f06c
[SPARK-2963] [SQL] There no documentation about building to use HiveS…
sarutak Aug 13, 2014
c974a71
[SPARK-3013] [SQL] [PySpark] convert array into list
davies Aug 13, 2014
434bea1
[SPARK-2983] [PySpark] improve performance of sortByKey()
davies Aug 13, 2014
7ecb867
[MLLIB] use Iterator.fill instead of Array.fill
mengxr Aug 13, 2014
bdc7a1a
[SPARK-3004][SQL] Added null checking when retrieving row set
liancheng Aug 13, 2014
13f54e2
[SPARK-2817] [SQL] add "show create table" support
tianyi Aug 13, 2014
9256d4a
[SPARK-2994][SQL] Support for udfs that take complex types
marmbrus Aug 14, 2014
376a82e
[SPARK-2650][SQL] More precise initial buffer size estimation for in-…
liancheng Aug 14, 2014
9fde1ff
[SPARK-2935][SQL]Fix parquet predicate push down bug
marmbrus Aug 14, 2014
905dc4b
[SPARK-2970] [SQL] spark-sql script ends with IOException when EventL…
sarutak Aug 14, 2014
63d6777
[SPARK-2986] [SQL] fixed: setting properties does not effect
Aug 14, 2014
0c7b452
SPARK-3020: Print completed indices rather than tasks in web UI
pwendell Aug 14, 2014
9497b12
[SPARK-3006] Failed to execute spark-shell in Windows OS
tsudukim Aug 14, 2014
e424565
[Docs] Add missing <code> tags (minor)
andrewor14 Aug 14, 2014
69a57a1
[SPARK-2995][MLLIB] add ALS.setIntermediateRDDStorageLevel
mengxr Aug 14, 2014
d069c5d
[SPARK-3029] Disable local execution of Spark jobs by default
aarondav Aug 14, 2014
6b8de0e
SPARK-2893: Do not swallow Exceptions when running a custom kryo regi…
GrahamDennis Aug 14, 2014
078f3fb
[SPARK-3011][SQL] _temporary directory should be filtered out by sqlC…
josephsu Aug 14, 2014
add75d4
[SPARK-2927][SQL] Add a conf to configure if we always read Binary co…
yhuai Aug 14, 2014
fde692b
[SQL] Python JsonRDD UTF8 Encoding Fix
ahirreddy Aug 14, 2014
267fdff
[SPARK-2925] [sql]fix spark-sql and start-thriftserver shell bugs whe…
scwf Aug 14, 2014
eaeb0f7
Minor cleanup of metrics.Source
rxin Aug 14, 2014
9622106
[SPARK-2979][MLlib] Improve the convergence rate by minimizing the co…
Aug 14, 2014
a7f8a4f
Revert [SPARK-3011][SQL] _temporary directory should be filtered out…
marmbrus Aug 14, 2014
a75bc7a
SPARK-3009: Reverted readObject method in ApplicationInfo so that App…
jacek-lewandowski Aug 14, 2014
fa5a08e
Make dev/mima runnable on Mac OS X.
rxin Aug 14, 2014
655699f
[SPARK-3027] TaskContext: tighten visibility and provide Java friendl…
rxin Aug 15, 2014
3a8b68b
[SPARK-2468] Netty based block server / client module
rxin Aug 15, 2014
9422a9b
[SPARK-2736] PySpark converter and example script for reading Avro files
kanzhang Aug 15, 2014
500f84e
[SPARK-2912] [Spark QA] Include commit hash in Spark QA messages
nchammas Aug 15, 2014
e1b85f3
SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" with…
srowen Aug 15, 2014
fba8ec3
Add caching information to rdd.toDebugString
Aug 15, 2014
7589c39
[SPARK-2924] remove default args to overloaded methods
avati Aug 15, 2014
fd9fcd2
Revert "[SPARK-2468] Netty based block server / client module"
pwendell Aug 15, 2014
0afe5cb
SPARK-3028. sparkEventToJson should support SparkListenerExecutorMetr…
sryza Aug 15, 2014
c703229
[SPARK-3022] [SPARK-3041] [mllib] Call findBins once per level + unor…
jkbradley Aug 15, 2014
cc36487
[SPARK-3046] use executor's class loader as the default serializer cl…
rxin Aug 16, 2014
5d25c0b
[SPARK-3078][MLLIB] Make LRWithLBFGS API consistent with others
mengxr Aug 16, 2014
2e069ca
[SPARK-3001][MLLIB] Improve Spearman's correlation
mengxr Aug 16, 2014
c9da466
[SPARK-3015] Block on cleaning tasks to prevent Akka timeouts
andrewor14 Aug 16, 2014
a83c772
[SPARK-3045] Make Serializer interface Java friendly
rxin Aug 16, 2014
20fcf3d
[SPARK-2977] Ensure ShuffleManager is created before ShuffleBlockManager
JoshRosen Aug 16, 2014
b4a0592
[SQL] Using safe floating-point numbers in doctest
liancheng Aug 16, 2014
4bdfaa1
[SPARK-3076] [Jenkins] catch & report test timeouts
nchammas Aug 16, 2014
76fa0ea
[SPARK-2677] BasicBlockFetchIterator#next can wait forever
sarutak Aug 16, 2014
7e70708
[SPARK-3048][MLLIB] add LabeledPoint.parse and remove loadStreamingLa…
mengxr Aug 16, 2014
ac6411c
[SPARK-3081][MLLIB] rename RandomRDDGenerators to RandomRDDs
mengxr Aug 16, 2014
379e758
[SPARK-3035] Wrong example with SparkContext.addFile
iAmGhost Aug 16, 2014
2fc8aca
[SPARK-1065] [PySpark] improve supporting for large broadcast
davies Aug 16, 2014
bc95fe0
In the stop method of ConnectionManager to cancel the ackTimeoutMonitor
witgo Aug 17, 2014
fbad722
[SPARK-3077][MLLIB] fix some chisq-test
mengxr Aug 17, 2014
73ab7f1
[SPARK-3042] [mllib] DecisionTree Filter top-down instead of bottom-up
jkbradley Aug 17, 2014
318e28b
SPARK-2881. Upgrade snappy-java to 1.1.1.3.
pwendell Aug 18, 2014
5ecb08e
Revert "[SPARK-2970] [SQL] spark-sql script ends with IOException whe…
marmbrus Aug 18, 2014
bfa09b0
[SQL] Improve debug logging and toStrings.
marmbrus Aug 18, 2014
9924328
[SPARK-1981] updated streaming-kinesis.md
cfregly Aug 18, 2014
95470a0
[HOTFIX][STREAMING] Allow the JVM/Netty to decide which port to bind …
harishreedharan Aug 18, 2014
c77f406
[SPARK-3087][MLLIB] fix col indexing bug in chi-square and add a chec…
mengxr Aug 18, 2014
5173f3c
SPARK-2884: Create binary builds in parallel with release script.
pwendell Aug 18, 2014
df652ea
SPARK-2900. aggregate inputBytes per stage
sryza Aug 18, 2014
3c8fa50
[SPARK-3097][MLlib] Word2Vec performance improvement
Ishiihara Aug 18, 2014
eef779b
[SPARK-2842][MLlib]Word2Vec documentation
Ishiihara Aug 18, 2014
edeb46f
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
69d3372
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
31857b5
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
e3ba744
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
2385618
[SPARK-2873] use ExternalAppendOnlyMap to resolve aggregate's OOM
Aug 6, 2014
d975b83
Choose OnHeapAggregation or ExternalAggregation base on spark.sql.agg…
guowei2 Aug 13, 2014
d6ff521
Choose OnHeapAggregation or ExternalAggregation base on spark.sql.agg…
guowei2 Aug 13, 2014
b821577
Merge branch 'sql-memory-patch' of https://github.com/guowei2/spark i…
guowei2 Aug 18, 2014
9306b8c
[MLlib] Remove transform(dataset: RDD[String]) from Word2Vec public API
Ishiihara Aug 18, 2014
4df2b6c
numbers of improves
guowei2 Aug 18, 2014
f611ea9
SITUATION: ShuffledDStream run tasks whether dstream has partition it…
Jul 3, 2014
6463c19
DStream run tasks only when dstream has partition items
Jul 10, 2014
9ef744f
DStream run tasks only when dstream has partition items
Jul 10, 2014
800e230
DStream run tasks only when dstream has partition items
Jul 11, 2014
46e59c9
Merge branch 'master' of https://github.com/guowei2/spark
Aug 18, 2014
bb6c6da
SITUATION: ShuffledDStream run tasks whether dstream has partition it…
Jul 3, 2014
60fff2a
DStream run tasks only when dstream has partition items
Jul 10, 2014
492f682
Merge branch 'master' of https://github.com/guowei2/spark
Aug 18, 2014
49cd405
SITUATION: ShuffledDStream run tasks whether dstream has partition it…
Jul 3, 2014
3a97745
DStream run tasks only when dstream has partition items
Jul 10, 2014
013ff03
DStream run tasks only when dstream has partition items
Jul 10, 2014
3e9d50a
DStream run tasks only when dstream has partition items
Jul 11, 2014
7a77562
SITUATION: ShuffledDStream run tasks whether dstream has partition it…
Jul 3, 2014
efb8545
DStream run tasks only when dstream has partition items
Jul 10, 2014
3e36c9b
SITUATION: ShuffledDStream run tasks whether dstream has partition it…
Jul 3, 2014
7309d29
DStream run tasks only when dstream has partition items
Jul 10, 2014
13e435c
Merge branch 'master' of https://github.com/guowei2/spark
Aug 18, 2014
f2e9ce6
Merge branch 'sql-memory-patch' of https://github.com/guowei2/spark i…
Aug 18, 2014
5ae43f8
Merge branch 'sql-memory-patch' of https://github.com/guowei2/spark i…
Aug 18, 2014
03dea6b
fix Big-ass commit
Aug 19, 2014

Files changed
1 change: 1 addition & 0 deletions .rat-excludes
@@ -25,6 +25,7 @@ log4j-defaults.properties
bootstrap-tooltip.js
jquery-1.11.1.min.js
sorttable.js
.*avsc
.*txt
.*json
.*data
9 changes: 9 additions & 0 deletions README.md
@@ -115,6 +115,15 @@ If your project is built with Maven, add this to your POM file's `<dependencies>
</dependency>


## A Note About Thrift JDBC server and CLI for Spark SQL

Spark SQL supports a Thrift JDBC server and a CLI.
See sql-programming-guide.md for more information about these features.
You can enable them by adding `-Phive-thriftserver` when building Spark, as follows:

$ sbt/sbt -Phive-thriftserver assembly


## Configuration

Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
Expand Down
18 changes: 14 additions & 4 deletions bin/pyspark
@@ -23,12 +23,18 @@ FWDIR="$(cd `dirname $0`/..; pwd)"
# Export this as SPARK_HOME
export SPARK_HOME="$FWDIR"

source $FWDIR/bin/utils.sh

SCALA_VERSION=2.10

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
function usage() {
echo "Usage: ./bin/pyspark [options]" 1>&2
$FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
exit 0
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
usage
fi

# Exit if the user hasn't compiled Spark
@@ -66,10 +72,11 @@ fi
# Build up arguments list manually to preserve quotes and backslashes.
# We export Spark submit arguments as an environment variable because shell.py must run as a
# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.

SUBMIT_USAGE_FUNCTION=usage
gatherSparkSubmitOpts "$@"
PYSPARK_SUBMIT_ARGS=""
whitespace="[[:space:]]"
for i in "$@"; do
for i in "${SUBMISSION_OPTS[@]}"; do
if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
@@ -90,7 +97,10 @@ fi
if [[ "$1" =~ \.py$ ]]; then
echo -e "\nWARNING: Running python applications through ./bin/pyspark is deprecated as of Spark 1.0." 1>&2
echo -e "Use ./bin/spark-submit <python file>\n" 1>&2
exec $FWDIR/bin/spark-submit "$@"
primary=$1
shift
gatherSparkSubmitOpts "$@"
exec $FWDIR/bin/spark-submit "${SUBMISSION_OPTS[@]}" $primary "${APPLICATION_OPTS[@]}"
else
# Only use ipython if no command line arguments were provided [SPARK-1134]
if [[ "$IPYTHON" = "1" ]]; then
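For context, a hedged sketch of what the escaping loop above produces (the flags and values are hypothetical):

    # Given: ./bin/pyspark --name "My App" --driver-memory 1g
    # gatherSparkSubmitOpts routes all four tokens into SUBMISSION_OPTS, and the
    # loop re-quotes the whitespace-containing value, yielding approximately:
    #   PYSPARK_SUBMIT_ARGS=' --name "My App" --driver-memory 1g'
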
20 changes: 14 additions & 6 deletions bin/spark-shell
@@ -31,13 +31,21 @@ set -o posix
## Global script variables
FWDIR="$(cd `dirname $0`/..; pwd)"

function usage() {
echo "Usage: ./bin/spark-shell [options]"
$FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
exit 0
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
echo "Usage: ./bin/spark-shell [options]"
$FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
exit 0
usage
fi

function main(){
source $FWDIR/bin/utils.sh
SUBMIT_USAGE_FUNCTION=usage
gatherSparkSubmitOpts "$@"

function main() {
if $cygwin; then
# Workaround for issue involving JLine and Cygwin
# (see http://sourceforge.net/p/jline/bugs/40/).
@@ -46,11 +54,11 @@ function main(){
# (see https://github.com/sbt/sbt/issues/562).
stty -icanon min 1 -echo > /dev/null 2>&1
export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
$FWDIR/bin/spark-submit --class org.apache.spark.repl.Main spark-shell "$@"
$FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "${SUBMISSION_OPTS[@]}" spark-shell "${APPLICATION_OPTS[@]}"
stty icanon echo > /dev/null 2>&1
else
export SPARK_SUBMIT_OPTS
$FWDIR/bin/spark-submit --class org.apache.spark.repl.Main spark-shell "$@"
$FWDIR/bin/spark-submit --class org.apache.spark.repl.Main "${SUBMISSION_OPTS[@]}" spark-shell "${APPLICATION_OPTS[@]}"
fi
}

2 changes: 1 addition & 1 deletion bin/spark-shell.cmd
@@ -19,4 +19,4 @@ rem

set SPARK_HOME=%~dp0..

cmd /V /E /C %SPARK_HOME%\bin\spark-submit.cmd spark-shell --class org.apache.spark.repl.Main %*
cmd /V /E /C %SPARK_HOME%\bin\spark-submit.cmd --class org.apache.spark.repl.Main %* spark-shell
20 changes: 10 additions & 10 deletions bin/spark-sql
@@ -29,7 +29,7 @@ CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
FWDIR="$(cd `dirname $0`/..; pwd)"

function usage {
echo "Usage: ./sbin/spark-sql [options] [cli option]"
echo "Usage: ./bin/spark-sql [options] [cli option]"
pattern="usage"
pattern+="\|Spark assembly has been built with Hive"
pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
@@ -65,30 +65,30 @@ while (($#)); do
case $1 in
-d | --define | --database | -f | -h | --hiveconf | --hivevar | -i | -p)
ensure_arg_number $# 2
CLI_ARGS+=($1); shift
CLI_ARGS+=($1); shift
CLI_ARGS+=("$1"); shift
CLI_ARGS+=("$1"); shift
;;

-e)
ensure_arg_number $# 2
CLI_ARGS+=($1); shift
CLI_ARGS+=(\"$1\"); shift
CLI_ARGS+=("$1"); shift
CLI_ARGS+=("$1"); shift
;;

-s | --silent)
CLI_ARGS+=($1); shift
CLI_ARGS+=("$1"); shift
;;

-v | --verbose)
# Both SparkSubmit and SparkSQLCLIDriver recognizes -v | --verbose
CLI_ARGS+=($1)
SUBMISSION_ARGS+=($1); shift
CLI_ARGS+=("$1")
SUBMISSION_ARGS+=("$1"); shift
;;

*)
SUBMISSION_ARGS+=($1); shift
SUBMISSION_ARGS+=("$1"); shift
;;
esac
done

eval exec "$FWDIR"/bin/spark-submit --class $CLASS ${SUBMISSION_ARGS[*]} spark-internal ${CLI_ARGS[*]}
exec "$FWDIR"/bin/spark-submit --class $CLASS "${SUBMISSION_ARGS[@]}" spark-internal "${CLI_ARGS[@]}"
59 changes: 59 additions & 0 deletions bin/utils.sh
@@ -0,0 +1,59 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Gather all spark-submit options into SUBMISSION_OPTS
function gatherSparkSubmitOpts() {

  if [ -z "$SUBMIT_USAGE_FUNCTION" ]; then
    echo "Function for printing usage of $0 is not set." 1>&2
    echo "Please set usage function to shell variable 'SUBMIT_USAGE_FUNCTION' in $0" 1>&2
    exit 1
  fi

  # NOTE: If you add or remove spark-submit options,
  # modify NOT ONLY this script but also SparkSubmitArguments.scala
  SUBMISSION_OPTS=()
  APPLICATION_OPTS=()
  while (($#)); do
    case "$1" in
      --master | --deploy-mode | --class | --name | --jars | --py-files | --files | \
      --conf | --properties-file | --driver-memory | --driver-java-options | \
      --driver-library-path | --driver-class-path | --executor-memory | --driver-cores | \
      --total-executor-cores | --executor-cores | --queue | --num-executors | --archives)
        if [[ $# -lt 2 ]]; then
          "$SUBMIT_USAGE_FUNCTION"
          exit 1;
        fi
        SUBMISSION_OPTS+=("$1"); shift
        SUBMISSION_OPTS+=("$1"); shift
        ;;

      --verbose | -v | --supervise)
        SUBMISSION_OPTS+=("$1"); shift
        ;;

      *)
        APPLICATION_OPTS+=("$1"); shift
        ;;
    esac
  done

  export SUBMISSION_OPTS
  export APPLICATION_OPTS
}
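
For illustration, a minimal hedged sketch of a caller, mirroring the bin/spark-shell change above (the script name and flags are hypothetical):

    # Assumes this runs from a script that has sourced bin/utils.sh
    SUBMIT_USAGE_FUNCTION=usage
    gatherSparkSubmitOpts --master local[4] --name demo myScript.py --input data.txt
    # SUBMISSION_OPTS  -> (--master local[4] --name demo)   # recognized spark-submit flags
    # APPLICATION_OPTS -> (myScript.py --input data.txt)    # everything else, in order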
100 changes: 0 additions & 100 deletions core/src/main/java/org/apache/spark/network/netty/FileClient.java

This file was deleted.
