Kinesis wip #27

Open
wants to merge 177 commits into base: master

Commits (177)
438859e
[SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3
calvinjia Apr 24, 2015
d874f8b
[PySpark][Minor] Update sql example, so that can read file correctly
Sephiroth-Lin Apr 25, 2015
59b7cfc
[SPARK-7136][Docs] Spark SQL and DataFrame Guide fix example file and…
dbsiegel Apr 25, 2015
cca9905
update the deprecated CountMinSketchMonoid function to TopPctCMS func…
caikehe Apr 25, 2015
a61d65f
Revert "[SPARK-6752][Streaming] Allow StreamingContext to be recreate…
pwendell Apr 25, 2015
a7160c4
[SPARK-6113] [ML] Tree ensembles for Pipelines API
jkbradley Apr 25, 2015
aa6966f
[SQL] Update SQL readme to include instructions on generating golden …
yhuai Apr 25, 2015
a11c868
[SPARK-7092] Update spark scala version to 2.11.6
ScrapCodes Apr 25, 2015
f5473c2
[SPARK-6014] [CORE] [HOTFIX] Add try-catch block around ShutDownHook
nishkamravi2 Apr 26, 2015
9a5bbe0
[MINOR] [MLLIB] Refactor toString method in MLLIB
Apr 26, 2015
ca55dc9
[SPARK-7152][SQL] Add a Column expression for partition ID.
rxin Apr 26, 2015
d188b8b
[SQL][Minor] rename DataTypeParser.apply to DataTypeParser.parse
scwf Apr 27, 2015
82bb7fd
[SPARK-6505] [SQL] Remove the reflection call in HiveFunctionWrapper
baishuo Apr 27, 2015
998aac2
[SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact
chernetsov Apr 27, 2015
7078f60
[SPARK-6856] [R] Make RDD information more useful in SparkR
Jeffrharr Apr 27, 2015
ef82bdd
SPARK-7107 Add parameter for zookeeper.znode.parent to hbase_inputfor…
tedyu Apr 27, 2015
ca9f4eb
[SPARK-6991] [SPARKR] Adds support for zipPartitions.
hlin09 Apr 27, 2015
b9de9e0
[SPARK-7103] Fix crash with SparkContext.union when RDD has no partit…
stshe Apr 27, 2015
8e1c00d
[SPARK-6738] [CORE] Improve estimate the size of a large array
shenh062326 Apr 27, 2015
5d45e1f
[SPARK-3090] [CORE] Stop SparkContext if user forgets to.
Apr 27, 2015
ab5adb7
[SPARK-7145] [CORE] commons-lang (2.x) classes used instead of common…
srowen Apr 27, 2015
62888a4
[SPARK-7162] [YARN] Launcher error in yarn-client
witgo Apr 27, 2015
4d9e560
[SPARK-7090] [MLLIB] Introduce LDAOptimizer to LDA to further improve…
hhbyyh Apr 28, 2015
874a2ca
[SPARK-7174][Core] Move calling `TaskScheduler.executorHeartbeatRecei…
zsxwing Apr 28, 2015
29576e7
[SPARK-6829] Added math functions for DataFrames
brkyvz Apr 28, 2015
9e4e82b
[SPARK-5946] [STREAMING] Add Python API for direct Kafka stream
jerryshao Apr 28, 2015
bf35edd
[SPARK-7187] SerializationDebugger should not crash user code
Apr 28, 2015
d94cd1a
[SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs.
rxin Apr 28, 2015
e13cd86
[SPARK-6352] [SQL] Custom parquet output committer
Apr 28, 2015
7f3b3b7
[SPARK-7168] [BUILD] Update plugin versions in Maven build and centra…
srowen Apr 28, 2015
75905c5
[SPARK-7100] [MLLIB] Fix persisted RDD leak in GradientBoostTrees
Apr 28, 2015
268c419
[SPARK-6435] spark-shell --jars option does not add all jars to class…
tsudukim Apr 28, 2015
6a827d5
[SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN
Apr 28, 2015
b14cd23
[SPARK-7140] [MLLIB] only scan the first 16 entries in Vector.hashCode
mengxr Apr 28, 2015
52ccf1d
[Core][test][minor] replace try finally block with tryWithSafeFinally
liyezhang556520 Apr 28, 2015
8aab94d
[SPARK-4286] Add an external shuffle service that can be run as a dae…
dragos Apr 28, 2015
2d222fb
[SPARK-5932] [CORE] Use consistent naming for size properties
Apr 28, 2015
8009810
[SPARK-6314] [CORE] handle JsonParseException for history server
liyezhang556520 Apr 28, 2015
53befac
[SPARK-5338] [MESOS] Add cluster mode support for Mesos
tnachen Apr 28, 2015
28b1af7
[MINOR] [CORE] Warn users who try to cache RDDs with dynamic allocati…
Apr 28, 2015
f0a1f90
[SPARK-7201] [MLLIB] move Identifiable to ml.util
mengxr Apr 28, 2015
555213e
Closes #4807
mengxr Apr 28, 2015
d36e673
[SPARK-6965] [MLLIB] StringIndexer handles numeric input.
mengxr Apr 29, 2015
5c8f4bd
[SPARK-7138] [STREAMING] Add method to BlockGenerator to add multiple…
tdas Apr 29, 2015
a8aeadb
[SPARK-7208] [ML] [PYTHON] Added Matrix, SparseMatrix to __all__ list…
jkbradley Apr 29, 2015
5ef006f
[SPARK-6756] [MLLIB] add toSparse, toDense, numActives, numNonzeros, …
mengxr Apr 29, 2015
271c4c6
[SPARK-7215] made coalesce and repartition a part of the query plan
brkyvz Apr 29, 2015
f98773a
[SPARK-7205] Support `.ivy2/local` and `.m2/repositories/` in --packages
brkyvz Apr 29, 2015
8dee274
MAINTENANCE: Automated closing of pull requests.
pwendell Apr 29, 2015
fe917f5
[SPARK-7188] added python support for math DataFrame functions
brkyvz Apr 29, 2015
1fd6ed9
[SPARK-7204] [SQL] Fix callSite for Dataframe and SQL operations
pwendell Apr 29, 2015
f49284b
[SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use managed memory for aggr…
JoshRosen Apr 29, 2015
baed3f2
[SPARK-6918] [YARN] Secure HBase support.
deanchen Apr 29, 2015
687273d
[SPARK-7223] Rename RPC askWithReply -> askWithReply, sendWithReply -…
rxin Apr 29, 2015
3df9c5d
Better error message on access to non-existing attribute
ksonj Apr 29, 2015
81ea42b
[SQL][Minor] fix java doc for DataFrame.agg
cloud-fan Apr 29, 2015
c0c0ba6
Fix a typo of "threshold"
yinxusen Apr 29, 2015
1868bd4
[SPARK-7056] [STREAMING] Make the Write Ahead Log pluggable
tdas Apr 29, 2015
a9c4e29
[SPARK-6752] [STREAMING] [REOPENED] Allow StreamingContext to be recr…
tdas Apr 29, 2015
3a180c1
[SPARK-6629] cancelJobGroup() may not work for jobs whose job groups …
JoshRosen Apr 29, 2015
15995c8
[SPARK-7222] [ML] Added mathematical derivation in comment and compre…
Apr 29, 2015
c9d530e
[SPARK-6529] [ML] Add Word2Vec transformer
yinxusen Apr 29, 2015
d7dbce8
[SPARK-7156][SQL] support RandomSplit in DataFrames
brkyvz Apr 29, 2015
7f4b583
[SPARK-7181] [CORE] fix inifite loop in Externalsorter's mergeWithAgg…
chouqin Apr 29, 2015
3fc6cfd
[SPARK-7155] [CORE] Allow newAPIHadoopFile to support comma-separated…
yongtang Apr 29, 2015
f8cbb0a
[SPARK-7229] [SQL] SpecificMutableRow should take integer type as int…
chenghao-intel Apr 29, 2015
b1ef6a6
[SPARK-7259] [ML] VectorIndexer: do not copy non-ML metadata to outpu…
jkbradley Apr 29, 2015
1fdfdb4
[SQL] [Minor] Print detail query execution info when spark answer is …
scwf Apr 30, 2015
114bad6
[SPARK-7176] [ML] Add validation functionality to Param
jkbradley Apr 30, 2015
1b7106b
[SPARK-6862] [STREAMING] [WEBUI] Add BatchPage to display details of …
zsxwing Apr 30, 2015
7143f6e
[SPARK-7234][SQL] Fix DateType mismatch when codegen on.
Apr 30, 2015
5553198
[SPARK-7156][SQL] Addressed follow up comments for randomSplit
brkyvz Apr 30, 2015
ba49eb1
Some code clean up.
Apr 30, 2015
4459514
[SPARK-7225][SQL] CombineLimits optimizer does not work
pzzs Apr 30, 2015
254e050
[SPARK-1406] Mllib pmml model export
selvinsource Apr 30, 2015
47bf406
[HOTFIX] Disabling flaky test (fix in progress as part of SPARK-7224)
pwendell Apr 30, 2015
7dacc08
[SPARK-7224] added mock repository generator for --packages tests
brkyvz Apr 30, 2015
6c65da6
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YA…
harishreedharan Apr 30, 2015
adbdb19
[SPARK-7207] [ML] [BUILD] Added ml.recommendation, ml.regression to S…
jkbradley Apr 30, 2015
e0628f2
Revert "[SPARK-5342] [YARN] Allow long running Spark apps to run on s…
pwendell Apr 30, 2015
6702324
[SPARK-7196][SQL] Support precision and scale of decimal type for JDBC
viirya Apr 30, 2015
07a8620
[SPARK-7288] Suppress compiler warnings due to use of sun.misc.Unsafe…
JoshRosen Apr 30, 2015
77cc25f
[SPARK-7267][SQL]Push down Project when it's child is Limit
pzzs Apr 30, 2015
fa01bec
[Build] Enable MiMa checks for SQL
JoshRosen Apr 30, 2015
1c3e402
[SPARK-7279] Removed diffSum which is theoretical zero in LinearRegre…
Apr 30, 2015
149b3ee
[SPARK-7242][SQL][MLLIB] Frequent items for DataFrames
brkyvz Apr 30, 2015
ee04413
[SPARK-7280][SQL] Add "drop" column/s on a data frame
rakeshchalasani May 1, 2015
0797338
[SPARK-7093] [SQL] Using newPredicate in NestedLoopJoin to enable cod…
scwf May 1, 2015
a0d8a61
[SPARK-7109] [SQL] Push down left side filter for left semi join
scwf May 1, 2015
e991255
[SPARK-6913][SQL] Fixed "java.sql.SQLException: No suitable driver fo…
SlavikBaranov May 1, 2015
3ba5aaa
[SPARK-5213] [SQL] Pluggable SQL Parser Support
chenghao-intel May 1, 2015
473552f
[SPARK-7123] [SQL] support table.star in sqlcontext
scwf May 1, 2015
beeafcf
Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support"
pwendell May 1, 2015
69a739c
[SPARK-7282] [STREAMING] Fix the race conditions in StreamingListener…
zsxwing May 1, 2015
b5347a4
[SPARK-7248] implemented random number generators for DataFrames
brkyvz May 1, 2015
36a7a68
[SPARK-6479] [BLOCK MANAGER] Create off-heap block storage API
zhzhan May 1, 2015
a9fc505
HOTFIX: Disable buggy dependency checker
pwendell May 1, 2015
0a2b15c
[SPARK-4550] In sort-based shuffle, store map outputs in serialized form
sryza May 1, 2015
7cf1eb7
[SPARK-7287] enabled fixed test
brkyvz May 1, 2015
14b3288
[SPARK-7291] [CORE] Fix a flaky test in AkkaRpcEnvSuite
zsxwing May 1, 2015
c24aeb6
[SPARK-6257] [PYSPARK] [MLLIB] MLlib API missing items in Recommendation
MechCoder May 1, 2015
7fe0f3f
[SPARK-3468] [WEBUI] Timeline-View feature
sarutak May 1, 2015
3052f49
[SPARK-4705] Handle multiple app attempts event logs, history server.
May 1, 2015
3b514af
[SPARK-3066] [MLLIB] Support recommendAll in matrix factorization model
May 1, 2015
7630213
[SPARK-5891] [ML] Add Binarizer ML Transformer
viirya May 1, 2015
c8c481d
Limit help option regex
May 1, 2015
27de6fe
changing persistence engine trait to an abstract class
nirandaperera May 1, 2015
7d42722
[SPARK-5854] personalized page rank
dwmclary May 1, 2015
1262e31
[SPARK-6846] [WEBUI] [HOTFIX] return to GET for kill link in UI since…
srowen May 1, 2015
1686032
[SPARK-7183] [NETWORK] Fix memory leak of TransportRequestHandler.str…
viirya May 1, 2015
3753776
[SPARK-7274] [SQL] Create Column expression for array/struct creation.
rxin May 1, 2015
58d6584
Revert "[SPARK-7287] enabled fixed test"
pwendell May 1, 2015
c6d9a42
Revert "[SPARK-7224] added mock repository generator for --packages t…
pwendell May 1, 2015
f53a488
[SPARK-7213] [YARN] Check for read permissions before copying a Hadoo…
nishkamravi2 May 1, 2015
7b5dd3e
[SPARK-7281] [YARN] Add option to set AM's lib path in client mode.
May 1, 2015
4dc8d74
[SPARK-7240][SQL] Single pass covariance calculation for dataframes
brkyvz May 1, 2015
b1f4ca8
[SPARK-5342] [YARN] Allow long running Spark apps to run on secure YA…
harishreedharan May 1, 2015
5c1faba
Ignore flakey test in SparkSubmitUtilsSuite
pwendell May 1, 2015
41c6a44
[SPARK-7312][SQL] SPARK-6913 broke jdk6 build
yhuai May 1, 2015
e6fb377
[SPARK-7304] [BUILD] Include $@ in call to mvn consistently in make-d…
May 2, 2015
98e7045
[SPARK-6999] [SQL] Remove the infinite recursive method (useless)
chenghao-intel May 2, 2015
ebc25a4
[SPARK-7309] [CORE] [STREAMING] Shutdown the thread pools in Received…
zsxwing May 2, 2015
b88c275
[SPARK-7112][Streaming][WIP] Add a InputInfoTracker to track all the …
jerryshao May 2, 2015
4786484
[SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
koeninger May 2, 2015
ae98eec
[SPARK-3444] Provide an easy way to change log level
holdenk May 2, 2015
099327d
[SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a…
sryza May 2, 2015
2022193
[SPARK-7216] [MESOS] Add driver details page to Mesos cluster UI.
tnachen May 2, 2015
b4b43df
[SPARK-6443] [SPARK SUBMIT] Could not submit app in standalone cluste…
WangTaoTheTonic May 2, 2015
8f50a07
[SPARK-2691] [MESOS] Support for Mesos DockerInfo
hellertime May 2, 2015
38d4e9e
[SPARK-6229] Add SASL encryption to network library.
May 2, 2015
b79aeb9
[SPARK-7317] [Shuffle] Expose shuffle handle
May 2, 2015
2e0f357
[SPARK-7242] added python api for freqItems in DataFrames
brkyvz May 2, 2015
7394e7a
[SPARK-7120] [SPARK-7121] Closure cleaner nesting + documentation + t…
May 2, 2015
ecc6eb5
[SPARK-7315] [STREAMING] [TEST] Fix flaky WALBackedBlockRDDSuite
tdas May 2, 2015
856a571
[SPARK-3444] Fix typo in Dataframes.py introduced in []
deanchen May 2, 2015
da30352
[SPARK-7323] [SPARK CORE] Use insertAll instead of insert while mergi…
May 2, 2015
bfcd528
[SPARK-6030] [CORE] Using simulated field layout method to compute cl…
advancedxy May 2, 2015
82c8c37
[MINOR] [HIVE] Fix QueryPartitionSuite.
May 2, 2015
5d6b90d
[SPARK-5213] [SQL] Pluggable SQL Parser Support
chenghao-intel May 2, 2015
ea841ef
[SPARK-7255] [STREAMING] [DOCUMENTATION] Added documentation for spar…
BenFradet May 2, 2015
49549d5
[SPARK-7031] [THRIFTSERVER] let thrift server take SPARK_DAEMON_MEMOR…
WangTaoTheTonic May 2, 2015
f4af925
[SPARK-7022] [PYSPARK] [ML] Add ML.Tuning.ParamGridBuilder to PySpark
May 3, 2015
daa70bf
[SPARK-6907] [SQL] Isolated client for HiveMetastore
marmbrus May 3, 2015
9e25b09
[SPARK-7302] [DOCS] SPARK building documentation still mentions build…
srowen May 3, 2015
1ffa8cb
[SPARK-7329] [MLLIB] simplify ParamGridBuilder impl
mengxr May 4, 2015
9646018
[SPARK-7241] Pearson correlation for DataFrames
brkyvz May 4, 2015
3539cb7
[SPARK-5563] [MLLIB] LDA with online variational inference
hhbyyh May 4, 2015
343d3bf
[SPARK-5100] [SQL] add webui for thriftserver
tianyi May 4, 2015
5a1a107
[MINOR] Fix python test typo?
May 4, 2015
e0833c5
[SPARK-5956] [MLLIB] Pipeline components should be copyable.
mengxr May 4, 2015
f32e69e
[SPARK-7319][SQL] Improve the output from DataFrame.show()
May 4, 2015
fc8b581
[SPARK-6943] [SPARK-6944] DAG visualization on SparkUI
May 4, 2015
8055411
[SPARK-7243][SQL] Contingency Tables for DataFrames
brkyvz May 5, 2015
678c4da
[SPARK-7266] Add ExpectsInputTypes to expressions when possible.
rxin May 5, 2015
8aa5aea
[SPARK-7236] [CORE] Fix to prevent AkkaUtils askWithReply from sleepi…
BryanCutler May 5, 2015
e9b16e6
[SPARK-7314] [SPARK-3524] [PYSPARK] upgrade Pyrolite to 4.4
mengxr May 5, 2015
da738cf
[MINOR] Renamed variables in SparkKMeans.scala, LocalKMeans.scala and…
pippobaudos May 5, 2015
c5790a2
[MINOR] [BUILD] Declare ivy dependency in root pom.
May 5, 2015
1854ac3
[SPARK-7139] [STREAMING] Allow received block metadata to be saved to…
tdas May 5, 2015
8776fe0
[HOTFIX] [TEST] Ignoring flaky tests
tdas May 5, 2015
8436f7e
[SPARK-7113] [STREAMING] Support input information reporting for Dire…
jerryshao May 5, 2015
4d29867
[SPARK-7341] [STREAMING] [TESTS] Fix the flaky test: org.apache.spark…
zsxwing May 5, 2015
fc8feaa
[SPARK-6653] [YARN] New config to specify port for sparkYarnAM actor …
May 5, 2015
4222da6
[SPARK-5112] Expose SizeEstimator as a developer api
sryza May 5, 2015
51f4620
[SPARK-7357] Improving HBaseTest example
JihongMA May 5, 2015
d497358
[SPARK-3454] separate json endpoints for data in the UI
squito May 5, 2015
b83091a
[MINOR] Minor update for document
viirya May 5, 2015
5ffc73e
[SPARK-5074] [CORE] [TESTS] Fix the flakey test 'run shuffle with map…
zsxwing May 5, 2015
c6d1efb
[SPARK-7350] [STREAMING] [WEBUI] Attach the Streaming tab when callin…
zsxwing May 5, 2015
5ab652c
[SPARK-7202] [MLLIB] [PYSPARK] Add SparseMatrixPickler to SerDe
MechCoder May 5, 2015
5995ada
[SPARK-6612] [MLLIB] [PYSPARK] Python KMeans parity
FlytxtRnD May 5, 2015
9d250e6
Closes #5591
mengxr May 5, 2015
d4cb38a
[MLLIB] [TREE] Verify size of input rdd > 0 when building meta data
May 5, 2015
1fdabf8
[SPARK-7237] Many user provided closures are not actually cleaned
May 5, 2015
57e9f29
[SPARK-7318] [STREAMING] DStream cleans objects that are not closures
May 5, 2015
9f1f9b1
[SPARK-7007] [CORE] Add a metric source for ExecutorAllocationManager
jerryshao May 5, 2015
440dbf2
All new Kinesis integration - WIP
tdas May 5, 2015
14 changes: 14 additions & 0 deletions .rat-excludes
@@ -15,6 +15,7 @@ TAGS
RELEASE
control
docs
docker.properties.template
fairscheduler.xml.template
spark-defaults.conf.template
log4j.properties
@@ -29,7 +30,13 @@ spark-env.sh.template
log4j-defaults.properties
bootstrap-tooltip.js
jquery-1.11.1.min.js
d3.min.js
dagre-d3.min.js
graphlib-dot.min.js
sorttable.js
vis.min.js
vis.min.css
vis.map
.*avsc
.*txt
.*json
@@ -67,5 +74,12 @@ logs
.*scalastyle-output.xml
.*dependency-reduced-pom.xml
known_translations
json_expectation
local-1422981759269/*
local-1422981780767/*
local-1425081759269/*
local-1426533911241/*
local-1426633911242/*
local-1427397477963/*
DESCRIPTION
NAMESPACE
1 change: 1 addition & 0 deletions LICENSE
@@ -814,6 +814,7 @@ BSD-style licenses
The following components are provided under a BSD-style license. See project link for details.

(BSD 3 Clause) core (com.github.fommil.netlib:core:1.1.2 - https://github.com/fommil/netlib-java/core)
(BSD 3 Clause) JPMML-Model (org.jpmml:pmml-model:1.1.15 - https://github.com/jpmml/jpmml-model)
(BSD 3-clause style license) jblas (org.jblas:jblas:1.2.3 - http://jblas.org/)
(BSD License) AntLR Parser Generator (antlr:antlr:2.7.7 - http://www.antlr.org/)
(BSD License) Javolution (javolution:javolution:5.5.1 - http://javolution.org)
1 change: 1 addition & 0 deletions R/pkg/NAMESPACE
@@ -71,6 +71,7 @@ exportMethods(
"unpersist",
"value",
"values",
"zipPartitions",
"zipRDD",
"zipWithIndex",
"zipWithUniqueId"
2 changes: 1 addition & 1 deletion R/pkg/R/DataFrame.R
@@ -167,7 +167,7 @@ setMethod("isLocal",
setMethod("showDF",
signature(x = "DataFrame"),
function(x, numRows = 20) {
cat(callJMethod(x@sdf, "showString", numToInt(numRows)), "\n")
callJMethod(x@sdf, "showString", numToInt(numRows))
})

#' show
51 changes: 51 additions & 0 deletions R/pkg/R/RDD.R
@@ -66,6 +66,11 @@ setMethod("initialize", "RDD", function(.Object, jrdd, serializedMode,
.Object
})

setMethod("show", "RDD",
function(.Object) {
cat(paste(callJMethod(.Object@jrdd, "toString"), "\n", sep=""))
})

setMethod("initialize", "PipelinedRDD", function(.Object, prev, func, jrdd_val) {
.Object@env <- new.env()
.Object@env$isCached <- FALSE
@@ -1590,3 +1595,49 @@ setMethod("intersection",

keys(filterRDD(cogroup(rdd1, rdd2, numPartitions = numPartitions), filterFunction))
})

#' Zips an RDD's partitions with one (or more) RDD(s).
#' Same as zipPartitions in Spark.
#'
#' @param ... RDDs to be zipped.
#' @param func A function to transform zipped partitions.
#' @return A new RDD by applying a function to the zipped partitions.
#' Assumes that all the RDDs have the *same number of partitions*, but
#' does *not* require them to have the same number of elements in each partition.
#' @examples
#'\dontrun{
#' sc <- sparkR.init()
#' rdd1 <- parallelize(sc, 1:2, 2L) # 1, 2
#' rdd2 <- parallelize(sc, 1:4, 2L) # 1:2, 3:4
#' rdd3 <- parallelize(sc, 1:6, 2L) # 1:3, 4:6
#' collect(zipPartitions(rdd1, rdd2, rdd3,
#' func = function(x, y, z) { list(list(x, y, z))} ))
#' # list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6)))
#'}
#' @rdname zipRDD
#' @aliases zipPartitions,RDD
setMethod("zipPartitions",
"RDD",
function(..., func) {
rrdds <- list(...)
if (length(rrdds) == 1) {
return(rrdds[[1]])
}
nPart <- sapply(rrdds, numPartitions)
if (length(unique(nPart)) != 1) {
stop("Can only zipPartitions RDDs which have the same number of partitions.")
}

rrdds <- lapply(rrdds, function(rdd) {
mapPartitionsWithIndex(rdd, function(partIndex, part) {
print(length(part))
list(list(partIndex, part))
})
})
union.rdd <- Reduce(unionRDD, rrdds)
zipped.rdd <- values(groupByKey(union.rdd, numPartitions = nPart[1]))
res <- mapPartitions(zipped.rdd, function(plist) {
do.call(func, plist[[1]])
})
res
})
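
The implementation above emulates Spark's native `zipPartitions` by tagging every partition with its index, unioning the tagged RDDs, and regrouping by index before applying `func`. For comparison, a minimal sketch of the same example through the Scala API, assuming a live `SparkContext` named `sc` (the Scala method zips the partition iterators directly):

```scala
// Scala counterpart of the SparkR example above; iterators are zipped
// partition-by-partition, so all three RDDs need the same partition count.
val rdd1 = sc.parallelize(1 to 2, 2)  // partitions: [1], [2]
val rdd2 = sc.parallelize(1 to 4, 2)  // partitions: [1,2], [3,4]
val rdd3 = sc.parallelize(1 to 6, 2)  // partitions: [1,2,3], [4,5,6]

val zipped = rdd1.zipPartitions(rdd2, rdd3) { (a, b, c) =>
  Iterator((a.toList, b.toList, c.toList))
}
zipped.collect()
// Array((List(1), List(1, 2), List(1, 2, 3)), (List(2), List(3, 4), List(4, 5, 6)))
```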
5 changes: 5 additions & 0 deletions R/pkg/R/generics.R
@@ -217,6 +217,11 @@ setGeneric("unpersist", function(x, ...) { standardGeneric("unpersist") })
#' @export
setGeneric("zipRDD", function(x, other) { standardGeneric("zipRDD") })

#' @rdname zipRDD
#' @export
setGeneric("zipPartitions", function(..., func) { standardGeneric("zipPartitions") },
signature = "...")

#' @rdname zipWithIndex
#' @seealso zipWithUniqueId
#' @export
33 changes: 33 additions & 0 deletions R/pkg/inst/tests/test_binary_function.R
@@ -66,3 +66,36 @@ test_that("cogroup on two RDDs", {
expect_equal(sortKeyValueList(actual),
sortKeyValueList(expected))
})

test_that("zipPartitions() on RDDs", {
rdd1 <- parallelize(sc, 1:2, 2L) # 1, 2
rdd2 <- parallelize(sc, 1:4, 2L) # 1:2, 3:4
rdd3 <- parallelize(sc, 1:6, 2L) # 1:3, 4:6
actual <- collect(zipPartitions(rdd1, rdd2, rdd3,
func = function(x, y, z) { list(list(x, y, z))} ))
expect_equal(actual,
list(list(1, c(1,2), c(1,2,3)), list(2, c(3,4), c(4,5,6))))

mockFile = c("Spark is pretty.", "Spark is awesome.")
fileName <- tempfile(pattern="spark-test", fileext=".tmp")
writeLines(mockFile, fileName)

rdd <- textFile(sc, fileName, 1)
actual <- collect(zipPartitions(rdd, rdd,
func = function(x, y) { list(paste(x, y, sep = "\n")) }))
expected <- list(paste(mockFile, mockFile, sep = "\n"))
expect_equal(actual, expected)

rdd1 <- parallelize(sc, 0:1, 1)
actual <- collect(zipPartitions(rdd1, rdd,
func = function(x, y) { list(x + nchar(y)) }))
expected <- list(0:1 + nchar(mockFile))
expect_equal(actual, expected)

rdd <- map(rdd, function(x) { x })
actual <- collect(zipPartitions(rdd, rdd1,
func = function(x, y) { list(y + nchar(x)) }))
expect_equal(actual, expected)

unlink(fileName)
})
5 changes: 5 additions & 0 deletions R/pkg/inst/tests/test_rdd.R
@@ -759,6 +759,11 @@ test_that("collectAsMap() on a pairwise RDD", {
expect_equal(vals, list(`1` = "a", `2` = "b"))
})

test_that("show()", {
rdd <- parallelize(sc, list(1:10))
expect_output(show(rdd), "ParallelCollectionRDD\\[\\d+\\] at parallelize at RRDD\\.scala:\\d+")
})

test_that("sampleByKey() on pairwise RDDs", {
rdd <- parallelize(sc, 1:2000)
pairsRDD <- lapply(rdd, function(x) { if (x %% 2 == 0) list("a", x) else list("b", x) })
2 changes: 1 addition & 1 deletion R/pkg/inst/tests/test_sparkSQL.R
@@ -641,7 +641,7 @@ test_that("toJSON() returns an RDD of the correct values", {

test_that("showDF()", {
df <- jsonFile(sqlCtx, jsonPath)
expect_output(showDF(df), "age name \nnull Michael\n30 Andy \n19 Justin ")
expect_output(showDF(df), "+----+-------+\n| age| name|\n+----+-------+\n|null|Michael|\n| 30| Andy|\n| 19| Justin|\n+----+-------+\n")
})

test_that("isLocal()", {
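
The new expectation mirrors the tabular rendering that SPARK-7319 gives `DataFrame.show()`. For reference, a sketch of the same output from the Scala shell, assuming the stock `people.json` example data that ships with Spark:

```scala
// Hypothetical spark-shell session (Spark 1.x API) against the bundled example file.
val df = sqlContext.jsonFile("examples/src/main/resources/people.json")
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+
```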
11 changes: 0 additions & 11 deletions assembly/pom.xml
@@ -194,7 +194,6 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>dist</id>
@@ -213,16 +212,6 @@
</plugins>
</build>
</profile>
<profile>
<id>kinesis-asl</id>
<dependencies>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>${commons.httpclient.version}</version>
</dependency>
</dependencies>
</profile>

<!-- Profiles that disable inclusion of certain dependencies. -->
<profile>
5 changes: 4 additions & 1 deletion bin/spark-class2.cmd
@@ -61,7 +61,10 @@ if not "x%JAVA_HOME%"=="x" set RUNNER=%JAVA_HOME%\bin\java

rem The launcher library prints the command to be executed in a single line suitable for being
rem executed by the batch interpreter. So read all the output of the launcher into a variable.
for /f "tokens=*" %%i in ('cmd /C ""%RUNNER%" -cp %LAUNCH_CLASSPATH% org.apache.spark.launcher.Main %*"') do (
set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
"%RUNNER%" -cp %LAUNCH_CLASSPATH% org.apache.spark.launcher.Main %* > %LAUNCHER_OUTPUT%
for /f "tokens=*" %%i in (%LAUNCHER_OUTPUT%) do (
set SPARK_CMD=%%i
)
del %LAUNCHER_OUTPUT%
%SPARK_CMD%
2 changes: 1 addition & 1 deletion bin/spark-shell2.cmd
@@ -19,7 +19,7 @@ rem

set SPARK_HOME=%~dp0..

echo "%*" | findstr " --help -h" >nul
echo "%*" | findstr " \<--help\> \<-h\>" >nul
if %ERRORLEVEL% equ 0 (
call :usage
exit /b 0
3 changes: 3 additions & 0 deletions conf/docker.properties.template
@@ -0,0 +1,3 @@
spark.mesos.executor.docker.image: <image built from `../docker/spark-mesos/Dockerfile`>
spark.mesos.executor.docker.volumes: /usr/local/lib:/host/usr/local/lib:ro
spark.mesos.executor.home: /opt/spark
3 changes: 2 additions & 1 deletion conf/spark-env.sh.template
@@ -3,7 +3,7 @@
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.

# Options read when launching programs locally with
# Options read when launching programs locally with
# ./bin/run-example or ./bin/spark-submit
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
# - SPARK_LOCAL_IP, to set the IP address Spark binds to on this node
@@ -39,6 +39,7 @@
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers

23 changes: 19 additions & 4 deletions core/pom.xml
@@ -74,6 +74,10 @@
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
@@ -91,6 +95,11 @@
<artifactId>spark-network-shuffle_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-unsafe_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>net.java.dev.jets3t</groupId>
<artifactId>jets3t</artifactId>
@@ -219,6 +228,14 @@
<artifactId>json4s-jackson_${scala.binary.version}</artifactId>
<version>3.2.10</version>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-server</artifactId>
</dependency>
<dependency>
<groupId>com.sun.jersey</groupId>
<artifactId>jersey-core</artifactId>
</dependency>
<dependency>
<groupId>org.apache.mesos</groupId>
<artifactId>mesos</artifactId>
@@ -264,7 +281,6 @@
<dependency>
<groupId>org.apache.ivy</groupId>
<artifactId>ivy</artifactId>
<version>${ivy.version}</version>
</dependency>
<dependency>
<groupId>oro</groupId>
@@ -275,7 +291,7 @@
<dependency>
<groupId>org.tachyonproject</groupId>
<artifactId>tachyon-client</artifactId>
<version>0.5.0</version>
<version>0.6.4</version>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
@@ -353,7 +369,7 @@
<dependency>
<groupId>org.spark-project</groupId>
<artifactId>pyrolite</artifactId>
<version>2.0.1</version>
<version>4.4</version>
</dependency>
<dependency>
<groupId>net.sf.py4j</groupId>
@@ -474,7 +490,6 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.3.2</version>
<executions>
<execution>
<id>sparkr-pkg</id>
8 changes: 7 additions & 1 deletion core/src/main/java/org/apache/spark/JobExecutionStatus.java
@@ -17,9 +17,15 @@

package org.apache.spark;

import org.apache.spark.util.EnumUtil;

public enum JobExecutionStatus {
RUNNING,
SUCCEEDED,
FAILED,
UNKNOWN
UNKNOWN;

public static JobExecutionStatus fromString(String str) {
return EnumUtil.parseIgnoreCase(JobExecutionStatus.class, str);
}
}
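
The new `fromString` delegates to `EnumUtil.parseIgnoreCase`, a helper introduced by the SPARK-3454 JSON-endpoint work included in this commit range, so REST callers can pass status names in any case. A hedged sketch of the intended behavior, assuming unknown names are rejected with an `IllegalArgumentException`:

```scala
import org.apache.spark.JobExecutionStatus

// Case-insensitive lookups all resolve to the same constant.
JobExecutionStatus.fromString("running")   // JobExecutionStatus.RUNNING
JobExecutionStatus.fromString("RUNNING")   // JobExecutionStatus.RUNNING

// An unrecognized name is assumed to be rejected rather than mapped to UNKNOWN.
// JobExecutionStatus.fromString("paused")  // IllegalArgumentException
```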
@@ -15,22 +15,16 @@
* limitations under the License.
*/

package org.apache.spark.sql.catalyst.expressions
package org.apache.spark.status.api.v1;

import java.util.Random
import org.apache.spark.util.EnumUtil;

import org.apache.spark.sql.types.{DataType, DoubleType}
public enum ApplicationStatus {
COMPLETED,
RUNNING;


case object Rand extends LeafExpression {
override def dataType: DataType = DoubleType
override def nullable: Boolean = false

private[this] lazy val rand = new Random

override def eval(input: Row = null): EvaluatedType = {
rand.nextDouble().asInstanceOf[EvaluatedType]
public static ApplicationStatus fromString(String str) {
return EnumUtil.parseIgnoreCase(ApplicationStatus.class, str);
}

override def toString: String = "RAND()"
}