
Commit d80f7e9

rebase based on #1980
2 parents: 4ae834b + a7d65d3

10,188 files changed: +240,594 / -98,498 lines


.gitattributes

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+*.bat text eol=crlf
+*.cmd text eol=crlf
```

.gitignore

Lines changed: 13 additions & 3 deletions
```diff
@@ -5,18 +5,23 @@
 *.ipr
 *.iml
 *.iws
+*.pyc
+*.pyo
 .idea/
 .idea_modules/
-sbt/*.jar
+build/*.jar
 .settings
 .cache
+cache
 .generated-mima*
-/build/
 work/
 out/
 .DS_Store
 third_party/libmesos.so
 third_party/libmesos.dylib
+build/apache-maven*
+build/zinc*
+build/scala*
 conf/java-opts
 conf/*.sh
 conf/*.cmd
@@ -49,12 +54,17 @@ dependency-reduced-pom.xml
 checkpoint
 derby.log
 dist/
-spark-*-bin.tar.gz
+dev/create-release/*txt
+dev/create-release/*final
+spark-*-bin-*.tgz
 unit-tests.log
 /lib/
+ec2/lib/
 rat-results.txt
 scalastyle.txt
 scalastyle-output.xml
+R-unit-tests.log
+R/unit-tests.out
 
 # For Hive
 metastore_db/
```

.rat-excludes

Lines changed: 8 additions & 0 deletions
```diff
@@ -1,5 +1,7 @@
 target
+cache
 .gitignore
+.gitattributes
 .project
 .classpath
 .mima-excludes
@@ -17,6 +19,7 @@ fairscheduler.xml.template
 spark-defaults.conf.template
 log4j.properties
 log4j.properties.template
+metrics.properties
 metrics.properties.template
 slaves
 slaves.template
@@ -43,11 +46,13 @@ SparkImports.scala
 SparkJLineCompletion.scala
 SparkJLineReader.scala
 SparkMemberHandlers.scala
+SparkReplReporter.scala
 sbt
 sbt-launch-lib.bash
 plugins.sbt
 work
 .*\.q
+.*\.qv
 golden
 test.out/*
 .*iml
@@ -61,3 +66,6 @@ dist/*
 logs
 .*scalastyle-output.xml
 .*dependency-reduced-pom.xml
+known_translations
+DESCRIPTION
+NAMESPACE
```

CONTRIBUTING.md

Lines changed: 13 additions & 9 deletions
```diff
@@ -1,12 +1,16 @@
 ## Contributing to Spark
 
-Contributions via GitHub pull requests are gladly accepted from their original
-author. Along with any pull requests, please state that the contribution is
-your original work and that you license the work to the project under the
-project's open source license. Whether or not you state this explicitly, by
-submitting any copyrighted material via pull request, email, or other means
-you agree to license the material under the project's open source license and
-warrant that you have the legal authority to do so.
+*Before opening a pull request*, review the
+[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
+It lists steps that are required before creating a PR. In particular, consider:
+
+- Is the change important and ready enough to ask the community to spend time reviewing?
+- Have you searched for existing, related JIRAs and pull requests?
+- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is the change being proposed clearly explained and motivated?
 
-Please see the [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
-for more information.
+When you contribute code, you affirm that the contribution is your original work and that you
+license the work to the project under the project's open source license. Whether or not you
+state this explicitly, by submitting any copyrighted material via pull request, email, or
+other means you agree to license the material under the project's open source license and
+warrant that you have the legal authority to do so.
```

LICENSE

Lines changed: 38 additions & 14 deletions
```diff
@@ -646,7 +646,8 @@ THE SOFTWARE.
 
 ========================================================================
 For Scala Interpreter classes (all .scala files in repl/src/main/scala
-except for Main.Scala, SparkHelper.scala and ExecutorClassLoader.scala):
+except for Main.Scala, SparkHelper.scala and ExecutorClassLoader.scala),
+and for SerializableMapWrapper in JavaUtils.scala:
 ========================================================================
 
 Copyright (c) 2002-2013 EPFL
@@ -712,18 +713,6 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
 EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-========================================================================
-For colt:
-========================================================================
-
-Copyright (c) 1999 CERN - European Organization for Nuclear Research.
-Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. CERN makes no representations about the suitability of this software for any purpose. It is provided "as is" without expressed or implied warranty.
-
-Packages hep.aida.*
-
-Written by Pavel Binko, Dino Ferrero Merlino, Wolfgang Hoschek, Tony Johnson, Andreas Pfeiffer, and others. Check the FreeHEP home page for more info. Permission to use and/or redistribute this work is granted under the terms of the LGPL License, with the exception that any usage related to military applications is expressly forbidden. The software and documentation made available under the terms of this license are provided with no warranty.
-
-
 ========================================================================
 For SnapTree:
 ========================================================================
@@ -766,7 +755,7 @@ SUCH DAMAGE.
 
 
 ========================================================================
-For Timsort (core/src/main/java/org/apache/spark/util/collection/Sorter.java):
+For Timsort (core/src/main/java/org/apache/spark/util/collection/TimSort.java):
 ========================================================================
 Copyright (C) 2008 The Android Open Source Project
 
@@ -782,6 +771,41 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 
+========================================================================
+For TestTimSort (core/src/test/java/org/apache/spark/util/collection/TestTimSort.java):
+========================================================================
+Copyright (C) 2015 Stijn de Gouw
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+========================================================================
+For LimitedInputStream
+(network/common/src/main/java/org/apache/spark/network/util/LimitedInputStream.java):
+========================================================================
+Copyright (C) 2007 The Guava Authors
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
 
 ========================================================================
 BSD-style licenses
```

R/.gitignore

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+*.o
+*.so
+*.Rd
+lib
+pkg/man
+pkg/html
```

R/DOCUMENTATION.md

Lines changed: 12 additions & 0 deletions
```diff
@@ -0,0 +1,12 @@
+# SparkR Documentation
+
+SparkR documentation is generated using in-source comments annotated using
+`roxygen2`. After making changes to the documentation, to generate man pages,
+you can run the following from an R console in the SparkR home directory
+
+    library(devtools)
+    devtools::document(pkg="./pkg", roclets=c("rd"))
+
+You can verify if your changes are good by running
+
+    R CMD check pkg/
```
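For readers unfamiliar with `roxygen2`, the man pages above are generated from specially formatted `#'` comments placed directly above each exported function. A minimal sketch of what such an annotation looks like (the function `addOne` is a hypothetical illustration, not part of the SparkR sources):

```r
# Hypothetical roxygen2-annotated function; running devtools::document()
# on a package containing it would generate a man/addOne.Rd help page.

#' Add one to each element of a numeric vector
#'
#' @param x a numeric vector
#' @return a numeric vector with 1 added to every element
#' @export
#' @examples
#' addOne(c(1, 2, 3))
addOne <- function(x) {
  x + 1
}
```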

R/README.md

Lines changed: 67 additions & 0 deletions
```diff
@@ -0,0 +1,67 @@
+# R on Spark
+
+SparkR is an R package that provides a light-weight frontend to use Spark from R.
+
+### SparkR development
+
+#### Build Spark
+
+Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run
+```
+build/mvn -DskipTests -Psparkr package
+```
+
+#### Running sparkR
+
+You can start using SparkR by launching the SparkR shell with
+
+    ./bin/sparkR
+
+The `sparkR` script automatically creates a SparkContext with Spark by default in
+local mode. To specify the Spark master of a cluster for the automatically created
+SparkContext, you can run
+
+    ./bin/sparkR --master "local[2]"
+
+To set other options like driver memory, executor memory etc., you can pass in the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
+
+#### Using SparkR from RStudio
+
+If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
+```
+# Set this to where Spark is installed
+Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
+# This line loads SparkR from the installed directory
+.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
+library(SparkR)
+sc <- sparkR.init(master="local")
+```
+
+#### Making changes to SparkR
+
+The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
+If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
+Once you have made your changes, please include unit tests for them and run existing unit tests using the `run-tests.sh` script as described below.
+
+#### Generating documentation
+
+The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script.
+
+### Examples, Unit tests
+
+SparkR comes with several sample programs in the `examples/src/main/r` directory.
+To run one of them, use `./bin/sparkR <filename> <args>`. For example:
+
+    ./bin/sparkR examples/src/main/r/pi.R local[2]
+
+You can also run the unit tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):
+
+    R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
+    ./R/run-tests.sh
+
+### Running on YARN
+The `./bin/spark-submit` and `./bin/sparkR` scripts can also be used to submit jobs to YARN clusters. You will need to set the YARN conf dir before doing so. For example, on CDH you can run
+```
+export YARN_CONF_DIR=/etc/hadoop/conf
+./bin/spark-submit --master yarn examples/src/main/r/pi.R 4
+```
```
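The README above asks contributors to add unit tests and run them through `run-tests.sh`, which relies on the `testthat` package. As a rough sketch of what such a test looks like (the file path and assertions are illustrative, not taken from the SparkR test suite):

```r
# Hypothetical testthat test file, e.g. pkg/inst/tests/test_example.R,
# of the kind that run-tests.sh would execute.
library(testthat)

test_that("vector arithmetic behaves as expected", {
  expect_equal(c(1, 2, 3) + 1, c(2, 3, 4))
  expect_true(all(c(2, 3, 4) > 1))
})
```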

R/WINDOWS.md

Lines changed: 13 additions & 0 deletions
```diff
@@ -0,0 +1,13 @@
+## Building SparkR on Windows
+
+To build SparkR on Windows, the following steps are required:
+
+1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
+include Rtools and R in `PATH`.
+2. Install
+[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
+`JAVA_HOME` in the system environment variables.
+3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
+directory of Maven in `PATH`.
+4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
+5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`.
```
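Before starting the Maven build in step 5, it can help to confirm the environment from an R prompt; this is an optional sanity check, not one of the required steps above:

```r
# Optional check that the Windows build prerequisites are visible to R.
Sys.getenv("JAVA_HOME")    # should point at the installed JDK
Sys.getenv("MAVEN_OPTS")   # should hold the options from the Building Spark page
Sys.which("java")          # non-empty path if the JDK's bin directory is on PATH
```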

R/create-docs.sh

Lines changed: 46 additions & 0 deletions
```diff
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Script to create API docs for SparkR
+# This requires `devtools` and `knitr` to be installed on the machine.
+
+# After running this script the html docs can be found in
+# $SPARK_HOME/R/pkg/html
+
+# Figure out where the script is
+export FWDIR="$(cd "`dirname "$0"`"; pwd)"
+pushd $FWDIR
+
+# Generate Rd file
+Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))'
+
+# Install the package
+./install-dev.sh
+
+# Now create HTML files
+
+# knit_rd puts html in current working directory
+mkdir -p pkg/html
+pushd pkg/html
+
+Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")'
+
+popd
+
+popd
```
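Once `create-docs.sh` has run, the package is installed under `R/lib` (by `install-dev.sh`) and the knitted pages land in `R/pkg/html`. One way to spot-check the result from an R session started in the `R` directory of the checkout (a sketch, assuming that working directory):

```r
# Load the locally installed SparkR from R/lib and open its help index,
# which is backed by the Rd files generated above.
library(SparkR, lib.loc = "lib")
help(package = "SparkR")
```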

R/install-dev.bat

Lines changed: 27 additions & 0 deletions
```diff
@@ -0,0 +1,27 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one or more
+rem contributor license agreements. See the NOTICE file distributed with
+rem this work for additional information regarding copyright ownership.
+rem The ASF licenses this file to You under the Apache License, Version 2.0
+rem (the "License"); you may not use this file except in compliance with
+rem the License. You may obtain a copy of the License at
+rem
+rem http://www.apache.org/licenses/LICENSE-2.0
+rem
+rem Unless required by applicable law or agreed to in writing, software
+rem distributed under the License is distributed on an "AS IS" BASIS,
+rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+rem See the License for the specific language governing permissions and
+rem limitations under the License.
+rem
+
+rem Install development version of SparkR
+rem
+
+set SPARK_HOME=%~dp0..
+
+MKDIR %SPARK_HOME%\R\lib
+
+R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\
```
