
Commit ca4309f

foxishash211 authored and committed

nicer readme (apache#333)

1 parent 5470366; commit ca4309f

File tree

2 files changed: +2, -113 lines changed


README.md

Lines changed: 2 additions & 111 deletions
@@ -8,7 +8,7 @@ This is a collaboratively maintained project working on [SPARK-18278](https://is
 
 ## Getting Started
 
-- [Usage guide](docs/running-on-kubernetes.md) shows how to run the code
+- [Usage guide](https://apache-spark-on-k8s.github.io/userdocs/) shows how to run the code
 - [Development docs](resource-managers/kubernetes/README.md) shows how to get set up for development
 - Code is primarily located in the [resource-managers/kubernetes](resource-managers/kubernetes) folder
 
@@ -30,113 +30,4 @@ This is a collaborative effort by several folks from different companies who are
 - Intel
 - Palantir
 - Pepperdata
-- Red Hat
-
---------------------
-
-(original README below)
-
-# Apache Spark
-
-Spark is a fast and general cluster computing system for Big Data. It provides
-high-level APIs in Scala, Java, Python, and R, and an optimized engine that
-supports general computation graphs for data analysis. It also supports a
-rich set of higher-level tools including Spark SQL for SQL and DataFrames,
-MLlib for machine learning, GraphX for graph processing,
-and Spark Streaming for stream processing.
-
-<http://spark.apache.org/>
-
-
-## Online Documentation
-
-You can find the latest Spark documentation, including a programming
-guide, on the [project web page](http://spark.apache.org/documentation.html).
-This README file only contains basic setup instructions.
-
-## Building Spark
-
-Spark is built using [Apache Maven](http://maven.apache.org/).
-To build Spark and its example programs, run:
-
-    build/mvn -DskipTests clean package
-
-(You do not need to do this if you downloaded a pre-built package.)
-
-You can build Spark using more than one thread by using the -T option with Maven, see ["Parallel builds in Maven 3"](https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3).
-More detailed documentation is available from the project site, at
-["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).
-
-For general development tips, including info on developing Spark using an IDE, see
-[http://spark.apache.org/developer-tools.html](the Useful Developer Tools page).
-
-## Interactive Scala Shell
-
-The easiest way to start using Spark is through the Scala shell:
-
-    ./bin/spark-shell
-
-Try the following command, which should return 1000:
-
-    scala> sc.parallelize(1 to 1000).count()
-
-## Interactive Python Shell
-
-Alternatively, if you prefer Python, you can use the Python shell:
-
-    ./bin/pyspark
-
-And run the following command, which should also return 1000:
-
-    >>> sc.parallelize(range(1000)).count()
-
-## Example Programs
-
-Spark also comes with several sample programs in the `examples` directory.
-To run one of them, use `./bin/run-example <class> [params]`. For example:
-
-    ./bin/run-example SparkPi
-
-will run the Pi example locally.
-
-You can set the MASTER environment variable when running examples to submit
-examples to a cluster. This can be a mesos:// or spark:// URL,
-"yarn" to run on YARN, and "local" to run
-locally with one thread, or "local[N]" to run locally with N threads. You
-can also use an abbreviated class name if the class is in the `examples`
-package. For instance:
-
-    MASTER=spark://host:7077 ./bin/run-example SparkPi
-
-Many of the example programs print usage help if no params are given.
-
-## Running Tests
-
-Testing first requires [building Spark](#building-spark). Once Spark is built, tests
-can be run using:
-
-    ./dev/run-tests
-
-Please see the guidance on how to
-[run tests for a module, or individual tests](http://spark.apache.org/developer-tools.html#individual-tests).
-
-## A Note About Hadoop Versions
-
-Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
-storage systems. Because the protocols have changed in different versions of
-Hadoop, you must build Spark against the same version that your cluster runs.
-
-Please refer to the build documentation at
-["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
-for detailed guidance on building for a particular distribution of Hadoop, including
-building for particular Hive and Hive Thriftserver distributions.
-
-## Configuration
-
-Please refer to the [Configuration Guide](http://spark.apache.org/docs/latest/configuration.html)
-in the online documentation for an overview on how to configure Spark.
-
-## Contributing
-
-Please review the [Contribution to Spark guide](http://spark.apache.org/contributing.html)
-for information on how to get started contributing to the project.
+- Red Hat

docs/running-on-kubernetes.md

Lines changed: 0 additions & 2 deletions
@@ -149,8 +149,6 @@ environment variable in your Dockerfiles.
 
 ### Accessing Kubernetes Clusters
 
-For details about running on public cloud environments, such as Google Container Engine (GKE), refer to [running Spark in the cloud with Kubernetes](running-on-kubernetes-cloud.md).
-
 Spark-submit also supports submission through the
 [local kubectl proxy](https://kubernetes.io/docs/user-guide/accessing-the-cluster/#using-kubectl-proxy). One can use the
 authenticating proxy to communicate with the api server directly without passing credentials to spark-submit.
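For context on the retained paragraph above, here is a rough sketch of what submission through a local kubectl proxy can look like with this fork. It is illustrative only: the port is kubectl proxy's default, and the namespace, image names, and example jar path are assumed placeholders rather than values taken from this commit; config property names may also differ between versions of the fork.

    # Start an authenticating proxy to the Kubernetes API server (listens on 127.0.0.1:8001 by default).
    kubectl proxy

    # Point spark-submit at the proxy endpoint so no credentials have to be passed to spark-submit.
    # Image names and the application jar path below are placeholders for illustration.
    bin/spark-submit \
      --deploy-mode cluster \
      --class org.apache.spark.examples.SparkPi \
      --master k8s://http://127.0.0.1:8001 \
      --conf spark.kubernetes.namespace=default \
      --conf spark.kubernetes.driver.docker.image=<your-driver-image> \
      --conf spark.kubernetes.executor.docker.image=<your-executor-image> \
      local:///opt/spark/examples/jars/spark-examples.jar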
