
Commit 033a33e

Address nits and fix documentations to be more clear
1 parent 83815e0

4 files changed (+13 -8 lines)

python/docs/source/getting_started/installation.rst

Lines changed: 7 additions & 2 deletions
@@ -48,7 +48,12 @@ For PySpark with different Hadoop and/or Hive, you can install it by using ``HIV
     HIVE_VERSION=1.2 HADOOP_VERSION=2.7 pip install pyspark
 
 The default distribution has built-in Hadoop 3.2 and Hive 2.3. If users specify different versions, the pip installation automatically
-downloads a different version and use it in PySpark.
+downloads a different version and use it in PySpark. Downloading it can take a while depending on the network and the mirror chosen.
+It is recommended to use `-v` option in `pip` to track the installation and download status.
+
+.. code-block:: bash
+
+    HADOOP_VERSION=2.7 pip install pyspark -v
 
 Supported versions are as below:
 

@@ -134,4 +139,4 @@ Package Minimum supported version Note
 ============= ========================= ================
 
 **Note**: PySpark requires Java 8 or later with ``JAVA_HOME`` properly set.
-If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow related features and refer to `Downloading <https://spark.apache.org/docs/latest/#downloading>`_
+If using JDK 11, set ``-Dio.netty.tryReflectionSetAccessible=true`` for Arrow related features and refer to `Downloading <https://spark.apache.org/docs/latest/#downloading>`_
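
As a side note on that JDK 11 requirement: a minimal sketch, assuming the standard spark.driver.extraJavaOptions and spark.executor.extraJavaOptions configuration keys (regular Spark options, not something this commit adds), of how the flag can be passed from PySpark itself:

    from pyspark.sql import SparkSession

    # Pass the Netty reflection flag to driver and executors so that
    # Arrow-related features work under JDK 11, as the note above advises.
    spark = (
        SparkSession.builder
        .config("spark.driver.extraJavaOptions",
                "-Dio.netty.tryReflectionSetAccessible=true")
        .config("spark.executor.extraJavaOptions",
                "-Dio.netty.tryReflectionSetAccessible=true")
        .getOrCreate()
    )

Depending on how the driver JVM is launched, the driver-side flag may instead need to go into spark-defaults.conf or the spark-submit command line.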

python/pyspark/find_spark_home.py

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ def is_spark_home(path):
     from importlib.util import find_spec
     try:
         # Spark distribution can be downloaded when HADOOP_VERSION environment variable is set.
-        # We should look up this directory first.
+        # We should look up this directory first, see also SPARK-32017.
         spark_dist_dir = "spark-distribution"
         module_home = os.path.dirname(find_spec("pyspark").origin)
         paths.append(os.path.join(module_home, spark_dist_dir))
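
For context, a minimal self-contained sketch of the lookup order this comment documents; the helper name candidate_spark_homes is hypothetical, and only the directory name and calls visible in the hunk are taken from the source:

    import os
    from importlib.util import find_spec

    def candidate_spark_homes():
        # Hypothetical helper: the "spark-distribution" directory under the
        # installed pyspark package (populated when HADOOP_VERSION was set at
        # pip install time) is tried before the package directory itself.
        paths = []
        spec = find_spec("pyspark")
        if spec is not None and spec.origin is not None:
            module_home = os.path.dirname(spec.origin)
            paths.append(os.path.join(module_home, "spark-distribution"))
            # Assumed fallback: the package directory itself, as in find_spark_home.py.
            paths.append(module_home)
        return paths

    print(candidate_spark_homes())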

python/pyspark/tests/test_install_spark.py

Lines changed: 2 additions & 3 deletions
@@ -25,12 +25,11 @@
 class SparkInstallationTestCase(unittest.TestCase):
 
     def test_install_spark(self):
-        # Just pick one combination to test.
+        # Test only one case. Testing this is expensive because it needs to download
+        # the Spark distribution.
         spark_version, hadoop_version, hive_version = checked_versions("3.0.1", "3.2", "2.3")
 
         with tempfile.TemporaryDirectory() as tmp_dir:
-            # Test only default case. Testing this is expensive because it needs to download
-            # the Spark distribution.
             install_spark(
                 dest=tmp_dir,
                 spark_version=spark_version,
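
The helpers under test can be exercised the same way outside the suite; a usage sketch based only on the calls visible in this hunk. The pyspark.install import path and the trailing hadoop_version/hive_version keyword arguments are assumptions, since the hunk is cut off after spark_version:

    import tempfile

    # Assumed import location; the hunk does not show the import line.
    from pyspark.install import checked_versions, install_spark

    # checked_versions normalizes the requested version strings, as in the test.
    spark_version, hadoop_version, hive_version = checked_versions("3.0.1", "3.2", "2.3")

    with tempfile.TemporaryDirectory() as tmp_dir:
        install_spark(
            dest=tmp_dir,
            spark_version=spark_version,
            # Remaining keyword arguments assumed to follow the tuple above.
            hadoop_version=hadoop_version,
            hive_version=hive_version,
        )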

python/setup.py

Lines changed: 3 additions & 2 deletions
@@ -126,8 +126,9 @@ def run(self):
             os.environ.get("HADOOP_VERSION", install_module.DEFAULT_HADOOP).lower(),
             os.environ.get("HIVE_VERSION", install_module.DEFAULT_HIVE).lower())
 
-        if ((install_module.DEFAULT_HADOOP, install_module.DEFAULT_HIVE) ==
-                (hadoop_version, hive_version)):
+        if ("SPARK_VERSION" not in os.environ and
+                ((install_module.DEFAULT_HADOOP, install_module.DEFAULT_HIVE) ==
+                 (hadoop_version, hive_version))):
             # Do not download and install if they are same as default.
             return

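To make the effect of the new SPARK_VERSION guard concrete, a standalone sketch of the condition; the default strings stand in for install_module.DEFAULT_HADOOP and DEFAULT_HIVE (Hadoop 3.2 and Hive 2.3 per the docs above), and should_skip_download is a hypothetical name:

    DEFAULT_HADOOP, DEFAULT_HIVE = "3.2", "2.3"  # stand-ins for the module defaults

    def should_skip_download(env):
        # Mirrors the new condition: skip the download only when the user
        # asked for nothing beyond the bundled defaults.
        hadoop_version = env.get("HADOOP_VERSION", DEFAULT_HADOOP).lower()
        hive_version = env.get("HIVE_VERSION", DEFAULT_HIVE).lower()
        return ("SPARK_VERSION" not in env and
                (DEFAULT_HADOOP, DEFAULT_HIVE) == (hadoop_version, hive_version))

    assert should_skip_download({})                              # defaults only: skip
    assert not should_skip_download({"SPARK_VERSION": "3.0.1"})  # previously hit the early return
    assert not should_skip_download({"HADOOP_VERSION": "2.7"})   # non-default Hadoop: download
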