
Commit ddcd884

HyukjinKwon authored and Jackey Lee committed
[SPARK-26252][PYTHON] Add support to run specific unittests and/or doctests in python/run-tests script
## What changes were proposed in this pull request?

This PR proposes adding a developer option, `--testnames`, to our testing script to allow running a specific set of unittests and doctests.

**1. Run unittests in the class**

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests'
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests']
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (14s)
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (14s) ... 22 tests were skipped
Tests passed in 14 seconds

Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_enabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped
...
```

**2. Run a single unittest in the class**

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion'
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (0s) ... 1 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (8s)
Tests passed in 8 seconds

Skipped tests in pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion with pypy:
    test_null_conversion (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```

**3. Run doctests in a single PySpark module**

```bash
./run-tests --testnames pyspark.sql.dataframe
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.dataframe
Finished test(python2.7): pyspark.sql.dataframe (47s)
Finished test(pypy): pyspark.sql.dataframe (48s)
Tests passed in 48 seconds
```

Of course, you can mix them:

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe'
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'pypy']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests', 'pyspark.sql.dataframe']
Starting test(pypy): pyspark.sql.dataframe
Starting test(pypy): pyspark.sql.tests.test_arrow ArrowTests
Starting test(python2.7): pyspark.sql.dataframe
Starting test(python2.7): pyspark.sql.tests.test_arrow ArrowTests
Finished test(pypy): pyspark.sql.tests.test_arrow ArrowTests (0s) ... 22 tests were skipped
Finished test(python2.7): pyspark.sql.tests.test_arrow ArrowTests (18s)
Finished test(python2.7): pyspark.sql.dataframe (50s)
Finished test(pypy): pyspark.sql.dataframe (52s)
Tests passed in 52 seconds

Skipped tests in pyspark.sql.tests.test_arrow ArrowTests with pypy:
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_fallback_disabled (pyspark.sql.tests.test_arrow.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
```

You can also use all other options (except `--modules`, which will be ignored):

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion' --python-executables=python
```

```
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python']
Will test the following Python tests: ['pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion']
Starting test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion
Finished test(python): pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion (12s)
Tests passed in 12 seconds
```

See the help below:

```bash
./run-tests --help
```

```
Usage: run-tests [options]

Options:
...
  Developer Options:
    --testnames=TESTNAMES
                        A comma-separated list of specific modules, classes and
                        functions of doctest or unittest to test. For example,
                        'pyspark.sql.foo' to run the module as unittests or
                        doctests, 'pyspark.sql.tests FooTests' to run the
                        specific class of unittests, 'pyspark.sql.tests
                        FooTests.test_foo' to run the specific unittest in the
                        class. '--modules' option is ignored if they are given.
```

I intentionally grouped it as a developer option to be more conservative.

## How was this patch tested?

Manually tested. Negative tests were also done.

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_null_conversion1' --python-executables=python
```

```
...
AttributeError: type object 'ArrowTests' has no attribute 'test_null_conversion1'
...
```

```bash
./run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowT' --python-executables=python
```

```
...
AttributeError: 'module' object has no attribute 'ArrowT'
...
```

```bash
./run-tests --testnames 'pyspark.sql.tests.test_ar' --python-executables=python
```

```
...
/.../python2.7: No module named pyspark.sql.tests.test_ar
```

Closes apache#23203 from HyukjinKwon/SPARK-26252.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
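The mechanics of the option are small: `run-tests.py` splits the `--testnames` value on commas into individual test goals, and each goal is then whitespace-split into extra arguments for `bin/pyspark` (the `test_name.split()` change in the diff further below). A minimal sketch of that parsing; the helper names and the `/opt/spark` path are illustrative, not part of the patch:

```python
import os


def split_testnames(testnames):
    # Mirrors opts.testnames.split(',') in run-tests.py: one goal per comma.
    return testnames.split(',')


def build_command(spark_home, test_name):
    # Mirrors [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split():
    # a goal like 'module Class.test' becomes two separate argv entries.
    return [os.path.join(spark_home, "bin/pyspark")] + test_name.split()


goals = split_testnames('pyspark.sql.tests.test_arrow ArrowTests,pyspark.sql.dataframe')
commands = [build_command("/opt/spark", g) for g in goals]
print(commands[0])  # ['/opt/spark/bin/pyspark', 'pyspark.sql.tests.test_arrow', 'ArrowTests']
```

This is why a goal may contain a space (module plus class) but goals themselves are separated by commas on the command line.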
1 parent db866ea commit ddcd884

File tree

2 files changed: +46 −24 lines changed

python/run-tests-with-coverage

Lines changed: 0 additions & 2 deletions

```diff
@@ -50,8 +50,6 @@ export SPARK_CONF_DIR="$COVERAGE_DIR/conf"
 # This environment variable enables the coverage.
 export COVERAGE_PROCESS_START="$FWDIR/.coveragerc"
 
-# If you'd like to run a specific unittest class, you could do such as
-# SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests
 ./run-tests "$@"
 
 # Don't run coverage for the coverage command itself
```

python/run-tests.py

Lines changed: 46 additions & 22 deletions

```diff
@@ -19,7 +19,7 @@
 
 from __future__ import print_function
 import logging
-from optparse import OptionParser
+from optparse import OptionParser, OptionGroup
 import os
 import re
 import shutil
@@ -99,7 +99,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
-            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
+            [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),
             stderr=per_test_output, stdout=per_test_output, env=env).wait()
         shutil.rmtree(tmp_dir, ignore_errors=True)
     except:
@@ -190,6 +190,20 @@ def parse_opts():
         help="Enable additional debug logging"
     )
 
+    group = OptionGroup(parser, "Developer Options")
+    group.add_option(
+        "--testnames", type="string",
+        default=None,
+        help=(
+            "A comma-separated list of specific modules, classes and functions of doctest "
+            "or unittest to test. "
+            "For example, 'pyspark.sql.foo' to run the module as unittests or doctests, "
+            "'pyspark.sql.tests FooTests' to run the specific class of unittests, "
+            "'pyspark.sql.tests FooTests.test_foo' to run the specific unittest in the class. "
+            "'--modules' option is ignored if they are given.")
+    )
+    parser.add_option_group(group)
+
     (opts, args) = parser.parse_args()
     if args:
         parser.error("Unsupported arguments: %s" % ' '.join(args))
@@ -213,25 +227,31 @@ def _check_coverage(python_exec):
 
 def main():
     opts = parse_opts()
-    if (opts.verbose):
+    if opts.verbose:
         log_level = logging.DEBUG
     else:
         log_level = logging.INFO
+    should_test_modules = opts.testnames is None
     logging.basicConfig(stream=sys.stdout, level=log_level, format="%(message)s")
     LOGGER.info("Running PySpark tests. Output is in %s", LOG_FILE)
     if os.path.exists(LOG_FILE):
         os.remove(LOG_FILE)
     python_execs = opts.python_executables.split(',')
-    modules_to_test = []
-    for module_name in opts.modules.split(','):
-        if module_name in python_modules:
-            modules_to_test.append(python_modules[module_name])
-        else:
-            print("Error: unrecognized module '%s'. Supported modules: %s" %
-                  (module_name, ", ".join(python_modules)))
-            sys.exit(-1)
     LOGGER.info("Will test against the following Python executables: %s", python_execs)
-    LOGGER.info("Will test the following Python modules: %s", [x.name for x in modules_to_test])
+
+    if should_test_modules:
+        modules_to_test = []
+        for module_name in opts.modules.split(','):
+            if module_name in python_modules:
+                modules_to_test.append(python_modules[module_name])
+            else:
+                print("Error: unrecognized module '%s'. Supported modules: %s" %
+                      (module_name, ", ".join(python_modules)))
+                sys.exit(-1)
+        LOGGER.info("Will test the following Python modules: %s", [x.name for x in modules_to_test])
+    else:
+        testnames_to_test = opts.testnames.split(',')
+        LOGGER.info("Will test the following Python tests: %s", testnames_to_test)
 
     task_queue = Queue.PriorityQueue()
     for python_exec in python_execs:
@@ -246,16 +266,20 @@ def main():
         LOGGER.debug("%s python_implementation is %s", python_exec, python_implementation)
         LOGGER.debug("%s version is: %s", python_exec, subprocess_check_output(
             [python_exec, "--version"], stderr=subprocess.STDOUT, universal_newlines=True).strip())
-        for module in modules_to_test:
-            if python_implementation not in module.blacklisted_python_implementations:
-                for test_goal in module.python_test_goals:
-                    heavy_tests = ['pyspark.streaming.tests', 'pyspark.mllib.tests',
-                                   'pyspark.tests', 'pyspark.sql.tests', 'pyspark.ml.tests']
-                    if any(map(lambda prefix: test_goal.startswith(prefix), heavy_tests)):
-                        priority = 0
-                    else:
-                        priority = 100
-                    task_queue.put((priority, (python_exec, test_goal)))
+        if should_test_modules:
+            for module in modules_to_test:
+                if python_implementation not in module.blacklisted_python_implementations:
+                    for test_goal in module.python_test_goals:
+                        heavy_tests = ['pyspark.streaming.tests', 'pyspark.mllib.tests',
+                                       'pyspark.tests', 'pyspark.sql.tests', 'pyspark.ml.tests']
+                        if any(map(lambda prefix: test_goal.startswith(prefix), heavy_tests)):
+                            priority = 0
+                        else:
+                            priority = 100
+                        task_queue.put((priority, (python_exec, test_goal)))
+        else:
+            for test_goal in testnames_to_test:
+                task_queue.put((0, (python_exec, test_goal)))
 
     # Create the target directory before starting tasks to avoid races.
     target_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'target'))
```
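The scheduling branch added in the last hunk can be exercised in isolation: goals from heavy test modules get priority 0 so they start early, other module goals get 100, and explicit `--testnames` goals are always enqueued at priority 0. A standalone sketch under those rules; Python 3's `queue` module stands in for the script's Python 2 `Queue` import, and the goal list is illustrative:

```python
import queue

# Same prefix list as in run-tests.py.
HEAVY_TESTS = ['pyspark.streaming.tests', 'pyspark.mllib.tests',
               'pyspark.tests', 'pyspark.sql.tests', 'pyspark.ml.tests']


def priority_for(test_goal):
    # Priority 0 (dequeued first) for heavy test modules, 100 otherwise,
    # matching the if/else in the module-scheduling loop.
    return 0 if any(test_goal.startswith(p) for p in HEAVY_TESTS) else 100


task_queue = queue.PriorityQueue()
for goal in ['pyspark.sql.dataframe', 'pyspark.sql.tests.test_arrow ArrowTests']:
    task_queue.put((priority_for(goal), ('python2.7', goal)))

# PriorityQueue orders by the first tuple element, so the heavy goal comes out first.
print(task_queue.get())  # (0, ('python2.7', 'pyspark.sql.tests.test_arrow ArrowTests'))
```

Enqueuing `--testnames` goals at a fixed priority 0 is consistent with this scheme: an explicitly requested test should never wait behind the long-running module suites.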
