Refactor tests into new categories reviewed by Eric Gamma #1989

Merged 44 commits on Sep 19, 2023

Commits (44)
5f18cae
Refactor data validation tests
miguelgfierro Sep 7, 2023
cd6cc35
Merge branch 'staging' into miguel/new_test_categories
miguelgfierro Sep 15, 2023
5457f50
Changed test_dataset to test_download_utils FYI @looklike
miguelgfierro Sep 15, 2023
1fc7b85
Performance tests
miguelgfierro Sep 15, 2023
2799676
Security tests
miguelgfierro Sep 15, 2023
8b61faf
Security tests
miguelgfierro Sep 15, 2023
bb66531
Regression tests
miguelgfierro Sep 15, 2023
7bd6bd7
Regression tests
miguelgfierro Sep 15, 2023
e483b62
Criteo responsible AI
miguelgfierro Sep 15, 2023
f985b66
Movielens responsible AI
miguelgfierro Sep 15, 2023
d6e5dbd
:bug:
miguelgfierro Sep 15, 2023
0ddd3f9
Forgot s
miguelgfierro Sep 15, 2023
46c6ffc
criteo
miguelgfierro Sep 16, 2023
5d3b1ab
criteo
miguelgfierro Sep 16, 2023
79de76f
mind
miguelgfierro Sep 16, 2023
abaa152
movielens
miguelgfierro Sep 16, 2023
da9dffc
movielens WIP
miguelgfierro Sep 16, 2023
51bca98
movielens
miguelgfierro Sep 16, 2023
a42cea0
:bug:
miguelgfierro Sep 16, 2023
c243389
integration to functional
miguelgfierro Sep 16, 2023
6b01105
integration to functional
miguelgfierro Sep 16, 2023
64929e7
functional CPU
miguelgfierro Sep 16, 2023
106aee1
functional GPU and Spark
miguelgfierro Sep 16, 2023
1eb3313
Integration
miguelgfierro Sep 16, 2023
b9b2a21
Reviewing smoke
miguelgfierro Sep 16, 2023
1a8ab4a
Reviewing smoke
miguelgfierro Sep 16, 2023
3fc4e0d
unit tests notebooks
miguelgfierro Sep 16, 2023
61868d2
unit tests dataset
miguelgfierro Sep 16, 2023
dfa0c55
unit python evaluation
miguelgfierro Sep 16, 2023
9498642
unit pyspark evaluation
miguelgfierro Sep 16, 2023
0eea4d8
Added en extra s
miguelgfierro Sep 17, 2023
224e15b
unit models WIP
miguelgfierro Sep 17, 2023
34665b0
unit models WIP
miguelgfierro Sep 17, 2023
c26a6a3
unit models
miguelgfierro Sep 17, 2023
66e6dca
unit tuning
miguelgfierro Sep 17, 2023
3cdb4d3
unit utils
miguelgfierro Sep 17, 2023
0fbbec1
:memo:
miguelgfierro Sep 17, 2023
c40ffce
:bug:
miguelgfierro Sep 17, 2023
de11824
:bug:
miguelgfierro Sep 17, 2023
e26f4ff
:bug:
miguelgfierro Sep 17, 2023
fdee579
Update readme tests
miguelgfierro Sep 17, 2023
54b171b
License file changed so the maybe download tests had to be updated
miguelgfierro Sep 17, 2023
1c1e1e4
:memo:
miguelgfierro Sep 17, 2023
0d17767
ignoring one of the lightfm for a weird error
miguelgfierro Sep 17, 2023
Files changed
2 changes: 1 addition & 1 deletion .github/actions/get-test-groups/action.yml
@@ -29,6 +29,6 @@ runs:
if [[ ${{ inputs.TEST_KIND }} == "nightly" ]]; then
test_groups_str=$(python -c 'from tests.ci.azureml_tests.test_groups import nightly_test_groups; print([t for t in nightly_test_groups.keys() if "${{inputs.TEST_ENV}}" in t])')
else
test_groups_str=$(python -c 'from tests.ci.azureml_tests.test_groups import unit_test_groups; print(list(unit_test_groups.keys()))')
test_groups_str=$(python -c 'from tests.ci.azureml_tests.test_groups import pr_gate_test_groups; print(list(pr_gate_test_groups.keys()))')
fi
echo "test_groups=$test_groups_str" >> $GITHUB_OUTPUT
4 changes: 2 additions & 2 deletions README.md
@@ -135,13 +135,13 @@ This project adheres to [Microsoft's Open Source Code of Conduct](CODE_OF_CONDUC

## Build Status

These tests are the nightly builds, which compute the smoke and integration tests. `main` is our principal branch and `staging` is our development branch. We use [pytest](https://docs.pytest.org/) for testing python utilities in [recommenders](recommenders) and [Papermill](https://github.com/nteract/papermill) and [Scrapbook](https://nteract-scrapbook.readthedocs.io/en/latest/) for the [notebooks](examples).
These tests are the nightly builds, which compute the asynchronous tests. `main` is our principal branch and `staging` is our development branch. We use [pytest](https://docs.pytest.org/) for testing python utilities in [recommenders](recommenders) and [Papermill](https://github.com/nteract/papermill) and [Scrapbook](https://nteract-scrapbook.readthedocs.io/en/latest/) for the [notebooks](examples).

For more information about the testing pipelines, please see the [test documentation](tests/README.md).

### AzureML Nightly Build Status

Smoke and integration tests are run daily on AzureML.
The nightly build tests are run daily on AzureML.

| Build Type | Branch | Status | | Branch | Status |
| --- | --- | --- | --- | --- | --- |
4 changes: 2 additions & 2 deletions SETUP.md
@@ -156,9 +156,9 @@ First make sure that the tag that you want to add, e.g. `0.6.0`, is added in [`r
1. Make sure that the code in main passes all the tests (unit and nightly tests).
1. Create a tag with the version number: e.g. `git tag -a 0.6.0 -m "Recommenders 0.6.0"`.
1. Push the tag to the remote server: `git push origin 0.6.0`.
1. When the new tag is pushed, a release pipeline is executed. This pipeline runs all the tests again (unit, smoke and integration), generates a wheel and a tar.gz which are uploaded to a [GitHub draft release](https://github.com/microsoft/recommenders/releases).
1. When the new tag is pushed, a release pipeline is executed. This pipeline runs all the tests again (PR gate and nightly builds), generates a wheel and a tar.gz which are uploaded to a [GitHub draft release](https://github.com/microsoft/recommenders/releases).
1. Fill up the draft release with all the recent changes in the code.
1. Download the wheel and tar.gz locally, these files shouldn't have any bug, since they passed all the tests.
1. Install twine: `pip install twine`
1. Publish the wheel and tar.gz to pypi: `twine upload recommenders*`
1. Publish the wheel and tar.gz to PyPI: `twine upload recommenders*`

3 changes: 0 additions & 3 deletions pyproject.toml
@@ -14,10 +14,7 @@ build-backend = "setuptools.build_meta"
[tool.pytest.ini_options]
markers = [
"experimental: tests that will not be executed and may need extra dependencies",
"flaky: flaky tests that can fail unexpectedly",
"gpu: tests running on GPU",
"integration: integration tests",
"notebooks: tests for notebooks",
"smoke: smoke tests",
"spark: tests that requires Spark",
]
8 changes: 4 additions & 4 deletions recommenders/utils/gpu_utils.py
@@ -97,7 +97,7 @@ def get_cuda_version():
data = f.read().replace("\n", "")
return data
else:
return "Cannot find CUDA in this machine"
return None


def get_cudnn_version():
@@ -125,14 +125,14 @@ def find_cudnn_in_headers(candiates):
if version:
return version
else:
return "Cannot find CUDNN version"
return None
else:
return "Cannot find CUDNN version"
return None

try:
import torch

return torch.backends.cudnn.version()
return str(torch.backends.cudnn.version())
except (ImportError, ModuleNotFoundError):
if sys.platform == "win32":
candidates = [r"C:\NVIDIA\cuda\include\cudnn.h"]
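With this change the GPU helpers return `None` instead of an error string when CUDA or cuDNN cannot be located. A minimal sketch of the new calling pattern, assuming the `recommenders` package is installed:

```python
from recommenders.utils.gpu_utils import get_cuda_version, get_cudnn_version

# Both helpers now return a version string on success and None when nothing is found
cuda_version = get_cuda_version()
cudnn_version = get_cudnn_version()
print(cuda_version or "CUDA not found")
print(cudnn_version or "cuDNN not found")
```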
5 changes: 3 additions & 2 deletions setup.py
@@ -42,7 +42,7 @@
"transformers>=2.5.0,<5",
"category_encoders>=1.3.0,<2",
"jinja2>=2,<3.1",
"requests>=2.0.0,<3",
"requests>=2.31.0,<3",
"cornac>=1.1.2,<1.15.2;python_version<='3.7'",
"cornac>=1.15.2,<2;python_version>='3.8'", # After 1.15.2, Cornac requires python 3.8
"retrying>=1.3.3",
@@ -64,7 +64,7 @@
"tensorflow~=2.6.1;python_version=='3.6'",
"tensorflow~=2.7.0;python_version>='3.7'",
"tf-slim>=1.1.0",
"torch>=1.8", # for CUDA 11 support
"torch>=1.13.1", # for CUDA 11 support
"fastai>=1.0.46,<2",
],
"spark": [
@@ -89,6 +89,7 @@
"vowpalwabbit>=8.9.0,<9",
# nni needs to be upgraded
"nni==1.5",
"pymanopt>=0.2.5",
]

# The following dependency can be installed as below, however PyPI does not allow direct URLs.
143 changes: 28 additions & 115 deletions tests/README.md
@@ -63,7 +63,9 @@ In this section we show how to create tests and add them to the test pipeline. T
1. Create your code in the library and/or notebooks.
1. Design the unit tests for the code.
1. If you have written a notebook, design the notebook tests and check that the metrics they return is what you expect.
1. Add the tests to the AzureML pipeline in the corresponding [test group](./ci/azureml_tests/test_groups.py). **Please note that if you don't add your tests to the pipeline, they will not be executed.**
1. Add the tests to the AzureML pipeline in the corresponding [test group](./ci/azureml_tests/test_groups.py).

**Please note that if you don't add your tests to the pipeline, they will not be executed.**

### How to create tests for the Recommenders library

@@ -74,8 +76,6 @@ You want to make sure that all your code works before you submit it to the repos
* Use the mark `@pytest.mark.gpu` if you want the test to be executed
in a GPU environment. Use `@pytest.mark.spark` if you want the test
to be executed in a Spark environment.
* Use `@pytest.mark.smoke` and `@pytest.mark.integration` to mark the
tests as smoke tests and integration tests.
* Use `@pytest.mark.notebooks` if you are testing a notebook.
* Avoid using `is` in the asserts, instead use the operator `==`.
* Follow the pattern `assert computation == value`, for example:
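A minimal sketch of this pattern, with hypothetical metric values:

```python
import pytest

TOL = 0.05
precision = 0.26  # hypothetical value computed by the code under test

assert precision == pytest.approx(0.26, abs=TOL)
```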
@@ -113,7 +113,7 @@ For executing this test, first make sure you are in the correct environment as d
*Notice that the next instruction executes the tests from the root folder.*

```bash
pytest tests/unit/test_notebooks_python.py::test_sar_single_node_runs
pytest tests/unit/examples/test_notebooks_python.py::test_sar_single_node_runs
```

#### Developing nightly tests with Papermill and Scrapbook
@@ -124,7 +124,7 @@ The first step is to tag the parameters that we are going to inject. For it we n

The way papermill works to inject parameters is very simple, it generates a copy of the notebook (in our code we call it `OUTPUT_NOTEBOOK`), and creates a new cell with the injected variables.

The second modification that we need to do to the notebook is to record the metrics we want to test using `sb.glue("output_variable", python_variable_name)`. We normally use the last cell of the notebook to record all the metrics. These are the metrics that we are going to control in the smoke and integration tests.
The second modification that we need to do to the notebook is to record the metrics we want to test using `sb.glue("output_variable", python_variable_name)`. We normally use the last cell of the notebook to record all the metrics. These are the metrics that we are going to control in the smoke and functional tests.
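For instance, the last cell of a notebook might glue the metrics like this (the metric names and values below are hypothetical):

```python
import scrapbook as sb

# Hypothetical metric values computed earlier in the notebook
eval_map, eval_ndcg = 0.11, 0.38

# Record them so the test can read them back from the executed notebook
sb.glue("map", eval_map)
sb.glue("ndcg", eval_ndcg)
```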

This is an example on how we do a smoke test. The complete code can be found in [smoke/examples/test_notebooks_python.py](./smoke/examples/test_notebooks_python.py):

@@ -136,7 +136,6 @@ import scrapbook as sb
TOL = 0.05
ABS_TOL = 0.05

@pytest.mark.smoke
def test_sar_single_node_smoke(notebooks, output_notebook, kernel_name):
notebook_path = notebooks["sar_single_node"]
pm.execute_notebook(
@@ -159,14 +158,14 @@ For executing this test, first make sure you are in the correct environment as d
*Notice that the next instructions execute the tests from the root folder.*

```
pytest tests/smoke/test_notebooks_python.py::test_sar_single_node_smoke
pytest tests/smoke/examples/test_notebooks_python.py::test_sar_single_node_smoke
```
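On the test side, the glued metrics are typically read back from the executed notebook with Scrapbook and compared using `pytest.approx`. A hedged sketch of that step (the metric name and expected value are illustrative):

```python
import pytest
import scrapbook as sb

TOL = 0.05
ABS_TOL = 0.05

def check_sar_metrics(output_notebook):
    # Read back the scraps glued by the notebook and index them by name
    results = sb.read_notebook(output_notebook).scraps.dataframe.set_index("name")["data"]
    assert results["map"] == pytest.approx(0.11, rel=TOL, abs=ABS_TOL)  # illustrative value
```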

More details on how to integrate Papermill with notebooks can be found in their [repo](https://github.com/nteract/papermill). Also, you can check the [Scrapbook repo](https://github.com/nteract/scrapbook).

### How to add tests to the AzureML pipeline

To add a new test to the AzureML pipeline, add the test path to an appropriate test group listed in [test_groups.py](https://github.com/microsoft/recommenders/blob/main/tests/ci/azureml_tests/test_groups.py).
To add a new test to the AzureML pipeline, add the test path to an appropriate test group listed in [test_groups.py](./ci/azureml_tests/test_groups.py).

Tests in `group_cpu_xxx` groups are executed on a CPU-only AzureML compute cluster node. Tests in `group_gpu_xxx` groups are executed on a GPU-enabled AzureML compute cluster node with GPU related dependencies added to the AzureML run environment. Tests in `group_pyspark_xxx` groups are executed on a CPU-only AzureML compute cluster node, with the PySpark related dependencies added to the AzureML run environment.

@@ -177,15 +176,13 @@ Example of adding a new test:
1. In the environment that you are running your code, first see if there is a group whose total runtime is less than the threshold.
```python
"group_spark_001": [ # Total group time: 271.13s
"tests/smoke/recommenders/dataset/test_movielens.py::test_load_spark_df", # 4.33s
"tests/integration/recommenders/datasets/test_movielens.py::test_load_spark_df", # 25.58s + 101.99s + 139.23s
"tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df", # 4.33s+ 25.58s + 101.99s + 139.23s
],
```
2. Add the test to the group, add the time it takes to compute, and update the total group time.
```python
"group_spark_001": [ # Total group time: 571.13s
"tests/smoke/recommenders/dataset/test_movielens.py::test_load_spark_df", # 4.33s
"tests/integration/recommenders/datasets/test_movielens.py::test_load_spark_df", # 25.58s + 101.99s + 139.23s
"tests/data_validation/recommenders/datasets/test_movielens.py::test_load_spark_df", # 4.33s+ 25.58s + 101.99s + 139.23s
#
"tests/path/to/test_new.py::test_new_function", # 300s
],
@@ -217,133 +214,50 @@

To manually execute the tests in the CPU, GPU or Spark environments, first **make sure you are in the correct environment as described in the [SETUP.md](../SETUP.md)**.

*Click on the following menus* to see more details on how to execute the unit, smoke and integration tests:

<details>
<summary><strong><em>Unit tests</em></strong></summary>

Unit tests ensure that each class or function behaves as it should. Every time a developer makes a pull request to staging or main branch, a battery of unit tests is executed.

*Note that the next instructions execute the tests from the root folder.*

For executing the Python unit tests for the utilities:

pytest tests/unit -m "not notebooks and not spark and not gpu" --durations 0

For executing the Python unit tests for the notebooks:

pytest tests/unit -m "notebooks and not spark and not gpu" --durations 0

For executing the Python GPU unit tests for the utilities:

pytest tests/unit -m "not notebooks and not spark and gpu" --durations 0

For executing the Python GPU unit tests for the notebooks:

pytest tests/unit -m "notebooks and not spark and gpu" --durations 0

For executing the PySpark unit tests for the utilities:

pytest tests/unit -m "not notebooks and spark and not gpu" --durations 0

For executing the PySpark unit tests for the notebooks:

pytest tests/unit -m "notebooks and spark and not gpu" --durations 0

*NOTE: Adding `--durations 0` shows the computation time of all tests.*

*NOTE: Adding `--disable-warnings` will disable the warning messages.*

</details>

<details>
<summary><strong><em>Smoke tests</em></strong></summary>

Smoke tests make sure that the system works and are executed just before the integration tests every night.
### CPU tests

*Note that the next instructions execute the tests from the root folder.*

For executing the Python smoke tests:
For executing the CPU tests for the utilities:

pytest tests/smoke -m "smoke and not spark and not gpu" --durations 0
pytest tests -m "not notebooks and not spark and not gpu" --durations 0 --disable-warnings

For executing the Python GPU smoke tests:
For executing the CPU tests for the notebooks:

pytest tests/smoke -m "smoke and not spark and gpu" --durations 0
pytest tests -m "notebooks and not spark and not gpu" --durations 0 --disable-warnings

For executing the PySpark smoke tests:
If you want to execute a specific test, you can use the following command:

pytest tests/smoke -m "smoke and spark and not gpu" --durations 0
pytest tests/data_validation/recommenders/datasets/test_mind.py::test_mind_url --durations 0 --disable-warnings

*NOTE: Adding `--durations 0` shows the computation time of all tests.*
If you want to execute any of the tests types (data_validation, unit, smoke, functional, etc.) you can use the following command:

*NOTE: Adding `--disable-warnings` will disable the warning messages.*
pytest tests/data_validation -m "not notebooks and not spark and not gpu" --durations 0 --disable-warnings

</details>
### GPU tests

<details>
<summary><strong><em>Integration tests</em></strong></summary>
For executing the GPU tests for the utilities:

Integration tests make sure that the program results are acceptable.
pytest tests -m "not notebooks and not spark and gpu" --durations 0 --disable-warnings

*Note that the next instructions execute the tests from the root folder.*
For executing the GPU tests for the notebooks:

For executing the Python integration tests:
pytest tests -m "notebooks and not spark and gpu" --durations 0 --disable-warnings

pytest tests/integration -m "integration and not spark and not gpu" --durations 0
### Spark tests

For executing the Python GPU integration tests:
For executing the PySpark tests for the utilities:

pytest tests/integration -m "integration and not spark and gpu" --durations 0
pytest tests -m "not notebooks and spark and not gpu" --durations 0 --disable-warnings

For executing the PySpark integration tests:
For executing the PySpark tests for the notebooks:

pytest tests/integration -m "integration and spark and not gpu" --durations 0
pytest tests -m "notebooks and spark and not gpu" --durations 0 --disable-warnings

*NOTE: Adding `--durations 0` shows the computation time of all tests.*

*NOTE: Adding `--disable-warnings` will disable the warning messages.*

</details>

<details>
<summary><strong><em>Current Skipped Tests</em></strong></summary>

Several of the tests are skipped for various reasons which are noted below.

<table>
<tr>
<td>Test Module</td>
<td>Test</td>
<td>Test Environment</td>
<td>Reason</td>
</tr>
<tr>
<td>unit/recommenders/datasets/test_wikidata</td>
<td>*</td>
<td>Linux</td>
<td>Wikidata API is unstable</td>
</tr>
<tr>
<td>integration/recommenders/datasets/test_notebooks_python</td>
<td>test_wikidata</td>
<td>Linux</td>
<td>Wikidata API is unstable</td>
</tr>
<tr>
<td>*/test_notebooks_python</td>
<td>test_vw*</td>
<td>Linux</td>
<td>VW pip package has installation incompatibilities</td>
</tr>
<tr>
<td>*/test_notebooks_python</td>
<td>test_nni*</td>
<td>Linux</td>
<td>NNI pip package has installation incompatibilities</td>
</tr>
</table>

In order to skip a test because there is an OS or upstream issue which cannot be resolved you can use pytest [annotations](https://docs.pytest.org/en/latest/skipping.html).

Example:
@@ -353,4 +267,3 @@ Example:
@pytest.mark.skip(reason="<insert skip reason, e.g. unresolved upstream issue>")
def test_to_skip():
    assert False

</details>
4 changes: 2 additions & 2 deletions tests/ci/azureml_tests/run_groupwise_pytest.py
@@ -13,7 +13,7 @@
import argparse
import glob
from azureml.core import Run
from test_groups import nightly_test_groups, unit_test_groups
from test_groups import nightly_test_groups, pr_gate_test_groups

if __name__ == "__main__":

@@ -46,7 +46,7 @@
if args.testkind == "nightly":
test_group = nightly_test_groups[args.testgroup]
else:
test_group = unit_test_groups[args.testgroup]
test_group = pr_gate_test_groups[args.testgroup]

logger.info("Tests to be executed")
logger.info(str(test_group))