An important step towards achieving high code quality and maintainability in your Kedro project is the use of automated tests. Let's look at how you can set this up.
Software testing is the process of checking that the code you have written fulfills its requirements. Software testing can either be manual or automated. In the context of Kedro:
- Manual testing is when you run part or all of your project and check that the results are what you expect.
- Automated testing is writing new code (using libraries called testing frameworks) that runs part or all of your project and automatically checks the results against what you expect.
As a project grows larger, new code will increasingly rely on existing code. As these interdependencies grow, making changes in one part of the code base can unexpectedly break the intended functionality in another part.
The major disadvantage of manual testing is that it is time-consuming. Manual tests are usually run once, directly after new functionality has been added. It is impractical to repeat manual tests for the entire code base each time a change is made, which means this strategy often misses breaking changes.
The solution to this problem is automated testing. Automated testing allows many tests across the whole code base to be run in seconds, every time a new feature is added or an old one is changed. In this way, breaking changes can be discovered during development rather than in production.
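To make the distinction concrete, here is a minimal sketch of what an automated test looks like (using the conventions of the `pytest` framework introduced below). The function `add_prefix` is a hypothetical stand-in for your own project code; the test calls it and checks the result automatically, which is exactly what you would otherwise verify by hand.

```python
# Hypothetical project code under test
def add_prefix(name: str) -> str:
    return f"processed_{name}"


# An automated test: a function whose name starts with test_
def test_add_prefix_adds_expected_prefix():
    assert add_prefix("sales") == "processed_sales"
```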
There are many testing frameworks available for Python. One of the most popular is `pytest` (see the project's home page for a quick overview). `pytest` is often used in Python projects for its short, readable tests and powerful set of features.

Let's look at how you can start working with `pytest` in your Kedro project.
Before getting started with `pytest`, it is important to ensure you have installed your project locally. This allows you to test different parts of your project by importing them into your test files.
To install your project, navigate to your project root and run the following command:
```bash
pip install -e .
```
NOTE: The option `-e` installs an editable version of your project, allowing you to make changes to the project files without needing to re-install them each time.
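To confirm the install worked, you can try importing your package; this quick sanity check assumes you replace `<package_name>` with your package's actual name:

```bash
python -c "import <package_name>"
```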
Install `pytest` as you would install other packages with `pip`, making sure your project's virtual environment is active.

```bash
pip install pytest
```
Now that `pytest` is installed, you will need a place to put your tests. Create a `/tests` folder in the root directory of your project.

```bash
mkdir <project_root>/tests
```
The subdirectories in your project's `/tests` directory should mirror the directory structure of your project's `/src/<package_name>` directory. All files in the `/tests` folder should be named `test_<file_being_tested>.py`. See an example `/tests` folder below.
```text
src
│   ...
└───<package_name>
│   └───pipelines
│       └───data_processing
│           │   ...
│           │   nodes.py
│           │   ...
│
tests
└───pipelines
│   └───data_processing
│       │   ...
│       │   test_nodes.py
│       │   ...
```
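To mirror the structure shown above, you can create the nested folders and an empty test file in one go. The `data_processing` pipeline name here is just the example from the tree; substitute your own pipeline names.

```bash
mkdir -p tests/pipelines/data_processing
touch tests/pipelines/data_processing/test_nodes.py
```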
Now that you have a place to put your tests, you can create an example test in the new file `/tests/test_run.py`. The example test simply checks that the `project_path` attribute of a specially-defined `KedroContext` object has been correctly set.
```python
from pathlib import Path

import pytest

from kedro.config import OmegaConfigLoader
from kedro.framework.context import KedroContext
from kedro.framework.hooks import _create_hook_manager


@pytest.fixture
def config_loader():
    # Load configuration from the project root
    return OmegaConfigLoader(conf_source=str(Path.cwd()))


@pytest.fixture
def project_context(config_loader):
    # Build a KedroContext for the project in the current working directory
    return KedroContext(
        package_name="<package_name>",
        project_path=Path.cwd(),
        config_loader=config_loader,
        hook_manager=_create_hook_manager(),
    )


class TestProjectContext:
    def test_project_path(self, project_context):
        assert project_context.project_path == Path.cwd()
```
This test is redundant, but it introduces a few of `pytest`'s core features and demonstrates the layout of a test file:

- Fixtures are used to define resources used in tests.
- Tests are implemented in methods or functions beginning with `test_` and classes beginning with `Test`.
- The `assert` statement is used to compare the result of the test with an expected value.
Tests should be named as descriptively as possible, especially if you are working with other people. For example, it is easier to understand the purpose of a test with the name `test_node_passes_with_valid_input` than a test with the name `test_passes`.
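As an illustration of what a descriptively named node test might look like, here is a minimal sketch. The node function `convert_flags_to_bool` and its behaviour (mapping `"t"`/`"f"` strings to booleans) are assumptions made for this example; import and test the real functions from your own `nodes.py` instead.

```python
import pandas as pd

# Hypothetical import: assumes nodes.py defines a convert_flags_to_bool node
from <package_name>.pipelines.data_processing.nodes import convert_flags_to_bool


def test_node_passes_with_valid_input():
    # Assumed behaviour: the node maps "t"/"f" strings to booleans
    input_df = pd.DataFrame({"iata_approved": ["t", "f"]})

    result = convert_flags_to_bool(input_df)

    assert result["iata_approved"].tolist() == [True, False]
```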
You can read more about the basics of using `pytest` on the getting started page. For help writing your own tests and using all of the features of `pytest`, see the project documentation.
To run your tests, run `pytest` from within your project's root directory.

```bash
cd <project_root>
pytest
```
If you created the example test in the previous section, you should see the following output in your shell.
```text
============================= test session starts ==============================
...
collected 1 item

tests/test_run.py .                                                      [100%]

============================== 1 passed in 0.38s ===============================
```
This output indicates that one test ran successfully in the file `tests/test_run.py`.
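As your test suite grows, you may not always want to run everything at once. `pytest` can select a subset of tests by path or name expression, or stop early on failure; for example:

```bash
# Run only the tests in one file
pytest tests/test_run.py

# Run only tests whose names match an expression
pytest -k "project_path"

# Stop at the first failing test
pytest -x
```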
It can be useful to see how much of your project is covered by tests. For this, you can install and configure the `pytest-cov` plugin for `pytest`, which is based on the popular `coverage.py` library.
Install `pytest-cov` as you would install other packages with `pip`, making sure your project's virtual environment is active.

```bash
pip install pytest-cov
```
To configure `pytest` to generate a coverage report using `pytest-cov`, you can add the following lines to your `<project_root>/pyproject.toml` file (creating it if it does not exist).
```toml
[tool.pytest.ini_options]
addopts = """
--cov-report term-missing \
--cov src/<package_name> -ra"""
```
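If you would rather not change `pyproject.toml`, the equivalent report can be produced ad hoc by passing the same options on the command line:

```bash
pytest --cov src/<package_name> --cov-report term-missing
```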
Running `pytest` in the spaceflights starter with `pytest-cov` installed results in the following additional report.
```text
Name                                                      Stmts   Miss  Cover   Missing
---------------------------------------------------------------------------------------
src/spaceflights/__init__.py                                  1      1     0%   4
src/spaceflights/__main__.py                                 30     30     0%   4-47
src/spaceflights/pipeline_registry.py                         7      7     0%   2-16
src/spaceflights/pipelines/__init__.py                        0      0   100%
src/spaceflights/pipelines/data_processing/__init__.py        1      1     0%   3
src/spaceflights/pipelines/data_processing/nodes.py          25     25     0%   1-67
src/spaceflights/pipelines/data_processing/pipeline.py        5      5     0%   1-8
src/spaceflights/pipelines/data_science/__init__.py           1      1     0%   3
src/spaceflights/pipelines/data_science/nodes.py             20     20     0%   1-55
src/spaceflights/pipelines/data_science/pipeline.py           8      8     0%   1-40
src/spaceflights/settings.py                                  0      0   100%
---------------------------------------------------------------------------------------
TOTAL                                                        98     98     0%
```
This is the simplest report that `coverage.py` (via `pytest-cov`) will produce. It gives an overview of how many of the executable statements in each project file are covered by tests. For detail on the full set of features offered, see the `coverage.py` docs.
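The terminal report is handy in CI, but for local work a browsable, line-by-line view can be easier to navigate. `pytest-cov` can also emit an HTML report, which by default is written to an `htmlcov/` directory:

```bash
pytest --cov src/<package_name> --cov-report html
# Then open htmlcov/index.html in your browser
```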