Updates according to discussion in issues [closes #13, #14, #17, #19, #20, #21, #23, #25, #26, #28] #29

Open · wants to merge 16 commits into base gh-pages
2 changes: 1 addition & 1 deletion CITATION
@@ -1,3 +1,3 @@
Please cite as:

Ed Bennett, Lester Hedges, Matt Williams, "Introduction to automated testing and continuous integration in Python"
Ed Bennett, Lester Hedges, Julian Lenz, Matt Williams, "Introduction to automated testing and continuous integration in Python"
69 changes: 62 additions & 7 deletions _episodes/02-pytest-functionality.md
@@ -36,7 +36,7 @@ Lets add a second test to check a different set of inputs and outputs to the
~~~
from arrays import add_arrays

def test_add_arrays1():
def test_add_arrays_positive():
    a = [1, 2, 3]
    b = [4, 5, 6]
    expect = [5, 7, 9]
@@ -45,7 +45,7 @@ def test_add_arrays1():

    assert output == expect

def test_add_arrays2():
def test_add_arrays_negative():
    a = [-1, -5, -3]
    b = [-4, -3, 0]
    expect = [-5, -8, -3]
@@ -73,8 +73,8 @@ rootdir: /home/matt/projects/courses/software_engineering_best_practices
plugins: requests-mock-1.8.0
collected 2 items

test_arrays.py::test_add_arrays1 PASSED [ 50%]
test_arrays.py::test_add_arrays2 PASSED [100%]
test_arrays.py::test_add_arrays_positive PASSED [ 50%]
test_arrays.py::test_add_arrays_negative PASSED [100%]

==================== 2 passed in 0.07s =====================
~~~
@@ -166,6 +166,57 @@ test_arrays.py::test_add_arrays[a1-b1-expect1] PASSED [100%]

We see that both tests have the same name (`test_arrays.py::test_add_arrays`)
but each parametrization is differentiated with some square brackets.
Unfortunately, in the current form this differentiation is not very helpful. If
you run this test later, you might not remember what `a0-b0-expect0` means, let
alone the precise numbers or the motivation for choosing them. Were those the
positive inputs or the negative ones? Did I choose them after fixing a
particular bug, because they represent an important use case, or were they just
random numbers?

Luckily, we are not the first to realise that this form of parametrization
lacks the expressiveness of explicit function names. That is why there is an
additional `ids` keyword argument: the following code

~~~
import pytest

from arrays import add_arrays

@pytest.mark.parametrize("a, b, expect", [
    ([1, 2, 3], [4, 5, 6], [5, 7, 9]),
    ([-1, -5, -3], [-4, -3, 0], [-5, -8, -3]),
    ids=['positve','negative']
Owner:

  • comma at eol
  • space after comma
  • use double quotes rather than single (or at least be consistent)

(Lots of other instances of each; I won't tag all of them)

Author:

Yeah, haven't run it through black admittedly. It is just a great illustration of how annoying the absence of autoformatting is.

Owner:

Just spotted the word positve here too.

Owner:

And the square bracket started on line 185 should be closed before line 188.

])
def test_add_arrays(a, b, expect):
    output = add_arrays(a, b)

    assert output == expect
~~~
{: .language-python}

now results in the significantly more expressive

~~~
=================== test session starts ====================
platform linux -- Python 3.8.5, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /home/matt/projects/courses/software_engineering_best_practices
plugins: requests-mock-1.8.0
collected 2 items

test_arrays.py::test_add_arrays[positive] PASSED [ 50%]
test_arrays.py::test_add_arrays[negative] PASSED [100%]

==================== 2 passed in 0.03s =====================
~~~
{: .output}

If the arguments are more readily representable as strings than the lists in
our example, `pytest` often does a reasonably good job of generating `ids`
automatically from the values (we will see some examples of this in the next
section). But this still lacks the intentional communication that comes with
manually chosen `ids`, so we strongly recommend using `ids` in all but the most
trivial cases.
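
For instance, when the parameters are plain strings or numbers, `pytest` builds
readable `ids` directly from the values. As a quick illustration (a throwaway
`test_length` example, not part of our `arrays` module):

~~~
import pytest

@pytest.mark.parametrize("word, expect", [
    ("hello", 5),
    ("pytest", 6),
])
def test_length(word, expect):
    assert len(word) == expect
~~~
{: .language-python}

Here the two cases are reported as `test_length[hello-5]` and
`test_length[pytest-6]` even without explicit `ids`.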

> ## More parameters
>
@@ -185,6 +236,7 @@ but each parametrization is differentiated with some square brackets.
>>     ([-1, -5, -3], [-4, -3, 0], [-5, -8, -3]), # Test zeros
>>     ([41, 0, 3], [4, 76, 32], [45, 76, 35]), # Test larger numbers
>>     ([], [], []), # Test empty lists
>> ], ids=["positive", "negative", "larger numbers", "empty lists"])
>> def test_add_arrays(a, b, expect):
>>     output = add_arrays(a, b)
@@ -195,7 +247,6 @@ but each parametrization is differentiated with some square brackets.
> {: .solution}
{: .challenge}


## Failing correctly

The interface of a function is made up of the _parameters_ it expects and the
@@ -280,6 +331,7 @@ from arrays import add_arrays
@pytest.mark.parametrize("a, b, expect", [
    ([1, 2, 3], [4, 5, 6], [5, 7, 9]),
    ([-1, -5, -3], [-4, -3, 0], [-5, -8, -3]),
], ids=["positive", "negative"])
def test_add_arrays(a, b, expect):
    output = add_arrays(a, b)
@@ -307,8 +359,8 @@ rootdir: /home/matt/projects/courses/software_engineering_best_practices
plugins: requests-mock-1.8.0
collected 3 items

test_arrays.py::test_add_arrays[a0-b0-expect0] PASSED [ 33%]
test_arrays.py::test_add_arrays[a1-b1-expect1] PASSED [ 66%]
test_arrays.py::test_add_arrays[positive] PASSED [ 33%]
test_arrays.py::test_add_arrays[negative] PASSED [ 66%]
test_arrays.py::test_add_arrays_error PASSED [100%]

==================== 3 passed in 0.03s =====================
@@ -326,6 +378,7 @@ test_arrays.py::test_add_arrays_error PASSED [100%]
>> @pytest.mark.parametrize("a, b, expected_error", [
>>     ([1, 2, 3], [4, 5], ValueError),
>>     ([1, 2], [4, 5, 6], ValueError),
>> ], ids=["second shorter", "first shorter"])
>> def test_add_arrays_error(a, b, expected_error):
>>     with pytest.raises(expected_error):
@@ -354,6 +407,7 @@ test_arrays.py::test_add_arrays_error PASSED [100%]
>>     ([6], [3], [2]), # Test single-element lists
>>     ([1, 2, 3], [4, 5, 6], [0.25, 0.4, 0.5]), # Test non-integers
>>     ([], [], []), # Test empty lists
>> ], ids=["int", "negative int", "single-element", "non-int", "empty lists"])
>> def test_divide_arrays(a, b, expect):
>>     output = divide_arrays(a, b)
@@ -365,6 +419,7 @@ test_arrays.py::test_add_arrays_error PASSED [100%]
>>     ([1, 2, 3], [4, 5], ValueError),
>>     ([1, 2], [4, 5, 6], ValueError),
>>     ([1, 2, 3], [0, 1, 2], ZeroDivisionError),
>> ], ids=["second shorter", "first shorter", "zero division"])
>> def test_divide_arrays_error(a, b, expected_error):
>>     with pytest.raises(expected_error):
77 changes: 76 additions & 1 deletion _episodes/03-fixtures.md
@@ -414,7 +414,8 @@ being done once.
> behaviour of the tests, and pytest prioritises correctness of the tests over
> their performance.
>
> What sort of behavior would functions have that failed in this way?
> What sort of behavior would functions have that failed in this way? Can you
> come up with example code for this?
>
>> ## Solution
>>
@@ -425,6 +426,80 @@ being done once.
>>
>> Fixtures should only be re-used within groups of tests that do not mutate
>> them.
>>
>> ~~~
>> @pytest.fixture(scope="session")
>> def initially_empty_list():
>>     return []
>>
>>
>> @pytest.mark.parametrize("letter", ["a", "b", "c"])
>> def test_append_letter(initially_empty_list, letter):
>>     initially_empty_list.append(letter)
>>     assert initially_empty_list == [letter]
>> ~~~
>> {: .language-python}
>>
>> A passing variant of this example with the default fixture scope is sketched
>> after this exercise.
> {: .solution}
{: .challenge}
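
By contrast, with the default `scope="function"` the fixture is rebuilt for
every test, so each test receives a freshly created list and all three
parametrized tests pass. Here is a minimal sketch of that passing variant,
reusing the names from the exercise above:

~~~
import pytest


@pytest.fixture  # default scope="function": recreated for every test
def initially_empty_list():
    return []


@pytest.mark.parametrize("letter", ["a", "b", "c"])
def test_append_letter(initially_empty_list, letter):
    initially_empty_list.append(letter)
    assert initially_empty_list == [letter]
~~~
{: .language-python}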

> ## Better ways to (unit) test
>
> The above example was explicitly constructed to acquire an expensive resource
> and to exhibit a big advantage of using a fixture, but is it actually a good
> way to test the `word_counts` function? Think about what `word_counts` is
> supposed to do. Do you need a whole book to test this?
>
> List advantages and disadvantages of the above approach. Then, come up with
> another way of testing it that cures the disadvantages (maybe also losing
> some of the advantages). Is your approach simpler and less error-prone?
>
> It is safe to assume that whenever you test such a function, it is supposed to
> be used in a larger project. Can you think of a test scenario where the
> original method is the best?
>
>> ## Solution
>>
>> The `word_counts` function is designed to count words in any string. We do
>> not need a whole book to test counting, so we could just as well have used
>> tiny test strings like `""`, `"hello world"`, `"hello, hello world"` to test
>> all functionality of `word_counts`. In fact, the original approach has a
>> number of disadvantages:
>>
>> * It is (time) expensive because it needs to download the book every time the
>>   test suite is run. (Two seconds for a single test is a very long time if
>>   you want to run a suite of hundreds of such tests every few minutes.)
>> * It is brittle in various respects:
>>   - If you don't have an internet connection, your test fails.
>>   - If the URL changes, your test fails.
>>   - If the content changes, your test fails (we had that a few times).
>> * It is rather opaque because you cannot know whether the numbers we have
>>   given you are correct. Maybe the function has a bug that we don't know
>>   about because, admittedly, we also just used the output of that function
>>   to generate our test cases.
>>
>> The one big advantage of the above is that you are using realistic test data.
>> As opposed to the string `"hello world"`, the book likely contains a lot of
>> different words, potentially different capitalisation and spellings,
>> additional punctuation and maybe special characters that your function may or
>> may not handle correctly. You might need a lot of different test strings to
>> cover all these cases (and combinations thereof).
Owner:

Things I might make explicit here:

  • Trying to write lots of tiny test strings by hand that cover every possible eventuality might still not cover everything that a real book did. (You could of course correctly argue that you could add tests for such eventualities as specific unit tests as you found them, but that doesn't mean that you shouldn't also test with some real data.)
  • If you want to test with real data, relying on an external source to get them may not be the best option, for the reasons you discuss. You can include data files to use as fixtures instead.

Author:

I thought I made the first point already but I will try to be more explicit about that.

>>
>> The alternative approach with tiny test strings cures all of the
>> disadvantages listed above, and the tests will be easy to read, understand
>> and verify, particularly if you use expressive test function names and
>> parameter `ids` (a sketch of such tests follows after this exercise). This is
>> the best way to write a unit test, i.e. a test that is concerned with this
>> single unit of functionality in isolation and will likely be run hundreds of
>> times during a coding session.
>>
>> Nevertheless, in a bigger project you would want to have other kinds of
>> tests, too. The `word_counts` functionality will probably be integrated into
>> a larger piece of functionality, e.g., a statistical analysis of books. In
>> such a case, it is equally important to test that the integration of the
>> various individually tested units works correctly. Such integration tests
>> will be run less often than unit tests and might be more meaningful because
>> they run under more realistic circumstances. For such -- and definitely for the even broader
Owner:

Markdown doesn't automatically turn -- into a dash, so either use `&ndash;` or –.

>> end-to-end tests that run a whole program from the (simulated) user input to
>> a final output -- the original approach is well-suited.
> {: .solution}
{: .challenge}
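
For concreteness, unit tests along the lines discussed in the solution above
could look like the following sketch. It assumes that `word_counts` takes a
string and returns a mapping from each word to its number of occurrences; the
module name and the expected values are illustrative and should be adjusted to
the actual behaviour of your implementation:

~~~
import pytest

from books import word_counts  # assumed module name


@pytest.mark.parametrize("text, expect", [
    ("", {}),
    ("hello world", {"hello": 1, "world": 1}),
    ("hello hello world", {"hello": 2, "world": 1}),
], ids=["empty string", "two words", "repeated word"])
def test_word_counts(text, expect):
    assert word_counts(text) == expect
~~~
{: .language-python}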

6 changes: 3 additions & 3 deletions _episodes/04-edges.md
@@ -114,7 +114,7 @@ def test_left_edge():
    assert c.neighbours() == 3

    # Check the coordinates of the neighbours.
    assert c.left() == None
    assert c.left() is None
    assert c.right() == (1, 2)
    assert c.up() == (0, 3)
    assert c.down() == (0, 1)
@@ -146,10 +146,10 @@ def test_bottom_left_corner():
    assert c.neighbours() == 2

    # Check the coordinates of the neighbours.
    assert c.left() == None
    assert c.left() is None
    assert c.right() == (1, 0)
    assert c.up() == (0, 1)
    assert c.down() == None
    assert c.down() is None
~~~
{: .language-python}

13 changes: 8 additions & 5 deletions _episodes/05-randomness.md
@@ -173,11 +173,14 @@ that it is relatively bug-free for the cases we've tested for. Of course, so far
we've only tested 6-sided dice—we have no guarantee that it works for
other numbers of sides, yet.

You can extend this approach to any programming problem where you don't know the
exact answer up front, including those that are random and those that are just
exploratory. Start by focusing on what you do know, and write tests for that. As
you understand more what the expected results are, you can expand the test
suite.
The important upshot of this approach is that, even though we could not predict
the exact return value of our function, we were still able to test invariants
and guarantees that it is known to uphold. You can extend this approach to any
programming problem where the exact return value of a function cannot be
meaningfully tested for, including those that are random or out of your control
and those that are just exploratory. Start by focusing on what you do know, and
write tests for that. As you come to understand the expected results better,
you can expand the test suite.
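
As a sketch of what such invariant tests can look like (using a stand-in `roll`
function here rather than the lesson's dice code):

~~~
import random


def roll(sides=6):
    """Roll a single die with the given number of sides."""
    return random.randint(1, sides)


def test_roll_stays_in_range():
    # Invariant: every roll lands between 1 and the number of sides.
    for _ in range(1000):
        assert 1 <= roll(6) <= 6


def test_roll_reaches_every_face():
    # Statistical guarantee: after 1000 rolls it is overwhelmingly likely
    # that every face of a fair six-sided die has appeared at least once.
    assert {roll(6) for _ in range(1000)} == {1, 2, 3, 4, 5, 6}
~~~
{: .language-python}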

> ## Two six-sided dice
>
18 changes: 18 additions & 0 deletions _episodes/06-continuous-integration.md
@@ -338,6 +338,23 @@ next to the first commit, a green tick (passed) next to the second, and nothing
> check all code against a defined house style (for example, PEP 8).
{: .callout}

> ## pre-commit
>
> Another helpful developer tool, somewhat related to CI, is
> [pre-commit][pre-commit] (or, more generally, `git` hooks). Hooks allow you to
> perform certain actions locally, triggered by various `git`-related events
> such as before or after a commit, merge, or push. A standard use case is
> running automated formatters or code linters before every commit or push, but
> other things are possible too, like updating a version number. One major
> difference with respect to CI is that each developer on your team has to
> install the hooks manually themselves and could therefore choose not to do so.
> As opposed to CI in a central repository, `git` hooks are thus not capable of
> enforcing anything but are purely a convenience for the programmer, while CI
> can be used to reject pushes or pull requests automatically. Furthermore, you
> are supposed to commit often and, hence, committing should be a fast and
> lightweight action. For this reason, the pre-commit developers explicitly
> discourage running expensive test suites as a pre-commit hook.
> {: .callout}
Owner:

{: .callout} should sit outside indented block.

Author:

Oooops!


> ## Try it yourself
>
@@ -366,3 +383,4 @@ next to the first commit, a green tick (passed) next to the second, and nothing
[pypi]: https://pypi.org
[starter-workflows]: https://github.com/actions/starter-workflows
[yaml]: https://en.wikipedia.org/wiki/YAML
[pre-commit]: https://pre-commit.com
Owner:

alphabetise links

18 changes: 18 additions & 0 deletions _episodes/07-coverage.md
@@ -98,6 +98,8 @@ the consistency checks in the `__init__()` method of `Cell`, and methods such as
methods) to have at least one test, so this test suite would benefit from being
expanded.

> ## How much coverage do I need?
>
> It's worth pointing out again that 100% coverage is not essential for a good
> test suite. If the coverage is below 100%, then that indicates that it's worth
> understanding where the uncovered lines of code are, and whether it is worth
@@ -116,6 +118,22 @@ expanded.
> between projects.
{: .callout}

> ## Configuring `coverage`
>
> `coverage` and `pytest-cov` are configurable via a configuration file, called
> `.coveragerc` by default. Various details about behaviour and output can be
> adjusted there. Most notably, explicit exceptions can be defined that exclude
> certain files, blocks or lines from the coverage report (one common way of
> marking such exclusions in the code itself is sketched after this callout).
>
> This is useful in various situations; you can, for example, exclude the test
> files from the coverage report to reduce noise, or change the command-line
> output.
>
> Another, admittedly opinionated, idea is to aim for 100% code coverage but to
> explicitly exclude whatever you consider unimportant to test. While opponents
> say that this is just cheating, you have at least made a conscious decision to
> exclude a piece of code and documented it explicitly in a file (ideally with a
> comment explaining the decision).
{: .callout}
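
For line- or block-level exclusions, `coverage` also recognises a special
comment, `# pragma: no cover`, directly in the source code. A minimal sketch
(the `load_config` helper below is made up for illustration and is not part of
the lesson's code):

~~~
def load_config(path):
    try:
        with open(path) as config_file:
            return config_file.read()
    except FileNotFoundError:  # pragma: no cover
        # Defensive fallback we have consciously decided not to test;
        # the pragma documents that decision right next to the code.
        return ""
~~~
{: .language-python}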

## Coverage and continuous integration
