The illusion of 100% code coverage #222
Replies: 3 comments
-
Thanks for initiating this issue, @jamesmbaazam 🙏 Tests are fantastic as a means of observing known functionality. These can be expectations around outputs, or checks that known (previous) issues stay resolved (regression tests). Tests help build confidence around known issues in that sense.

Whether test coverage is at 90% or 100% does not matter all that much to me personally, as it is a heuristic: it says how much code is covered, not how well the code is covered. I can, for example, test a simple function for one of many scenarios, counting it towards coverage, but forget to test other, more critical scenarios. The coverage is only as good as the tests themselves. Even with 100% coverage, the tests do not cover 100% of the scenarios we may want to test. This raises the problem of observability: are the tests observing the behaviours we want to be observing?

As mentioned before, tests are great for known issues; they do not work for unknown issues. Among unknown issues, there is also the split between known unknowns and unknown unknowns. The risk of known unknowns can be assessed, and it may be acceptable not to observe them. The real risk for any software lies in the unknown unknowns: identifying what we don't even realize we're not yet testing properly. This requires a critical look at existing tests to find gaps that need to be filled, even if there is 100% code coverage to begin with.
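To make the coverage-versus-scenarios gap concrete, here is a minimal R sketch using testthat; the `attack_rate` function and the tests are hypothetical, not from any epiverse-trace package. A single happy-path test reaches 100% line coverage while the scenarios that matter go unobserved:

```r
library(testthat)

# Hypothetical one-line function: 100% line coverage is trivially reached.
attack_rate <- function(cases, population) {
  cases / population
}

# This single happy-path test executes every line, so a tool like covr
# would report 100% coverage.
test_that("attack_rate returns a proportion", {
  expect_equal(attack_rate(10, 100), 0.1)
})

# Yet the critical scenarios remain unobserved: with no input validation,
# a zero population silently yields Inf and negative case counts produce
# a nonsense negative rate. These tests pass, documenting behaviour we
# would probably rather forbid with explicit checks.
test_that("attack_rate misbehaves on unvalidated input", {
  expect_identical(attack_rate(10, 0), Inf)
  expect_lt(attack_rate(-5, 100), 0)
})
```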
-
I like this idea – it's a useful launch point for a 'what risks are we trying to mitigate?' discussion. A few related observations that may (or may not) be useful:
-
Thanks for your input @chartgerink and @adamkucharski. I just shared a resource here https://github.com/orgs/epiverse-trace/discussions/282 that covers what I was thinking about (different types of tests to ensure your code coverage has adequate quality). I'm not sure of the added value of this suggested blog post, so I would vote to close it.
-
I've been thinking a lot about how we often over-rely on 100% code coverage as a measure of code quality. It's important to consider code coverage in the context of Goodhart's law, which says, "When a measure becomes a target, it ceases to be a good measure".
This post will discuss how 100% coverage can be an illusion that leaves issues and bugs undetected.
It will cover categories of expectations to test for, drawing on the infrastructure and workflows for writing unit tests in R. One example of such a category is statistical correctness, which has been discussed in the statistical correctness post. This new article will be a follow-up to existing work in the wild.
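As a sketch of what a statistical-correctness expectation could look like in testthat, the example below checks distributional properties of simulated output rather than merely that the code runs. Here `rnorm()` stands in for a package's own simulator, and the seed and tolerances are illustrative choices, not a prescribed recipe:

```r
library(testthat)

test_that("simulated draws recover the theoretical mean and sd", {
  set.seed(42)  # fix the RNG so the test is reproducible
  draws <- rnorm(1e5, mean = 2, sd = 0.5)

  # Tolerances are deliberately loose relative to the Monte Carlo error
  # (about 0.5 / sqrt(1e5), roughly 0.0016, for the mean).
  expect_equal(mean(draws), 2, tolerance = 0.01)
  expect_equal(sd(draws), 0.5, tolerance = 0.01)
})
```

Every line of a simulator can be executed without any such check, which is exactly how 100% coverage can hide statistical bugs.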
I warmly welcome all contributions and co-authorship in the form of ideas, code examples, anecdotes, and experiences.
Related resources