Add integration-testing blog post #378

joshwlambert · 2025-03-20T14:23:00Z

The post specifies a license if you don't want to use the default CC BY
All authors have an ORCID iD
Relevant keywords / tags has been added. In particular, if you want your post to be shared on R-bloggers, you must tag it with R
Images or other external resources have been committed and pushed
The post uses pure quarto syntax, rather than HTML or R code, unless necessary
R code is included in the post but is not executed.

Right before merging:

The date field has been updated
All reviewers have been acknowledged in a short paragraph
A PR has been opened in the blueprints to link to this post
The post has been re-rendered and content of the _freeze/ folder is up-to-date

This PR adds a blog post on integration testing in Epiverse-TRACE. It provides an overview of why we write these tests, how we write them and an example of what the tests look like.

As we are only starting out writing these types of tests, and in the hope that this will help team members internally, please add any information that you think will help set a good standard for integration testing in Epiverse.

This is just an early draft of the blog post, so please give feedback on how it can be improved. @Bisaloo and @Karim-Mane please feel free to add yourselves as authors if you'd like to contribute as an author rather than a reviewer.

A couple of questions:

@Bisaloo do you think it is worth including the figure on slide 10 of your interoperability strategy in this post? I'm not sure if just stating the 3 principles is clear enough.
@Karim-Mane are you happy with the description of {cleanepi} in the post, and the example integration test code?

/schedule 2025-04-14T08:01

netlify · 2025-03-20T14:23:18Z

✅ Deploy Preview for tourmaline-marshmallow-241b40 ready!

Name	Link
🔨 Latest commit	`d7bdad0`
🔍 Latest deploy log	https://app.netlify.com/sites/tourmaline-marshmallow-241b40/deploys/67e3da67be9a350008a174c7
😎 Deploy Preview	https://deploy-preview-378--tourmaline-marshmallow-241b40.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

joshwlambert · 2025-03-20T14:31:21Z

Converting to draft until getting some early feedback on the post to make clear this is still a work in progress.

Karim-Mane · 2025-03-21T15:03:20Z

posts/integration-testing/index.qmd

+In comparison to commonly used _unit testing_, which looks to isolate and test specific parts of a software package, e.g. a function; _integration testing_ is the testing of several components of software, both within and between packages. Therefore, integration testing can be used to ensure interoperability is maintained while one or multiple components in pipelines are being developed. [Continuous integration](https://en.wikipedia.org/wiki/Continuous_integration) provides a way to run these tests before merging, releasing, or deploying code.
+:::
+
+## How we setup integration testing in Epiverse


Thanks @joshwlambert for this impressive blog. It looks good to me. Below are some of my comments or ideas:

Might be good to define an analysis pipeline you may have in mind, then define its steps (as a directed acyclic graph for example). You will then specify that this blog post only covers how to achieve integration testing between step 1 and N. That way, you could easily expand this topic to the other parts of your pipeline in this document or another one.

I've added a mention to pipelines as directed acyclic graphs in 6066440. Let me know if there is any other information on this point you'd like adding.

Karim-Mane · 2025-03-21T15:07:31Z

posts/integration-testing/index.qmd

+
+## Case study of interoperable functionality using {simulist} and {cleanepi}
+
+The aim of {simulist} is to simulate outbreak data, such as line lists or contact tracing data. By default it generates complete and accurate data, but can also augment this data to emulate empirical data via post-processing functionality. One such post-processing function is `simulist::messy_linelist()`, which introduces a range of irregularities, missingness, and type coercions to simulated line list data. Complementary to this, the {cleanepi} package has a set of cleaning functions that standardised tabular epidemiological data, recording the set of cleaning operations run by compiling a [report and appending it to the cleaned data](https://epiverse-trace.github.io/cleanepi/articles/design_principle.html#output).


It will be good to specify what integration testing approach you are using in this example. My understanding is that you are using a big-bang approach according to this. But will let you define this.

I'm not sure whether the approach shown in the blog post is big-bang integration testing. My understanding is that that is a testing approach for all components in one go. Whereas our testing blocks each test a different aspect of functionality. But to be honest, reading about this on various websites, their examples are so abstract that it's hard to map each integration testing concept/approach to what we are doing.

If you can are sure which integration testing approach we're using and where to add it to the text, please add a suggested change on this PR and I will commit it.

posts/integration-testing/index.qmd

chartgerink

Thanks 🙏 Great idea, and definitely worth a blog. 😊

Is the epiverse package also taking these contributions as tests already? I couldn't find this test there for example, even though the blog mentions it is a good place to put those.

chartgerink · 2025-03-24T15:16:18Z

posts/integration-testing/index.qmd

+author:
+  - name: "Joshua W. Lambert"
+    orcid: "0000-0001-5218-3046"
+date: "2025-03-20"


Pick any monday in the future and add it in the original PR post to ensure it gets merged then.

/schedule 2025-??-??

I've added a scheduled merge for 14th April, but the merge_schedule GitHub actions workflow has failed. Potentially from my PAT not having the required credentials. Can you take a quick look, it might work if you add the schedule keyword?

Bisaloo

This is a great blog post: important topic and well-written. Thanks!

My only potential comment is that the example here (simulist & cleanepi) is not the strongest & clearest example you could use to talk about integration testing.

In this type of tests, I would expect things like "does simulist accept inputs from epiparameter?". This would allow detecting when potential breaking changes break a previously working complex pipeline.

Here, it feels (at least to me) slightly more like you are testing cleanepi functionality rather than ensuring that it can take inputs from simulist.

On the other hand, it's the only example we have at the moment so feel free to decide for yourself if you want to go with this or create another one.

joshwlambert · 2025-03-26T10:25:15Z

Is the epiverse package also taking these contributions as tests already? I couldn't find this test there for example, even though the blog mentions it is a good place to put those.

@chartgerink I hadn't opened a PR on {epiverse} when opening this PR, but it is now open at epiverse-trace/epiverse#7, apologies for the confusion.

joshwlambert · 2025-03-26T10:51:44Z

In this type of tests, I would expect things like "does simulist accept inputs from epiparameter?". This would allow detecting when potential breaking changes break a previously working complex pipeline.

@Bisaloo I can write a couple of integration tests for {epiparameter} & {simulist} and then we can see which is a better example to include in the blog post before publishing.

Here, it feels (at least to me) slightly more like you are testing cleanepi functionality rather than ensuring that it can take inputs from simulist.

I agree, but this might be a result of the close design compatibility between the packages, a sort of the "mark of good design is that you don’t notice it" effect (or at least that's what I'm telling myself 😆)

joshwlambert · 2025-03-26T11:03:22Z

Thanks everyone for your comments, I've moved this PR to ready for review.

Next steps are:

I will draft some integration tests for {epiparameter} and {simulist} to see if they better showcase integration testing for this blog post.
Proof read the post and improve wording
Incorporate any further comments

I've set the merge scheduler to 14th April, so we have a couple of weeks to work through this before merging. It will also give us time to merge epiverse-trace/epiverse#7 which is a blocker for this post. If you could approve this PR prior to 14th April that would be great just to make sure you're happy for it to be published.

joshwlambert · 2025-03-26T14:31:41Z

If we can address epiverse-trace/epiverse#8 it would be good to mention it in this blog post to show how we're maintaining interoperability.

joshwlambert · 2025-03-26T15:02:40Z

Some initial integration tests for {epiparameter} and {simulist} are added on the epiparameter-simulist-tests branch of {epiverse}.

Again these are simple two step integration tests. They are black box tests as all they are currently testing is whether the output of simulist::sim_*() has the correct properties.

Bisaloo · 2025-03-26T15:23:09Z

Some initial integration tests for {epiparameter} and {simulist} are added on the epiparameter-simulist-tests branch of {epiverse}.

Again these are simple two step integration tests. They are black box tests as all they are currently testing is whether the output of simulist::sim_*() has the correct properties.

Seems good to me. I would also add an expect_no_condition() around the simulist calls.

chartgerink · 2025-04-01T07:55:38Z

@joshwlambert - RE the merge schedule - ~~I think the SUDO_GITHUB_TOKEN used in the workflow may have expired~~ there are some issues - I will figure it out and manually merge on the 14th if necessary 👍

~~@Bisaloo - did you add this one? If so, did it expire? My tokens are still valid and named otherwise~~

chartgerink · 2025-04-03T07:43:20Z

⌛ Merge Schedule
Scheduled to be merged on 2025-04-14 08:01:00 (UTC)

add first draft of integration-testing blog post

5629c62

joshwlambert requested review from Bisaloo and chartgerink as code owners March 20, 2025 14:23

linting integration-testing post

f25efad

joshwlambert requested a review from Karim-Mane March 20, 2025 14:26

joshwlambert marked this pull request as draft March 20, 2025 14:30

Karim-Mane reviewed Mar 21, 2025

View reviewed changes

chartgerink approved these changes Mar 24, 2025

View reviewed changes

Bisaloo reviewed Mar 25, 2025

View reviewed changes

joshwlambert added 2 commits March 26, 2025 10:29

add DAG to pipeline intro

6066440

use namespace and comments in code chunk in integration-testing post

d7bdad0

joshwlambert marked this pull request as ready for review March 26, 2025 10:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add integration-testing blog post #378

Add integration-testing blog post #378

joshwlambert commented Mar 20, 2025 •

edited by chartgerink

Loading

netlify bot commented Mar 20, 2025 •

edited

Loading

joshwlambert commented Mar 20, 2025

Karim-Mane Mar 21, 2025

joshwlambert Mar 26, 2025

Karim-Mane Mar 21, 2025

joshwlambert Mar 26, 2025

chartgerink left a comment

chartgerink Mar 24, 2025

joshwlambert Mar 26, 2025

Bisaloo left a comment

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

Bisaloo commented Mar 26, 2025

chartgerink commented Apr 1, 2025 •

edited

Loading

chartgerink commented Apr 3, 2025


		## Case study of interoperable functionality using {simulist} and {cleanepi}

		The aim of {simulist} is to simulate outbreak data, such as line lists or contact tracing data. By default it generates complete and accurate data, but can also augment this data to emulate empirical data via post-processing functionality. One such post-processing function is `simulist::messy_linelist()`, which introduces a range of irregularities, missingness, and type coercions to simulated line list data. Complementary to this, the {cleanepi} package has a set of cleaning functions that standardised tabular epidemiological data, recording the set of cleaning operations run by compiling a [report and appending it to the cleaned data](https://epiverse-trace.github.io/cleanepi/articles/design_principle.html#output).

Add integration-testing blog post #378

Are you sure you want to change the base?

Add integration-testing blog post #378

Conversation

joshwlambert commented Mar 20, 2025 • edited by chartgerink Loading

netlify bot commented Mar 20, 2025 • edited Loading

✅ Deploy Preview for tourmaline-marshmallow-241b40 ready!

joshwlambert commented Mar 20, 2025

Karim-Mane Mar 21, 2025

Choose a reason for hiding this comment

joshwlambert Mar 26, 2025

Choose a reason for hiding this comment

Karim-Mane Mar 21, 2025

Choose a reason for hiding this comment

joshwlambert Mar 26, 2025

Choose a reason for hiding this comment

chartgerink left a comment

Choose a reason for hiding this comment

chartgerink Mar 24, 2025

Choose a reason for hiding this comment

joshwlambert Mar 26, 2025

Choose a reason for hiding this comment

Bisaloo left a comment

Choose a reason for hiding this comment

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

joshwlambert commented Mar 26, 2025

Bisaloo commented Mar 26, 2025

chartgerink commented Apr 1, 2025 • edited Loading

chartgerink commented Apr 3, 2025

joshwlambert commented Mar 20, 2025 •

edited by chartgerink

Loading

netlify bot commented Mar 20, 2025 •

edited

Loading

chartgerink commented Apr 1, 2025 •

edited

Loading