Skip to content

Conversation

@CarlosNihelton
Copy link
Collaborator

@CarlosNihelton CarlosNihelton commented Sep 4, 2025

Looks like we can go quite far with GitHub hosted runners these days (since they made large runners generally available).

I devised an integration test workflow for wsl-setup that:

  • Downloads and caches the daily-live/current image from cdimages.ubuntu.com
  • In parallel, it builds the wsl-setup debian package from git
  • When above are done:
    • sets up a WSL instance,
    • installs the deb built by CI,
    • clean up cloud-init state,
    • writes some cloud-init user data,
    • does some actions to prevent the test from being stuck in case cloud-init fails,
    • runs the initial setup via WSL itself (with just wsl -d DISTRO) and
    • runs some assertions based on the results of having wsl-setup installed.

To support development time, part of the assertions are placed in a separate shell script that we can run inside a VM, LXD container or WSL instance at development time (though some can only pass on WSL), assertions which are really coupled to the contents of the package, thus we should pay attention to them when changing the package. Unfortunately I'm not bold enough to put those assertions in an override_build step of debian/rules because I that could result in unnecessary breakages.

My image caching system saves more bandwidth than time. I was pleased to see that CI was taking less than 30s to download a WSL image (building the deb takes much longer, there's probably some caching opportunity in the callee side?). The cache key is the image checksum read from the SHA256SUMS file hosted aside from the image. I had to learn a few tricks to make the cache working cross OSes.

The WSL instance is created from the cached image with wsl --install --from-file ... --no-launch ... so we can further customise it with other wsl -d DISTRO -- <COMMANDS> without those counting as if the OOBE script was run, that is, WSL keeps track of that instance as still non-initialized.

This whole scheme allows us to:

  • claim that the package builds both source and binary outputs
  • claim that the installed package behaves as expected on a target
  • smoke test the latest images, preventing late breakages if breaking changes appear in them without us noticing.

UDENG-7866

@CarlosNihelton CarlosNihelton force-pushed the integration-test branch 21 times, most recently from cf9305d to f8e2cae Compare September 5, 2025 16:01
@CarlosNihelton CarlosNihelton changed the title Adds an integration test workflow feat: Adds an integration test workflow Sep 5, 2025
@CarlosNihelton CarlosNihelton force-pushed the integration-test branch 5 times, most recently from f78c4cc to 03ea3ed Compare September 5, 2025 19:43
@CarlosNihelton CarlosNihelton changed the base branch from main to fix-questing-chrony September 5, 2025 19:52
@CarlosNihelton CarlosNihelton marked this pull request as ready for review September 5, 2025 20:04
Copy link
Collaborator

@hk21702 hk21702 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nitpicks and minor suggestions. Otherwise seems good.

Entirely optional, but it may also be nice to set the default shell per job where possible, it makes expansion of jobs easier since you don't need to think about adding the shell: line.

https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/set-default-values-for-jobs

Allow image caching to succeed despite the test results
Thus we can fail on tests due broken code while a good image
previously downloaded is retained for faster iterations.
Thus we can run it inside a WSL instance at devel time, not only in CI.

Plus a small fix for questing and later: chrony instead of systemd-timesyncd

To make the test case more generic I test first if the unit is enabled
and then if active.

Since the test runs against a fresh installed instance, if the unit is
not active is because of the image state, not users disabling or masking
units etc.
@CarlosNihelton CarlosNihelton changed the base branch from fix-questing-chrony to main September 8, 2025 19:01
- She-bang
- exit code monotonic increase
- disregards chrony.service for now.
Sprinkling whitespaces after long steps or jobs.
@CarlosNihelton
Copy link
Collaborator Author

I rebased over the current tip of the main branch and moved this PR target back to main so we don't couple on the fate of chrony on WSL. New commits start at 1b136f8.

Copy link
Collaborator

@hk21702 hk21702 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving, just one nitpick that probably doesn't matter since the step isn't needed for ephemeral runners anyway.

exec: |
source test/integration-test.sh

- name: "Clean up" # Probably not necessary since we're leveraging ephemeral GH's runners.
Copy link
Collaborator

@hk21702 hk21702 Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If actually intended for cleanup, the cleanup step should have if: always() so that it runs even if previous steps fail. But like the comment mentions, the step isn't necessary anyway since we're using ephemeral runners. It is probably fine as is if intended as an independent test step to make sure that we can unregister.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you despite the ephemeral nature of those runners. ;)

@CarlosNihelton CarlosNihelton merged commit 12ab98c into main Sep 9, 2025
3 checks passed
@CarlosNihelton CarlosNihelton deleted the integration-test branch September 9, 2025 01:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants