Description
Motivation
Our testing strategy is quite exhaustive, but the test set is large and takes time to evaluate and run (especially from scratch).
Fortunately, there is a lot of redundancy, so we can surely lighten it meaningfully.
General idea
- Establish, design and implement a criterion for marking a test as platform-agnostic. Such tests would not need to run on all four platforms.
- Refactor our test set to maximize the number of such tests:
  - Remove the platform-specificity from most tests,
  - Create extra (but far fewer) tests to account for what has been removed.
Background
Which tests are targeted by this RFC
Let us decompose our huge test set into two categories:

- $T_\text{custom}$ ("top-level" tests): everything that is not automatically generated from `test-sources/`.
- $T_\text{config}$ ("config" tests): everything under `test-sources/` that is automatically processed as a configuration test (i.e., we run `nvim` with this config).

Let's leave $T_\text{custom}$ aside and instead focus on $T_\text{config}$ (denoted $T_c$ below).
Platform-agnostic vs. Platform-specific tests
A nixvim configuration is said to be platform-agnostic (PA) if it has no platform-specific behavior. I.e.:

Definition: A configuration where `extraPackages`, `extraPlugins`, `extraPythonPackages`, ... are empty.

(Note: this definition is probably incomplete.)
Conversely, tests that are not PA will be referred to as platform-specific (PS).
Of course, such tests still rely on neovim, which is itself platform-specific, but we can assume that neovim works on all platforms (it is extensively tested elsewhere).
Also, nothing prevents tests from having "platform-specific" Lua code in the `extraConfigLua` option, which is also technically PS.
I will assume that such code probably does not exist, or at least carries a very low risk.
These purely platform-agnostic tests can be limited to running exclusively on our primary platform (most likely `x86_64-linux`).
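To make the PA/PS distinction concrete, here are two minimal hypothetical config tests. The option names (`extraConfigLua`, `extraPackages`, `extraPlugins`) are the ones already mentioned in this RFC; the specific packages are arbitrary examples.

```nix
# PA: nothing platform-dependent beyond neovim itself is pulled in.
{
  extraConfigLua = ''
    vim.opt.number = true
  '';
}
```

```nix
# PS: extraPackages/extraPlugins make the test platform-dependent,
# since each listed package may break on a specific platform.
{ pkgs, ... }:
{
  extraPackages = [ pkgs.ripgrep ];
  extraPlugins = [ pkgs.vimPlugins.plenary-nvim ];
}
```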
Factorization of the test set
Currently, most of our tests are technically PS.
However, the main idea of this proposal relies on two properties:
- Each test $t \in T_c$ can be split into two sub-tests: $t_{PS}$ (a platform-specific component) and $t_{PA}$ (a platform-agnostic component).
- We can easily build a few macro tests that factorize all the $t_{PS}$ components of the tests in $T_c$.
I will illustrate this with the example of plugin tests.
By default, plugin tests are PS.
However, a plugin test is testing 3 things:

1. The plugin can be installed in the wrapper on all platforms.
   -> PS (a package/plugin can be broken on a specific platform)
2. The configuration can be evaluated (i.e., options exist and have legitimate, correctly typed values).
   -> PA, except for evaluating the required packages/plugins, which is PS (see 1.)
3. The Lua configuration is valid: Neovim starts without error with this config.
   -> Although theoretically PS, this can be assumed to be PA.
Having a single `allPluginPackages` test that installs all plugin packages that we support (similarly to our already existing `modules-dependencies-all` test) could single-handedly account for testing all plugins for property (1).
Assuming that such a test exists, all plugin tests can then be assumed to be PA.
Hence, they can all be marked as such and run on a single platform.
Implementation
- Introduce a `tests.platformSpecific` (boolean) flag in each test config that encodes the PS/PA property.
  Maybe this should be opt-in (`true` by default), to prevent new tests from accidentally being treated as PA.
- Add an `allPluginPackages` test.
  We could collect all `options.plugins.*.package.default` items and add them to the `extraPlugins` of this test.
  This is the general idea. In practice, a more cautious search might be necessary to effectively collect all plugin packages across the Nixvim [sub-]modules.
- Automatically mark the following tests as PA:
  - Tests that already are PA (no `extraPackages` & co),
  - Tests whose PS effects are already tested elsewhere.
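The first two implementation steps could be sketched in Nix roughly as follows. This is only a hedged sketch: everything except the identifiers quoted above (`tests.platformSpecific`, `options.plugins.*.package.default`, `extraPlugins`) is an assumption about how the test framework is wired.

```nix
# Sketch 1 (hypothetical): the per-test flag, declared as a regular module option.
{ lib, ... }:
{
  options.tests.platformSpecific = lib.mkOption {
    type = lib.types.bool;
    # Opt-in PA: a test is assumed platform-specific unless it opts out.
    default = true;
    description = "Whether this test must run on all supported platforms.";
  };
}
```

```nix
# Sketch 2 (hypothetical): naive collection of every plugin's default
# package for an `allPluginPackages` test config.
{ options, lib, ... }:
{
  extraPlugins = lib.pipe options.plugins [
    # Keep only plugins that declare a `package` option...
    (lib.filterAttrs (_: plugin: plugin ? package))
    # ...and take each one's default package.
    (lib.mapAttrsToList (_: plugin: plugin.package.default))
  ];
}
```

As noted above, a real implementation would need a more careful traversal, since plugin packages can also live in nested [sub-]modules.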
Conclusion
This proposal could help drastically reduce the CI weight on `nix-community`'s infrastructure.
Most importantly, the darwin tests are the most problematic ones:
they take far longer to run than the Linux ones, and the Mac mini that runs our darwin CI often gets overwhelmed.
In terms of drawbacks and limitations, I can think of two:
- The PS and PA qualification of the different tests rests on some assumptions.
  While my intuition is that applying the aforementioned factorization would not effectively weaken our current test coverage, it is important to think carefully about potential flaws caused by these assumptions.
- Implementing this logic is not trivial and will inevitably add complexity to the test-creation code.