
Conversation

@AlexJones0 (Contributor) commented Feb 9, 2026

This PR introduces a large number of tests specifically targeting the scheduler. The primary goals are to:

  1. Provide a better understanding of the existing features of the scheduler.
  2. Show a clearer picture of what is and isn't working within the scheduler, so we know what may need to be fixed.
  3. Enable refactoring and/or rewriting of the scheduler with reduced chances of introducing unintentional changes or regressions in functionality.

To those ends, this PR:

  1. Creates mocks of the scheduled jobs / `Launcher` class so we can test the behaviour of the scheduler by querying the mocked data.
  2. Introduces additional test dev dependencies and project/CI config to allow better testing of the scheduler.
  3. Adds a wide variety of tests covering: smoke tests, job launching / polling, job weighting / priority, parallelism, dependency resolution, error cases, edge cases, etc., as well as tests for the scheduler's signal handlers.

See the commit messages and comments for more information. The intention is to address issues in follow-up PRs (either with bug fixes or complete refactors/rewrites), rather than overloading this PR with even more content.

@AlexJones0 force-pushed the scheduler_tests branch 10 times, most recently from 830307b to 44f90fc on February 10, 2026 at 13:17
@machshev (Collaborator) left a comment


Great work @AlexJones0, really looking forward to seeing this getting in.

Just a few aesthetics-related suggestions to make the tests a bit more readable, if possible? Also there seems to be a bit of duplication in the test factory helpers.

It looks like you're still making changes to the PR, so I'll leave it there for now.

Comment on lines +26 to +40
```python
FAIL_DEP_ON_MULTIPLE_TARGETS = """
DVSim cannot handle dependency fan-in (i.e. one job depending on multiple jobs)
across multiple targets.

Specifically, when all successors of the first target are initially enqueued, they
are removed from the `scheduled` queues. If any item in another target also depends
on those items (i.e. fan-in across *another* target), then the completion of those
items will in turn attempt to enqueue their successors, which cannot be found as
they are no longer present in the `scheduled` queues.
"""
```
@machshev (Collaborator)
Wow, that's a fairly big oversight! That might explain why build failures propagate so quickly?

@AlexJones0 (Contributor, Author)

Note specifically that "target" here refers to a phase/layer, e.g. "build", "run" or "code_cov", not a job. So, for example, a "code_cov" job can depend on many different jobs that are all in the second "run" phase, but if it depends on jobs in both the "build" and "run" phases (or just "build", actually; see the non-consecutive target error), it will never be scheduled and we get caught in an infinite loop.

The current DVSim basically assumes the following:

  • JobSpecs are provided such that their targets are given in execution order. So you cannot have a "run" job given before a "build" job if the run job depends on the build job.
  • It appears to assume that targets are sequential/consecutive and monotonically increasing. You cannot have bi-directional dependencies between targets, nor dependencies that skip targets. You also cannot have a job that depends on jobs from different targets, or that is depended on by jobs of different targets.
  • It assumes that all jobs are reachable through some dependency chain from one of the jobs in the starting phase. It does not handle "orphaned" jobs without parents well.

In my mind the job specs should basically form a DAG whose edges are dependencies, and DVSim's scheduler essentially produces and runs some topological sort. The "targets" should just be a mapping for status tracking, and there probably don't need to be any restrictions among the targets themselves.
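For illustration, the model described here boils down to a plain topological sort over the dependency edges; a minimal sketch (the `schedule_order` helper and its input shape are hypothetical, not DVSim's code):

```python
from collections import deque

def schedule_order(deps: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm: order jobs by their dependency edges alone.

    `deps` maps each job name to the names of the jobs it depends on;
    "targets" impose no ordering of their own here.
    """
    pending = {job: len(d) for job, d in deps.items()}
    successors: dict[str, list[str]] = {}
    for job, d in deps.items():
        for dep in d:
            successors.setdefault(dep, []).append(job)

    ready = deque(job for job, n in pending.items() if n == 0)
    order = []
    while ready:
        # At any point, every job in `ready` is runnable right now.
        job = ready.popleft()
        order.append(job)
        for succ in successors.get(job, []):
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)

    if len(order) != len(deps):
        raise ValueError("cycle or dangling dependency in job specs")
    return order
```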

@machshev (Collaborator)

Yes, scheduling the jobs by stages always seemed like an arbitrary restriction to me. At any point in time there is a set of jobs that don't have dependencies, and they can be scheduled.

Not sure if there is any reason why sim jobs can't start until all the build jobs are done?
In fact it may be beneficial not to wait in the future. For example, if we can split the DAG into independent sub-DAGs (say, at the IP block level), then it could open up more efficient deployment to remote runners, making sure the build artefacts are co-located with the simulation jobs rather than having to transfer the files between machines.

Right now launchers submit individual jobs, and the overhead of transferring the job can exceed any time benefit of parallelising the jobs over multiple machines. If we could submit DAG subsets then that could open up some scalability options.

@AlexJones0 force-pushed the scheduler_tests branch 4 times, most recently from 5290d66 to 19f3677 on February 10, 2026 at 15:24
@AlexJones0 requested a review from machshev on February 10, 2026 at 15:28
Most of the time we want to use the scheduler in non-interactive mode.
To reduce the verbosity of code using the scheduler, make the kwarg
default to `False`.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Add mock implementations of the DVSim launcher that can be used to test
the scheduler implementation. This provides functionality for mocking
each job (by name, despite the scheduler only being provided the class),
with the ability to change the reported status, vary the reported status
over a number of polls, emulate launcher errors and launcher busy
errors, and emulate a job that takes some time to be killed. We can also
track the number of `launch()`, `poll()` and `kill()` calls on a per-job
basis.

The mock launcher itself maintains a central context which can be used
to control the job configuration, and also to track the maximum number
of concurrent jobs and the order in which jobs started running and were
completed. These features are useful for writing a variety of unit
tests against the scheduler's public APIs while remaining opaque to the
scheduler itself.

Also define some fixtures that will be commonly used across the
different scheduler tests.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
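A rough sketch of the shape such a mock might take; the names (`MockContext`, `MockLauncher`) and single-letter statuses here are illustrative rather than the PR's actual code:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class MockContext:
    """Central context shared by all mock launchers (hypothetical shape)."""
    status_seqs: dict = field(default_factory=dict)  # job name -> statuses, one per poll
    calls: dict = field(default_factory=lambda: defaultdict(
        lambda: {"launch": 0, "poll": 0, "kill": 0}))
    running: set = field(default_factory=set)
    max_parallel: int = 0                            # high-water mark of concurrent jobs
    completed: list = field(default_factory=list)    # completion order

class MockLauncher:
    """Reports a pre-configured status sequence for its job on each poll."""

    def __init__(self, ctx: MockContext, job_name: str) -> None:
        self.ctx = ctx
        self.job_name = job_name
        self.polls = 0

    def launch(self) -> None:
        self.ctx.calls[self.job_name]["launch"] += 1
        self.ctx.running.add(self.job_name)
        self.ctx.max_parallel = max(self.ctx.max_parallel, len(self.ctx.running))

    def poll(self) -> str:
        self.ctx.calls[self.job_name]["poll"] += 1
        seq = self.ctx.status_seqs.get(self.job_name, ["P"])  # default: pass on first poll
        status = seq[min(self.polls, len(seq) - 1)]
        self.polls += 1
        # Treat anything other than "still dispatched" as terminal.
        if status != "D" and self.job_name in self.ctx.running:
            self.ctx.running.discard(self.job_name)
            self.ctx.completed.append(self.job_name)
        return status

    def kill(self) -> None:
        self.ctx.calls[self.job_name]["kill"] += 1
        self.ctx.running.discard(self.job_name)
```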
For changes to the scheduler, which is a core part of DVSim, we write
tests beforehand to be sure that we are not breaking core functionality.
Some of these test cases find issues where DVSim cannot handle its given
inputs, but we do not want to make changes to fix DVSim at this stage.

In such cases, where tests may get caught in infinite loops, it makes
sense to have the ability to specify a test timeout. This can already be
done as a pytest option, but we want to enable a default timeout so that
anyone running the test suite without knowledge of these issues does not
get stuck and can still see the "expected fail" (xfail) result, rather
than the test being skipped.

The `pytest-timeout` dependency lets us mark individual tests with
timeout values so that we can do this.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
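For example, a known-hanging case can be bounded with the marker pytest-timeout provides (`run_scheduler` and `make_cyclic_specs` are hypothetical helpers for illustration):

```python
import pytest

# Bound a test that is known to hang on current DVSim, so the suite still
# finishes and reports the failure instead of blocking forever.
@pytest.mark.timeout(5)
def test_scheduler_with_cyclic_dependencies():
    run_scheduler(make_cyclic_specs())
```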
Add 3 initial tests for the scheduler which act as basic smoke tests to
ensure that the scheduler at least appears to work on a basic level.

The tests check that the scheduler can handle being given no jobs, 1
job, and 5 jobs, where the jobs are just some basic mock jobs.

To help define these tests (and the creation of future tests), a variety
of factories and helper functions / utilities are introduced for common
test patterns, including the ability to define a single job spec or
multiple job specifications that vary in some pre-determined way, and to
create paths for tests where the scheduler logic makes use of file I/O
operations and therefore expects output paths to exist.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
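A minimal sketch of what such a smoke test might look like, assuming hypothetical `make_job_specs` and `run_scheduler` helpers standing in for the factories this commit describes:

```python
import pytest

# One parametrized test covers the no-job, single-job and multi-job cases.
@pytest.mark.parametrize("num_jobs", [0, 1, 5])
def test_scheduler_smoke(num_jobs, tmp_path):
    # tmp_path provides the output directories the scheduler's file I/O expects.
    specs = make_job_specs(count=num_jobs, out_dir=tmp_path)
    results = run_scheduler(specs)
    assert len(results) == num_jobs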
Add the `--strict` flag to the Python tests in CI.

This flag makes pytest run in strict mode, which does a few
useful things for us. Importantly:
* For tests that are expected to fail (due to known failures to be
  addressed in the future), it will error if the test unexpectedly
  passes and is not correspondingly marked `strict=False`. This lets us
  still support flaky tests but ensures that xfail markers are kept
  up-to-date with the code itself.
* For any markers which pytest doesn't recognize / hasn't been informed
  about, it will explicitly error.
* For any unknown command-line options, pytest will error.

This lets us catch typos / stale configuration, and ensure that the code
remains more in sync with the statuses of the tests.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
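For example, under strict enforcement an xfail test that unexpectedly passes is reported as an error, so a genuinely flaky test must opt out explicitly (the test name here is illustrative):

```python
import pytest

# Allowed to pass occasionally: strict=False documents the flakiness rather
# than letting a surprise pass slip through silently.
@pytest.mark.xfail(reason="signal-handler deadlock in a small fraction of runs",
                   strict=False)
def test_flaky_signal_case():
    ...
```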
This commit adds another ~7 unique scheduler tests for functionality
relating to the parallelisation of the scheduler / launchers (parallel
dispatch, and that max parallelism is respected), the polling behaviour
of the launcher (the scheduler will poll a job until it is done, and
not beyond that), and the error handling of the scheduler (for launcher
errors and duplicate jobs, and checking that errors propagate up the
dependency tree to kill dependent jobs).

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Adds around 14 unique unit tests for the scheduler, specifically
checking behaviours relating to the structure of the input job
specification, including the dependencies between jobs and the different
targets/phases/labels of the jobs. As part of this, the
`needs_all_dependencies_passing` parameter on job specs is also tested,
in addition to edge cases such as cyclic dependencies and failure
propagation.

Since a variety of these cases are currently failing in DVSim, these
tests are marked as being "expected to fail" i.e. xfail, with
appropriate reasoning provided for each expected failure.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
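An illustrative shape for one such expected-failure test, reusing the `FAIL_DEP_ON_MULTIPLE_TARGETS` reason quoted earlier (`make_multi_target_fan_in_specs` and `run_scheduler` are hypothetical helpers):

```python
import pytest

# The bug manifests as an infinite loop, so the timeout is what actually
# fails the test; strict=True flags it if the bug is ever fixed.
@pytest.mark.timeout(5)
@pytest.mark.xfail(reason=FAIL_DEP_ON_MULTIPLE_TARGETS, strict=True)
def test_dep_on_multiple_targets():
    run_scheduler(make_multi_target_fan_in_specs())
```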
This commit adds 3 more unique tests related to the scheduler's
prioritisation of jobs/targets with different weights, including the
edge case where the jobs' weights sum to zero.

As mentioned in the TODO comment, this aspect is not covered in as much
detail, as the limitations of DVSim's current operation (the structure
of the job specification DAG lists that it will actually accept) make it
very difficult to reason about and construct tests for this behaviour.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
These packages are added as optional `test` dependencies to aid in the
development of tests. `pytest-repeat` is a plugin for pytest that makes
it easy to repeat tests a number of times. This can be useful for
checking for test flakiness and non-idempotency. `pytest-xdist` is a
plugin for pytest to allow you to run tests in parallel, distributing
tests across multiple CPUs to speed up execution.

The combination of these two plugins lets us run many iterations of
tests in parallel to quickly catch potential issues with flaky behaviour
in tests. Consider running e.g.
  pytest -n auto --count=10 --timeout=5

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Add tests for the Scheduler's signal handler, which allows you to quit
DVSim while the scheduler is running. We test a few different things
using different parameters:
* Test that sending either SIGTERM or SIGINT causes DVSim to gracefully
  exit, killing ongoing (or to-be-dispatched) processes.
* Test that sending SIGINT twice will call the originally installed
  SIGINT handler, i.e. it will cause DVSim to die instantly instead.
* Test the above cases with both short and long polls, where long polls
  mean that the scheduler will enter a sleep/wait for ~100 hours between
  polls. As long as this does not time out, this tests that the scheduler
  is correctly woken from its sleep/wait by the signal and does not have
  to wait for a `poll_freq` duration to handle the signal.

Note the currently marked expected failure: from local testing during
development, the use of a `threading.Event` (and perhaps logging) in the
signal handler is not async-signal-safe and therefore, especially when
configured with a poll frequency of 0 (i.e. poll as fast as possible),
we sometimes see tests enter deadlocks and thus fail a small percentage
of the time.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
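A sketch of how such a test might drive the child process, assuming a hypothetical `run_scheduler_with_long_jobs` target function:

```python
import multiprocessing
import os
import signal
import time

# Illustrative only: run the scheduler in a child process so pytest does not
# intercept the signal, then check that SIGTERM produces a graceful exit.
def test_sigterm_exits_gracefully():
    proc = multiprocessing.Process(target=run_scheduler_with_long_jobs)
    proc.start()
    time.sleep(0.5)                      # let the scheduler dispatch some jobs
    os.kill(proc.pid, signal.SIGTERM)
    proc.join(timeout=10)
    assert proc.exitcode is not None     # the process actually terminated
```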
The signal handler tests do not count towards coverage because they run
in a separate process (to avoid signals being intercepted by `pytest`
instead). We can still capture this coverage, however, by configuring
pytest-cov to expect tests executing across multiple processes via
`multiprocessing`.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>