
Conversation

@AlexJones0 (Contributor) commented Feb 9, 2026

This PR introduces a large number of tests specifically targeting the scheduler. The primary goals are to:

  1. Provide a better understanding of the existing features of the scheduler.
  2. Show a clearer picture of what is and isn't working within the scheduler, so we know what may need to be fixed.
  3. Enable refactoring and/or rewriting of the scheduler with reduced chances of introducing unintentional changes or regressions in functionality.

To those ends, this PR:

  1. Creates mocks of the scheduled jobs / `Launcher` class so we can test the behaviour of the scheduler by querying the mocked data.
  2. Introduces additional test dev dependencies and project/CI config to allow better testing of the scheduler.
  3. Adds a wide variety of tests covering: smoke tests, job launching / polling, job weighting / priority, parallelism, dependency resolution, error cases, edge cases, etc., as well as tests for the scheduler's signal handlers.

See the commit messages and comments for more information. The intention is to address issues in follow-up PRs (either with bug fixes or complete refactors/rewrites), rather than overloading this PR with even more content.

@AlexJones0 force-pushed the scheduler_tests branch 10 times, most recently from 830307b to 44f90fc on February 10, 2026 at 13:17
@machshev (Collaborator) left a comment


Great work @AlexJones0, really looking forward to seeing this getting in.

Just a few aesthetics-related suggestions to make the tests a bit more readable, if possible? Also there seems to be a bit of duplication in the test factory helpers.

It looks like you're still making changes to the PR, so I'll leave it there for now.

Comment on lines +26 to +40
```python
FAIL_DEP_ON_MULTIPLE_TARGETS = """
DVSim cannot handle dependency fan-in (i.e. one job depending on multiple jobs)
across multiple targets.

Specifically, when all successors of the first target are initially enqueued, they
are removed from the `scheduled` queues. If any item in another target also depends
on those items (i.e. fan-in across *another* target), then the completion of those
items will in turn attempt to enqueue their successors, which cannot be found as
they are no longer present in the `scheduled` queues.
"""
```
@machshev (Collaborator)
Wow, that's a fairly big oversight! That might explain why build failures propagate so quickly?

@AlexJones0 (Contributor, Author)

Note specifically that "target" here refers to a phase/layer, e.g. "build", "run" or "code_cov", not a job. So, for example, a "code_cov" job can depend on many different jobs that are all in the second "run" phase, but if it depends on jobs in both the "build" and "run" phases (or just "build", actually; see the non-consecutive target error), it will never be scheduled and we get caught in an infinite loop.

The current DVSim basically assumes the following:

  • JobSpecs are provided such that their targets are given in execution order. So you cannot have a "run" job given before a "build" job if the run job depends on the build job.
  • It appears to assume that targets are sequential/consecutive and monotonically increasing. You cannot have bi-directional dependencies between targets, nor dependencies that skip targets. You also cannot have a job that depends on jobs from different targets, or that is depended on by jobs of different targets.
  • It assumes that all jobs are reachable through some dependency chain from one of the jobs in the starting phase. It does not handle "orphaned" jobs without parents well.

In my mind the job specs should basically form a DAG whose edges are dependencies, and DVSim's scheduler essentially produces and runs some topological sort. The "targets" should just be a mapping for status tracking, and there probably don't need to be any restrictions among the targets themselves.
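For illustration, the model described here boils down to a plain topological sort over the dependency edges; a minimal sketch (the `schedule_order` helper and its input shape are hypothetical, not DVSim's code):

```python
from collections import deque

def schedule_order(deps: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm: order jobs by their dependency edges alone.

    `deps` maps each job name to the names of the jobs it depends on;
    "targets" impose no ordering of their own here.
    """
    pending = {job: len(d) for job, d in deps.items()}
    successors: dict[str, list[str]] = {}
    for job, d in deps.items():
        for dep in d:
            successors.setdefault(dep, []).append(job)

    ready = deque(job for job, n in pending.items() if n == 0)
    order = []
    while ready:
        # At any point, every job in `ready` is runnable right now.
        job = ready.popleft()
        order.append(job)
        for succ in successors.get(job, []):
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)

    if len(order) != len(deps):
        raise ValueError("cycle or dangling dependency in job specs")
    return order
```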

@machshev (Collaborator)

Yes, scheduling the jobs by stages always seemed like an arbitrary restriction to me. At any point in time there is a set of jobs that don't have dependencies, and they can be scheduled.

Not sure if there is any reason why sim jobs can't start until all the build jobs are done?
In fact it may be beneficial not to wait in the future. For example, if we can split the DAG into independent sub-DAGs (say, at the IP block level), then it could open up more efficient deployment to remote runners, making sure the build artefacts are co-located with the simulation jobs rather than having to transfer the files between machines.

Right now launchers submit individual jobs, and the overhead of transferring the job can exceed any time benefit of parallelising the jobs over multiple machines. If we could submit DAG subsets then that could open up some scalability options.

@AlexJones0 force-pushed the scheduler_tests branch 4 times, most recently from 5290d66 to 19f3677 on February 10, 2026 at 15:24
@AlexJones0 requested a review from machshev on February 10, 2026 at 15:28
Most of the time we want to use the scheduler in non-interactive mode.
To reduce the verbosity of code using the scheduler, make the kwarg
default to `False`.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Add mock implementations of the DVSim launcher that can be used to test
the scheduler implementation. This provides functionality for mocking
each job (by name, despite the scheduler only being provided the class),
with the ability to change the reported status, vary the reported status
over a number of polls, emulate launcher errors and launcher busy
errors, and emulate a job that takes some time to be killed. We can also
track the number of `launch()`, `poll()` and `kill()` calls on a per-job
basis.

The mock launcher itself maintains a central context which can be used
to control the job configuration, and also to track the maximum number
of concurrent jobs and the order in which jobs started running and were
completed. These features are useful for writing a variety of unit
tests against the scheduler's public APIs while remaining opaque to the
scheduler itself.

Also define some fixtures that will be commonly used across the
different scheduler tests.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
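A rough sketch of the shape such a mock might take; the names (`MockContext`, `MockLauncher`) and single-letter statuses here are illustrative rather than the PR's actual code:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class MockContext:
    """Central context shared by all mock launchers (hypothetical shape)."""
    status_seqs: dict = field(default_factory=dict)  # job name -> statuses, one per poll
    calls: dict = field(default_factory=lambda: defaultdict(
        lambda: {"launch": 0, "poll": 0, "kill": 0}))
    running: set = field(default_factory=set)
    max_parallel: int = 0                            # high-water mark of concurrent jobs
    completed: list = field(default_factory=list)    # completion order

class MockLauncher:
    """Reports a pre-configured status sequence for its job on each poll."""

    def __init__(self, ctx: MockContext, job_name: str) -> None:
        self.ctx = ctx
        self.job_name = job_name
        self.polls = 0

    def launch(self) -> None:
        self.ctx.calls[self.job_name]["launch"] += 1
        self.ctx.running.add(self.job_name)
        self.ctx.max_parallel = max(self.ctx.max_parallel, len(self.ctx.running))

    def poll(self) -> str:
        self.ctx.calls[self.job_name]["poll"] += 1
        seq = self.ctx.status_seqs.get(self.job_name, ["P"])  # default: pass on first poll
        status = seq[min(self.polls, len(seq) - 1)]
        self.polls += 1
        # Treat anything other than "still dispatched" as terminal.
        if status != "D" and self.job_name in self.ctx.running:
            self.ctx.running.discard(self.job_name)
            self.ctx.completed.append(self.job_name)
        return status

    def kill(self) -> None:
        self.ctx.calls[self.job_name]["kill"] += 1
        self.ctx.running.discard(self.job_name)
```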
For changes to the scheduler, which is a core part of DVSim, we write
tests beforehand to be sure that we are not breaking core functionality.
Some of these test cases find issues where DVSim cannot handle its given
inputs, but we do not want to make changes to fix DVSim at this stage.

In such cases, where tests may get caught in infinite loops, it makes
sense to have the ability to specify a test timeout. This can already be
done as a pytest option, but we want to enable a default timeout so that
anyone running the test suite without knowledge of these issues does not
get stuck and can still see the "expected fail" (xfail) result, rather
than the test being skipped.

The `pytest-timeout` dependency lets us mark individual tests with
timeout values so that we can do this.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
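For example, a known-hanging case can be bounded with the marker pytest-timeout provides (`run_scheduler` and `make_cyclic_specs` are hypothetical helpers for illustration):

```python
import pytest

# Bound a test that is known to hang on current DVSim, so the suite still
# finishes and reports the failure instead of blocking forever.
@pytest.mark.timeout(5)
def test_scheduler_with_cyclic_dependencies():
    run_scheduler(make_cyclic_specs())
```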
Add 3 initial tests for the scheduler which act as basic smoke tests to
ensure that the scheduler at least appears to work on a basic level.

The tests check that the scheduler can handle being given no jobs, 1
job, and 5 jobs, where the jobs are just some basic mock jobs.

To help define these tests (and the creation of future tests), a variety
of factories and helper functions / utilities are introduced for common
test patterns, including the ability to define a single job spec or
multiple job specifications that vary in some pre-determined way, and to
create paths for tests where the scheduler logic makes use of file I/O
operations and therefore expects output paths to exist.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
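A minimal sketch of what such a smoke test might look like, assuming hypothetical `make_job_specs` and `run_scheduler` helpers standing in for the factories this commit describes:

```python
import pytest

# One parametrized test covers the no-job, single-job and multi-job cases.
@pytest.mark.parametrize("num_jobs", [0, 1, 5])
def test_scheduler_smoke(num_jobs, tmp_path):
    # tmp_path provides the output directories the scheduler's file I/O expects.
    specs = make_job_specs(count=num_jobs, out_dir=tmp_path)
    results = run_scheduler(specs)
    assert len(results) == num_jobs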
Add the `--strict` flag to the Python tests in CI.

This flag makes pytest run in strict mode, which does a few
useful things for us. Importantly:
* For tests that are expected to fail (due to known failures to be
  addressed in the future), it will error if the test unexpectedly
  passes and is not correspondingly marked `strict=False`. This lets us
  still support flaky tests but ensures that xfail markers are kept
  up-to-date with the code itself.
* For any markers which pytest doesn't recognize / hasn't been informed
  about, it will explicitly error.
* For any unknown command-line options, pytest will error.

This lets us catch typos / stale configuration, and ensure that the code
remains more in sync with the statuses of the tests.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
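For example, under strict enforcement an xfail test that unexpectedly passes is reported as an error, so a genuinely flaky test must opt out explicitly (the test name here is illustrative):

```python
import pytest

# Allowed to pass occasionally: strict=False documents the flakiness rather
# than letting a surprise pass slip through silently.
@pytest.mark.xfail(reason="signal-handler deadlock in a small fraction of runs",
                   strict=False)
def test_flaky_signal_case():
    ...
```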
This commit adds another ~7 unique scheduler tests for functionality
relating to the parallelisation of the scheduler / launchers (parallel
dispatch, and that max parallelism is respected), the polling behaviour
of the launcher (the scheduler will poll a job until it is done, and
not beyond that), and the error handling of the scheduler (for launcher
errors and duplicate jobs, and checking that errors propagate up the
dependency tree to kill dependent jobs).

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Adds around 14 unique unit tests for the scheduler, specifically
checking behaviours relating to the structure of the input job
specification, including the dependencies between jobs and the different
targets/phases/labels of the jobs. As part of this, the
`needs_all_dependencies_passing` parameter on job specs is also tested,
in addition to edge cases such as cyclic dependencies and failure
propagation.

Since a variety of these cases are currently failing in DVSim, these
tests are marked as being "expected to fail" i.e. xfail, with
appropriate reasoning provided for each expected failure.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
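An illustrative shape for one such expected-failure test, reusing the `FAIL_DEP_ON_MULTIPLE_TARGETS` reason quoted earlier (`make_multi_target_fan_in_specs` and `run_scheduler` are hypothetical helpers):

```python
import pytest

# The bug manifests as an infinite loop, so the timeout is what actually
# fails the test; strict=True flags it if the bug is ever fixed.
@pytest.mark.timeout(5)
@pytest.mark.xfail(reason=FAIL_DEP_ON_MULTIPLE_TARGETS, strict=True)
def test_dep_on_multiple_targets():
    run_scheduler(make_multi_target_fan_in_specs())
```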
This commit adds 3 more unique tests related to the scheduler's
prioritisation of jobs/targets with different weights, including the
edge case where the jobs' weights sum to zero.

As mentioned in the TODO comment, this aspect is not covered in as much
detail, as the limitations of DVSim's current operation (the structure
of the job specification DAG lists that it will actually accept) make it
very difficult to reason about and construct tests for this behaviour.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
These packages are added as optional `test` dependencies to aid in the
development of tests. `pytest-repeat` is a plugin for pytest that makes
it easy to repeat tests a number of times. This can be useful for
checking for test flakiness and non-idempotency. `pytest-xdist` is a
plugin for pytest to allow you to run tests in parallel, distributing
tests across multiple CPUs to speed up execution.

The combination of these two plugins lets us run many iterations of
tests in parallel to quickly catch potential issues with flaky behaviour
in tests. Consider running e.g.
  pytest -n auto --count=10 --timeout=5

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
Add tests for the Scheduler's signal handler, which allows you to quit
DVSim while the scheduler is running. We test a few different things
using different parameters:
* Test that sending either SIGTERM or SIGINT causes DVSim to gracefully
  exit, killing ongoing (or to-be-dispatched) processes.
* Test that sending SIGINT twice will call the originally installed
  SIGINT handler, i.e. it will cause DVSim to die instantly instead.
* Test the above cases with both short and long polls, where long polls
  mean that the scheduler will enter a sleep/wait for ~100 hours between
  polls. As long as this does not time out, this tests that the scheduler
  is correctly woken from its sleep/wait by the signal and does not have
  to wait for a `poll_freq` duration to handle the signal.

Note the currently marked expected failure: from local testing during
development, the use of a `threading.Event` (and perhaps logging) in the
signal handler is not async-signal-safe and therefore, especially when
configured with a poll frequency of 0 (i.e. poll as fast as possible),
we sometimes see tests enter deadlocks and thus fail a small percentage
of the time.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
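A sketch of how such a test might drive the child process, assuming a hypothetical `run_scheduler_with_long_jobs` target function:

```python
import multiprocessing
import os
import signal
import time

# Illustrative only: run the scheduler in a child process so pytest does not
# intercept the signal, then check that SIGTERM produces a graceful exit.
def test_sigterm_exits_gracefully():
    proc = multiprocessing.Process(target=run_scheduler_with_long_jobs)
    proc.start()
    time.sleep(0.5)                      # let the scheduler dispatch some jobs
    os.kill(proc.pid, signal.SIGTERM)
    proc.join(timeout=10)
    assert proc.exitcode is not None     # the process actually terminated
```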
The signal handler tests do not count towards coverage because they run
in a separate process (to avoid signals being intercepted by `pytest`
instead). We can still capture this coverage, however, by configuring
pytest-cov to expect tests executing across multiple processes via
`multiprocessing`.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>