The tests herein are meant to uphold the security, quality, and performance contracts of Firecracker.
The testing system is built around pytest.
Our tools/devtool script is a convenience wrapper which automatically
downloads necessary test artifacts from S3, before invoking pytest inside a
docker container. For detailed help on usage, see tools/devtool help.
To run all available tests that would also run as part of our PR CI (e.g.
excluding tests marked with pytest.mark.nonci):
tools/devtool -y test
To run only tests from specific directories and/or files:
tools/devtool -y test -- integration_tests/performance/test_boottime.py
To run a single specific test from a file:
tools/devtool -y test -- integration_tests/performance/test_boottime.py::test_boottime
Note that all paths should be specified relative to the tests directory, not
the repository root.
Alternatively, pytest provides the option to run all tests where the test name
contains some substring via the -k option:
tools/devtool -y test -- -k 1024 integration_tests/performance/test_boottime.py::test_boottime
This is particularly useful for specifying parameters of test functions. For example, the above command will run all boottime tests with a microVM size of 1024MB.
If you are not interested in the capabilities of devtool, use pytest directly,
either from inside the container:
tools/devtool -y shell -p
pytest [<pytest argument>...]
or natively on your dev box:
python3 -m pytest [<pytest argument>...]
Output, including testrun results, goes to stdout. Errors go to stderr. By
default, stdout and stderr are captured while tests are running and are printed
in the final failure report only if a test fails. To print them while running
regardless of success or failure, pass the -s flag, e.g.
tools/devtool -y test -- -s.
- A bare-metal Linux host with uname -r >= 5.10 and KVM enabled (/dev/kvm device node exists)
- Docker
- awscli version 2
The pytest-powered integration tests rely on Firecracker's HTTP API for
configuring and communicating with the VMM. Alongside these, the vmm crate
also includes several native-Rust integration tests, which
exercise its programmatic API without the HTTP integration. Cargo
automatically picks up these tests when cargo test is issued. They also count
towards code coverage.
To run only the Rust integration tests:
cargo test --test integration_tests --all
Unlike unit tests, Rust integration tests are each run in a separate process.
cargo also packages them in a new crate. This has several known side effects:
- Only the pub functions can be called. This is fine, as it allows the VMM to be consumed as a programmatic user would. If any function is necessary but not pub, please consider carefully whether it conceptually needs to be in the public interface before making it so.
- The correct functioning scenario of the vmm implies that it exits with code 0. This is necessary for proper resource cleanup. However, cargo doesn't expect the test process to initiate its own demise, therefore it will not be able to properly collect test output.

Example:
cargo test --test integration_tests
running 3 tests
test test_setup_serial_device ... ok
To learn more about Rust integration tests, see the Rust book.
A/B-Testing is a testing strategy where some test function is executed twice in different environments (the A and B environments), and the overall test result depends on a comparison of the test function's outputs in these two environments. The advantage of A/B-testing is that it does not require the specification of a ground truth to compare against. Instead, the ground truth is generated dynamically by running the test function in environment A. Firecracker's A/B-testing generally compares Firecracker binaries compiled from two separate commits (e.g. an A binary which is compiled from the HEAD of the main branch, and a B binary which is compiled from the HEAD of a pull request opened against main).
We use this testing approach if a test's ground truth...
- ...can change due to influence external to the code base (e.g. a security test that fails if a CVE is published for one of our dependencies), or
- ...is too complex/changes too often to reasonably be contained in the code base (e.g. extensive performance benchmark results).
For examples of how to utilize A/B-testing inside an integration test, have a
look at our A/B-Testing module or our cargo audit test.
If such an A/B-Test is executed outside of the context of a PR (meaning there is
no canonical choice of A and B to be made), it will simply try to assert the
state of the environment in which it was executed. For example, the cargo audit
test above, when run on a PR, will fail if and only if a newly added dependency
has a known open RustSec advisory. If run outside a PR, it will fail if any
existing dependency has an open RustSec advisory.
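The two modes described above can be summarized in a few lines. This is a conceptual sketch only, not the code of Firecracker's A/B-Testing module: the function name and the advisory identifiers are made up for illustration.

```python
def advisories_to_report(found_in_b, found_in_a=None):
    """Return the advisories that should fail the test.

    found_in_a is None when there is no canonical A environment
    (for example, outside a PR); then every advisory found in B counts.
    """
    if found_in_a is None:
        return set(found_in_b)
    # In A/B mode, only advisories introduced by B are failures.
    return set(found_in_b) - set(found_in_a)


# Placeholder advisory IDs, not real RustSec entries.
main_branch = {"ADVISORY-1"}
pull_request = {"ADVISORY-1", "ADVISORY-2"}

print(advisories_to_report(pull_request, main_branch))
print(advisories_to_report(pull_request))
```

In the PR case, the pre-existing advisory is ignored because it is also present in A; outside a PR, both advisories are reported.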
Firecracker has a special framework for orchestrating long-running A/B-tests
which run outside the pre-PR CI. Instead, these tests are scheduled to run
post-merge. Specific tests, such as our snapshot restore latency tests,
contain no assertions themselves, but rather they emit data series using the
aws_embedded_metrics library. When executed by the tools/ab_test.py
orchestration script, these data series are collected. The orchestration
script executes each test twice with different Firecracker binaries, and then
matches up corresponding data series from the A and B run. For each data
series, it performs a non-parametric test. For each data series where the
difference between the A and B run is considered statistically significant, it
will print out the associated metric. Please see tools/ab_test.py --help for
information on how to configure what the script considers significant.
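To give a feel for what a non-parametric test on two data series looks like, here is a minimal sketch of a two-sided permutation test on the difference of means. The text above does not say which test tools/ab_test.py uses, so treat this as an illustrative stand-in rather than the script's method; the function name and sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(series_a, series_b, n_resamples=2000, seed=0):
    """Estimate a two-sided p-value for the difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(series_a) - mean(series_b))
    pooled = list(series_a) + list(series_b)
    n_a = len(series_a)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled samples to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical boot times in milliseconds for the A and B binaries.
run_a = [100, 101, 99, 100, 102, 98, 101, 100]
run_b = [120, 121, 119, 120, 122, 118, 121, 120]
print(permutation_p_value(run_a, run_b))
```

The test makes no assumption about the distribution of the samples, but it does need several data points per series to say anything useful.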
Writing your own A/B-Test is easy: Simply write a test that outputs a data series and has no functional assertions. Then, when this test is run under the A/B-Test orchestrator, all data series emitted will be picked up automatically for statistical analysis.
To add a new A/B-Test to our post-PR test suite, add the corresponding test
function to .buildkite/pipeline_perf.py. To manually run an A/B-Test, use
tools/devtool -y test --ab [optional arguments to ab_test.py] run <dir A> <dir B> --test <test specification>
Here, dir A and dir B are directories containing firecracker and jailer
binaries whose performance characteristics you wish to compare. You can use
./tools/devtool build --rev <revision> --release to compile binaries from an
arbitrary git object (commit SHAs, branches, tags etc.). This will create
sub-directories in build containing the binaries. For example, to compare
boottime of microVMs between Firecracker binaries compiled from the main
branch and the HEAD of your current branch, run
tools/devtool -y build --rev main --release
tools/devtool -y build --rev HEAD --release
tools/devtool -y test --ab -- run build/main build/HEAD --test integration_tests/performance/test_boottime.py::test_boottime
First, A/B-Compatible tests need to emit more than one data point for each metric for which they wish to support A/B-testing. This is because non-parametric tests operate on data series instead of individual data points.
When emitting metrics with aws_embedded_metrics, each metric (data series) is
associated with a set of dimensions. The tools/ab_test.py script uses these
dimensions to match up data series between two test runs. It only matches up two
data series with the same name if their dimensions match.
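The matching rule can be pictured as keying each data series by its metric name together with its full dimension set. The sketch below is an assumption about how such matching could be modelled, not the implementation in tools/ab_test.py; the metric name boot_time and the vcpus dimension are hypothetical.

```python
def series_key(name, dimensions):
    """Identify a data series by its metric name and dimension set."""
    return (name, frozenset(dimensions.items()))


def match_series(run_a, run_b):
    """Pair up the series present in both runs (each run maps key -> values)."""
    return {key: (run_a[key], run_b[key]) for key in run_a.keys() & run_b.keys()}


dims = {"performance_test": "test_boottime", "vcpus": "2"}
other_dims = {"performance_test": "test_boottime", "vcpus": "4"}

run_a = {series_key("boot_time", dims): [101, 99, 100]}
run_b = {
    series_key("boot_time", dims): [120, 118, 121],
    # Same metric name but different dimensions: no counterpart in run A.
    series_key("boot_time", other_dims): [90, 91, 89],
}

matched = match_series(run_a, run_b)
print(len(matched))
```

Only the series whose name and dimensions agree in both runs are paired; the second series in run B is left unmatched.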
Special care needs to be taken when pytest expands the argument passed to
tools/ab_test.py's --test option into multiple individual test cases. If two
test cases use the same dimensions for different data series, the script will
fail and print out the names of the violating data series. For this reason,
A/B-Compatible tests should include a performance_test key in their
dimension set whose value is set to the name of the test.
In addition to the above, care should be taken that the dimensions of the data
series emitted by some test case are unique to that test case. For example, if
we have a boottime test parameterized by number of vcpus, but the emitted
boottime data series' dimension set is just
{"performance_test": "test_boottime"}, then tools/ab_test.py will not be
able to tell apart data series belonging to different vcpu counts, and will
instead combine them (which is probably not desired). For this reason,
A/B-Compatible tests should always include all pytest parameters in their
dimension set.
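The effect of an incomplete dimension set is easy to reproduce in isolation. The following sketch groups hypothetical samples first by performance_test alone and then by all parameters; the helper and the sample values are illustrative and not part of the test framework.

```python
from collections import defaultdict


def group_by_dimensions(samples, dimension_keys):
    """Group (params, value) samples by the chosen dimension keys."""
    series = defaultdict(list)
    for params, value in samples:
        key = tuple((k, params[k]) for k in dimension_keys)
        series[key].append(value)
    return dict(series)


# Hypothetical boot times (ms) for a test parametrized by vcpu count.
samples = [
    ({"performance_test": "test_boottime", "vcpus": 1}, 100),
    ({"performance_test": "test_boottime", "vcpus": 1}, 102),
    ({"performance_test": "test_boottime", "vcpus": 4}, 140),
    ({"performance_test": "test_boottime", "vcpus": 4}, 138),
]

merged = group_by_dimensions(samples, ["performance_test"])
separate = group_by_dimensions(samples, ["performance_test", "vcpus"])
print(len(merged), len(separate))
```

With only performance_test as a dimension, the 1-vcpu and 4-vcpu samples end up in one series, which would blur any per-configuration regression.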
Lastly, performance A/B-Testing through tools/ab_test.py can only detect
performance differences that are present in the Firecracker binary. The
tools/ab_test.py script only checks out the revisions it is passed to execute
cargo build to generate a Firecracker binary. It does not run integration
tests in the context of the checked out revision. In particular, both the A
and the B run will be triggered from within the same docker container, and
using the same revision of the integration test code. This means it is not
possible to use orchestrated A/B-Testing to assess the impact of, say, changing
only python code (such as enabling logging). Only Rust code can be A/B-Tested.
The exception to this is toolchain differences. If both specified revisions
have rust-toolchain.toml files, then tools/ab_test.py will compile using the
toolchain specified by the revision, instead of the toolchain installed in the
docker container from which the script is executed.
We run automated A/B-Tests on every pull request after merge, if the pull
request touches any Rust code. The pipeline is generated by the
pipeline_perf.py script. To manually schedule an A/B-Test in buildkite, the
REVISION_A and REVISION_B environment variables need to be set in the
"Environment Variables" field under "Options" in buildkite's "New Build" modal.
While our automated A/B-Testing suite only supports A/B-Tests across commit ranges, you can also use the scripts to manually run A/B-comparisons for arbitrary environments (such as comparing how the same Firecracker binary behaves on different hosts).
For this, run the desired tests in your environments using devtool as you
would for a non-A/B test. The only difference to a normal test run is you should
set two environment variables: AWS_EMF_ENVIRONMENT=local and
AWS_EMF_NAMESPACE=local:
AWS_EMF_ENVIRONMENT=local AWS_EMF_NAMESPACE=local tools/devtool -y test -- integration_tests/performance/test_boottime.py::test_boottime
This instructs aws_embedded_metrics to dump all data series that our A/B-Test
orchestration would analyze to stdout, and pytest will capture this output
into a file stored at ./test_results/test-report.json.
The tools/ab_test.py script can consume these test reports, so next collect
your two test report files to your local machine and run
tools/ab_test.py analyze <first test-report.json> <second test-report.json>
This will then print the same analysis described in the previous sections.
If during tools/ab_test.py analyze you get an error like
$ tools/ab_test.py analyze <first test-report.json> <second test-report.json>
Traceback (most recent call last):
  File "/firecracker/tools/ab_test.py", line 412, in <module>
    data_a = load_data_series(args.report_a)
  File "/firecracker/tools/ab_test.py", line 122, in load_data_series
    for line in test["teardown"]["stdout"].splitlines():
KeyError: 'stdout'
double check that the AWS_EMF_ENVIRONMENT and AWS_EMF_NAMESPACE environment
variables are set to local. Particularly, when collecting data from buildkite
pipelines generated from .buildkite/pipeline_perf.py, ensure you pass
--step-param env/AWS_EMF_NAMESPACE=local --step-param env/AWS_EMF_SERVICE_NAME=local.
Tests can be added in any (existing or new) sub-directory of tests/, in files
named test_*.py.
By default, pytest makes all fixtures in conftest.py available to all test
functions. You can also create conftest.py in sub-directories containing
tests, or define fixtures directly in test files. See the pytest documentation
for details.
Most integration tests use fixtures that abstract away the creation and teardown of Firecracker processes. The following fixtures spawn Firecracker processes that are pre-initialized with specific guest kernels and rootfs:
- uvm_plain_any is parametrized by the guest kernels supported by Firecracker and a read-only Ubuntu 22.04 squashfs as rootfs,
- uvm_plain yields a Firecracker process pre-initialized with a 5.10 kernel and the same Ubuntu 22.04 squashfs.
Generally, tests should use the former if you are testing some interaction between the guest and Firecracker, while the latter should be used if Firecracker functionality unrelated to the guest is being tested.
Firecracker uses two special pytest markers to determine which tests are run in which context:
- Tests marked as nonci are not run in the PR CI pipelines. Instead, they run in separate pipelines according to various cron schedules.
- Tests marked as no_block_pr are run in the "optional" PR CI pipeline. This pipeline is not required to pass for merging a PR.
All tests without markers are run for every pull request, and are required to pass for the PR to be merged.
Add a new function annotated with #[test] in integration_tests.rs.
There are helper methods for writing to and reading from a guest filesystem. For example, to overwrite the guest init process and later extract a log:
def test_with_any_microvm_and_my_init(test_microvm_any):
    # [...]
    test_microvm_any.slot.fsfiles['mounted_root_fs'].copy_to(my_init, 'sbin/')
    # [...]
    test_microvm_any.slot.fsfiles['mounted_root_fs'].copy_from('logs/', 'log')
copy_to() source paths are relative to the host root and destination paths are
relative to the mounted_root_fs root. Vice versa for copy_from().
Copying files to/from a guest file system while the guest is running results in undefined behavior.
Running on an EC2 .metal instance with an Amazon Linux 2 AMI:
# Get firecracker
yum install -y git
git clone https://github.com/firecracker-microvm/firecracker.git
# Run all tests
cd firecracker
tools/devtool test
In our CI, integration tests are run on EC2 .metal instances. We list the
instance types and host operating systems we test in our README. Multiple test
runs can share a .metal instance, meaning it is possible to observe noisy
neighbor effects when running the integration test suite (and particularly,
tests should not assume the ability to configure host-global resources). The
exception to this is the integration tests found in
integration_tests/performance. These tests are always executed single-tenant,
and additionally tweak various host-level settings to achieve consistent
performance. Please see the test section of tools/devtool help for more
information.
- Testrun: A sandboxed run of all (or a selection of) integration tests.
- Test Session: A pytest testing session. One per testrun. A Testrun will start a Test Session once the sandbox is created.
- Test: A function named test_ from this tree, that ensures a feature, functional parameter, or quality metric of Firecracker. Should assert or raise an exception if it fails.
- Fixture: A function that returns an object that makes it very easy to add Tests: E.g., a spawned Firecracker microvm. Fixtures are functions marked with @pytest.fixture from files named either conftest.py, or from files where tests are found. See pytest documentation on fixtures.
- Test Case: An element from the cartesian product of a Test and all possible states of its parameters (including its fixtures).
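The Test Case definition above can be made concrete with a small enumeration. The parameter names and most of the values below are hypothetical; the point is only that one Test with two parameters of 2 and 3 states yields 6 Test Cases.

```python
from itertools import product

# Hypothetical parameter states for a single Test.
guest_kernels = ["5.10", "6.1"]
vcpu_counts = [1, 2, 4]

# Each Test Case is one combination of parameter states.
test_cases = [
    {"guest_kernel": kernel, "vcpus": vcpus}
    for kernel, vcpus in product(guest_kernels, vcpu_counts)
]

print(len(test_cases))
```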
Q1: I have a shell script that runs my tests and I don't want to rewrite it.
A1: Insofar as it makes sense, you should write it as a python test
function. However, you can always call the script from a shim python test
function. You can also add it as a microvm image resource in the s3 bucket (and
it will be made available under microvm.slot.path) or copy it over to a guest
filesystem as part of your test.
Q2: I want to add more tests that I don't want to commit to the Firecracker
repository.
A2: Before a testrun or test session, just add your test directory under
tests/. pytest will discover all tests in this tree.
Q3: I want to have my own test fixtures, and not commit them in the repo.
A3: Add a conftest.py file in your test directory, and place your fixtures
there. pytest will bring them into scope for all your tests.
Q4: I want to use more/other microvm test images, but I don't want to add
them to the common s3 bucket.
A4: Add your custom images to the build/img subdirectory in the Firecracker
source tree. This directory is bind-mounted in the container and used as a
local image cache.
Q5: How can I get live logger output from the tests?
A5: Accessing pytest.ini will allow you to modify logger settings.
Q6: Is there a way to speed up integration tests execution time?
A6: You can narrow down the test selection as described in the Running
section. For example:
- Pass the -k substring option to pytest to only run a subset of tests by specifying a part of their name.
- Only run the tests contained in a file or directory.
- Easily run tests manually on a development/test machine, and in continuous integration environments.
- Each test should be independent, and self-contained. Tests will time out, expect a clean environment, and leave a clean environment behind.
- Always run with the latest dependencies and resources.
Pytest was chosen because:
- Python makes it easy to work in the clouds.
- Python has built-in sandbox (virtual environment) support.
- pytest has great test discovery and allows for simple, function-like tests.
- pytest has powerful test fixture support.
Note: The below TODOs are also mentioned in their respective code files.
- Use the Firecracker Open API spec to populate Microvm API resource URLs.
- Event-based monitoring of microvm socket file creation to avoid while spins.
- Self-tests (e.g., Tests that test the testing system).
- Looking into pytest-ordering to ensure test order.
- Create an integrated, layered say system across the test runner and pytest (probably based on an environment variable).
- Per test function dependency installation would make tests easier to write.
- Type hinting is used sparsely across the tests/* Python modules. The code would be more easily understood with consistent type hints everywhere.
Contributing to this testing system requires a deep dive into pytest.
When troubleshooting tests, it is important to narrow the run down to only the
tests that are of interest. One can use the --last-failed parameter to only run
the tests that failed in the previous run. This is useful when several tests
fail after making large changes.
To avoid having to enter/exit Docker every test run, you can run the tests directly within a Docker session:
tools/devtool -y shell --privileged
tools/test.sh integration_tests/functional/test_api.py
Just append --pdb, and when a test fails it will drop you in pdb, where you
can examine local variables and the stack, and can use the normal Python REPL.
tools/devtool -y test -- -k 1024 integration_tests/performance/test_boottime.py::test_boottime --pdb
tools/devtool -y shell --privileged
export PYTEST_ADDOPTS=--pdbcls=IPython.terminal.debugger:TerminalPdb
tools/test.sh -k 1024 integration_tests/performance/test_boottime.py::test_boottime
There is a helper command in devtool that does just that, and is easier to type:
tools/devtool -y test_debug -k 1024 integration_tests/performance/test_boottime.py::test_boottime
There is a helper to enable the console, but it has to be run before spawning the Firecracker process:
uvm.help.enable_console()
uvm.spawn()
uvm.basic_config()
uvm.start()
...
Once that is done, if you get dropped into pdb, you can do this to open a tmux
tab connected to the console (via screen).
uvm.help.tmux_console()
Just run the test in a loop, and make it drop you into pdb when it fails.
while true; do
tools/devtool -y test -- integration_tests/functional/test_balloon.py::test_deflate_on_oom -k False --pdb
done
We can run the tests in parallel via pytest-xdist. Not all tests can run in
parallel (the ones in build and performance are not supposed to run in
parallel).
By default, the tests run sequentially. One can use the -n option to control
the parallelism. Passing -n auto will run as many workers as CPUs, which may be
too many. As a rough heuristic, use half the available CPUs. I use -n4 for my
8 CPU (HT-enabled) laptop. On .metal instances 8 is a good number; more than
that just gives diminishing returns.
tools/devtool -y test -- integration_tests/functional -n$(expr $(nproc) / 2) --dist worksteal
First, make the test fail and drop you into PDB. For example:
tools/devtool -y test_debug integration_tests/functional/test_api.py::test_api_happy_start --pdb
Then,
ipdb> test_microvm.help.gdbserver()
You get some instructions on how to run GDB to attach to gdbserver.
The integration tests usually compile Firecracker as part of the test initialization. But there's an option in case we want to run the tests against a different version of Firecracker, for example a previous release:
./tools/devtool test -- --binary-dir ../v1.8.0
The directory specified with --binary-dir should contain at least two
binaries: firecracker and jailer.
Tested in Ubuntu 22.04 and AL2023. AL2 does not work due to an old Python (3.8).
# replace with yum in Fedora/AmazonLinux
sudo apt install python3-pip
sudo pip3 install pytest ipython requests psutil tenacity filelock "urllib3<2.0" requests_unixsocket aws_embedded_metrics pytest-json-report pytest-timeout
cd tests
sudo env /usr/local/bin/pytest integration_tests/functional/test_api.py
⚠️ Notice this runs the tests as root!
tools/devtool -y sandbox
That should drop you in an IPython REPL, where you can interact with a microvm:
uvm.help.print_log()
uvm.get_all_metrics()
uvm.ssh.run("ls")
snap = uvm.snapshot_full()
uvm.help.tmux_ssh()
It supports a number of options; you can check them with devtool sandbox -- --help.
Running without Docker
source /etc/os-release
case $ID-$VERSION_ID in
    amzn-2)
        sudo yum remove -y python3
        sudo amazon-linux-extras install -y python3.8
        sudo ln -sv /usr/bin/python3.8 /usr/bin/python3
        sudo ln -sv /usr/bin/pip3.8 /usr/bin/pip3
        ;;
esac
sudo pip3 install pytest ipython requests psutil tenacity filelock "urllib3<2.0" requests_unixsocket
sudo env PYTHONPATH=tests HOME=$HOME ~/.local/bin/ipython3 -i tools/sandbox.py -- --binary-dir ../repro/v1.4.1
Warning: Notice this runs as root!