
Improve twister performance when parallel execution is available #52701

Open · yperess opened this issue Dec 1, 2022 · 7 comments
Labels: area: Continuous Integration · area: Twister · Enhancement

Comments

yperess (Collaborator) commented Dec 1, 2022

Is your enhancement proposal related to a problem? Please describe.
In our test writing, creating a new variant of a test (a new binary) carries a lot of boilerplate and build-time overhead, but the cost of piling yet another test into an existing binary is also getting too high. The test binaries end up executing hundreds of tests and taking a long time.

Describe the solution you'd like
I'd like twister to be able to take the final built .elf file and shard it: effectively, modifying the ztest suite and test iterable sections, running the different shards in parallel, and then combining the results. When running twister, the following steps should take place:

  1. Twister should build the binary as usual
  2. Twister should identify if parallelism is possible
  3. Twister should parse the .elf file and generate test metadata by iterating over the suite and test sections
  4. Twister should assign a weight to each suite based on the number of tests it has, then make some attempt at balancing the loads (I don't think we need to worry about splitting up suites yet); see the balancing sketch after this list.
  5. Twister should copy the .elf file and mutate the start/end pointers as well as shifting some suite data such that the binary is unaware of the other suites.
  6. Twister should run the various .elf files (locally, on QEMUs, or on hardware) in parallel then join the results
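
As a minimal sketch of the balancing in step 4 (assuming suite weight = number of tests; the function name and inputs here are hypothetical, not an existing twister API), a greedy longest-processing-time heuristic would suffice:

import heapq

def balance_suites(suite_weights, num_shards):
    """Greedily assign suites to shards, heaviest first (LPT heuristic)."""
    # Each heap entry is (total_weight, shard_index, suite_names); the
    # index breaks ties so the name lists themselves are never compared.
    shards = [(0, i, []) for i in range(num_shards)]
    heapq.heapify(shards)
    for name, weight in sorted(suite_weights.items(), key=lambda kv: -kv[1]):
        total, idx, names = heapq.heappop(shards)
        names.append(name)
        heapq.heappush(shards, (total + weight, idx, names))
    return [names for _, _, names in sorted(shards, key=lambda s: s[1])]

# balance_suites({'kernel': 12, 'drivers': 7, 'net': 5, 'fs': 3}, 2)
# -> [['kernel', 'fs'], ['drivers', 'net']]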

Describe alternatives you've considered
I've considered having an easier way of specifying a similar binary in the testcase.yaml file, but the only way I seem to be able to do that is by introducing a Kconfig to select which test suites to include in the binary. This ends up being a little confusing and forces test writers to manage the test binaries by hand.

yperess added the Enhancement, area: Twister, and area: Continuous Integration labels on Dec 1, 2022
yperess (Collaborator, Author) commented Dec 1, 2022

@tristan-google

gmarull (Member) commented Dec 1, 2022

@PerMac isn't this something twister V2 already supports using https://pypi.org/project/pytest-parallel/ ?

tristan-google (Collaborator) commented Dec 1, 2022

I think Yuval's idea here doesn't concern the handlers for actually running things in parallel, but rather the ability to take a giant testcase binary and shard it post-build but pre-run. The executable would be duplicated N times and each copy would be modified to run roughly 1/N of the tests, where N can be derived from the number of cores, the number of matching HW boards plugged in, etc. At that point those N copies could be run in parallel through whatever handler/mechanism is in place, just as if they were independent testcases.
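
A rough sketch of how N could be derived (the function and its inputs are hypothetical):

import os

def shard_count(num_suites, attached_duts=0):
    # Never split finer than one suite per copy, and never wider than the
    # available executors (attached DUTs for hardware runs, CPU cores for
    # native/QEMU runs).
    executors = attached_duts or os.cpu_count() or 1
    return max(1, min(num_suites, executors))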

PerMac (Member) commented Dec 2, 2022

I am not sure I get the full idea. Would your idea require:

  • being able to call any single ztest test case from an application loaded on a board, instead of running the full test suite from start to finish?

PerMac (Member) commented Dec 2, 2022

Another question: would there be a place in the ztest framework to allow communication and calling single tests? E.g. something like

dut.flash('zephyr.elf')
dut.write('ztest call testcase_A')
output = dut.read()
assert output == "testcase_A PASS"

where dut.write and dut.read would be handled in twister for serial communication with the DUT, and the flashed ztest application would handle what to call, and how, when a command is given on the serial input?
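
For illustration, the twister side of that serial half could be a thin pyserial wrapper; this is only a sketch of the hypothetical protocol above (no 'ztest call' command exists today):

import serial  # pyserial

class Dut:
    """Hypothetical wrapper implementing dut.write/dut.read over a serial port."""

    def __init__(self, port, baud=115200):
        self.uart = serial.Serial(port, baudrate=baud, timeout=5)

    def write(self, cmd):
        self.uart.write((cmd + '\n').encode())

    def read(self):
        return self.uart.readline().decode(errors='replace').strip()

dut = Dut('/dev/ttyACM0')
dut.write('ztest call testcase_A')
assert dut.read() == 'testcase_A PASS'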

yperess (Collaborator, Author) commented Dec 6, 2022

@PerMac not quite; Twister doesn't need to communicate with the DUT. Prior to flashing, twister will identify whether parallelism is possible, meaning one of the following holds:

  • If the application is on native_posix or unit_testing boards
  • If the application is going to run on a QEMU
  • If multiple of the same DUT are connected that can be flashed

If any of the above is a yes, then we can parallelize the execution of the binary. Let's assume the binary has 30 test suites and each suite has 10 tests, for a total of 300 tests (I believe we're just over that for our largest integration test binary). Twister would build the test binary as usual, then decide how parallel things can be. For example, in the case of a native_posix test we can parallelize by the number of cores (let's assume a high number running on a CI server). So Twister:

  1. Copies the .elf file 30 times (since min(30 suites, 96 cores) = 30).
  2. For each .elf copy, mutates ztest's _ztest_suite_node_list_start and _ztest_suite_node_list_end so that the binary is completely unaware of the other tests (see the ELF sketch after this list).
  3. It would then run each of the 30 binaries in parallel, each producing the results for its single suite.
  4. It would then combine the results.
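
As a sketch of how step 2 could locate the suite list in the built binary (using pyelftools; NODE_SIZE is an assumption that must match the target's struct ztest_suite_node layout, and the actual byte patching of the copies is omitted):

from elftools.elf.elffile import ELFFile

NODE_SIZE = 40  # assumed sizeof(struct ztest_suite_node); target-ABI dependent

def suite_list_bounds(path):
    """Return the (start, end) load addresses of the ztest suite node list."""
    with open(path, 'rb') as f:
        symtab = ELFFile(f).get_section_by_name('.symtab')
        start = symtab.get_symbol_by_name('_ztest_suite_node_list_start')[0]
        end = symtab.get_symbol_by_name('_ztest_suite_node_list_end')[0]
        return start['st_value'], end['st_value']

start, end = suite_list_bounds('build/zephyr/zephyr.elf')
print(f'{(end - start) // NODE_SIZE} suites between {start:#x} and {end:#x}')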

Some consideration would need to be given to QEMUs and DUTs, as the cost of flashing becomes greater (it might not make sense to split a binary with only 2 suites for a DUT test), but I believe we can tweak these heuristics as we get closer to feature complete.

NOTE

We have some very large integration tests, and currently developers have to choose between the convenience of adding their test to the same binary (which bloats it even more) or going through the boilerplate of creating another binary for their test (with no real configuration changes). This leads to a very large discrepancy in run times: locally I'm seeing some tests run in milliseconds while our two largest tests run in 65 seconds. The issue is even worse in our CI, where disk I/O is slower and writing the large handler.log is the bottleneck, pushing those larger binaries closer to 120 seconds.

zephyrbot (Collaborator) commented

Hi @tristan-google,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. Please confirm the issue is correctly assigned and re-assign it otherwise.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@yperess you are also encouraged to help move this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!
