Conversation

@cpuguy83
Collaborator

Before this change, we were returning errors to buildkit when there are test errors, or for that matter when there is an error during evaluation of the stuff running up to the tests.
This causes issues with the debug workflow ("docker buildx dap") since there is no LLB state to attach to, so it can't set breakpoints properly.

With this change, even when we get a solve error while evaluating state, we return the bad state so that the debugger can attach to it. When a test fails, we add a new LLB exec op which prints the joined test errors and exits non-zero so that the build will fail.

There are also some minor changes to make it simpler to discard any filesystem changes from running tests, or, in the case of non-container outputs, to return the original state, while keeping the dependency chain intact.
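Roughly, the dependency-preserving reset works like the following minimal sketch (names and the mount path are illustrative, and it assumes the original state contains a no-op binary such as /bin/true; this is not the exact code in the PR):

```go
package frontend

import "github.com/moby/buildkit/client/llb"

// resetAfterTests sketches keeping the dependency chain intact while
// discarding filesystem changes: run a no-op exec on the original
// rootfs and mount the test state readonly, so buildkit must solve the
// tests first but none of their filesystem changes leak into the output.
func resetAfterTests(testState llb.State) llb.StateOption {
	return func(orig llb.State) llb.State {
		return orig.Run(
			llb.Args([]string{"/bin/true"}), // assumes a no-op binary exists in orig
			llb.AddMount("/run/dalec/tests", testState, llb.Readonly),
			llb.WithCustomName("discard test filesystem changes"),
		).Root()
	}
}
```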

This is a simpler alternative to #898.
#898 is hitting some race condition in buildkit (or at least that's what it looks like).
Additionally this can actually be backported to older release branches as the changes are minor.

@cpuguy83 cpuguy83 force-pushed the be_lazyish branch 9 times, most recently from 5cfdf4e to 6f64bda on December 23, 2025 at 20:23
@cpuguy83 cpuguy83 marked this pull request as ready for review December 23, 2025 21:07
Copilot AI review requested due to automatic review settings December 23, 2025 21:07
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the test execution flow to ensure that an LLB state is always returned, even when tests fail. This enables proper debugging support with docker buildx dap by ensuring breakpoints can attach to the state. Instead of returning errors directly to BuildKit when tests fail, the PR adds a new LLB exec operation that prints test errors to stderr and exits non-zero, maintaining the build failure behavior while preserving the state chain.

Key changes:

  • Modified RunTests to return llb.StateOption instead of (Reference, error), allowing tests to be composed into the state chain
  • Introduced withTestError to create an error-reporting state that outputs test failures before exiting
  • Added TestErrorCmd as a new frontend command to handle test error output

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Summary per file:

  • test/linux_target_test.go — Updates the test to consume logs via a SolveStatus callback and verify that test errors appear in the output rather than as direct function errors
  • targets/linux/rpm/distro/pkg.go — Changes the RunTests signature from returning (Reference, error) to StateOption, composing test dependencies and execution into the state chain
  • targets/linux/deb/distro/pkg.go — Matches the RPM implementation with the same RunTests signature change to the StateOption pattern
  • targets/linux/distro_handler.go — Updates handlers to use the new StateOption-based test execution and adds a getRef helper to simplify state marshalling and solving
  • frontend/test_runner.go — Core refactoring: RunTests now returns a StateOption; adds withTestError for error reporting, TestErrorCmd for outputting errors, and an evalState helper
  • cmd/frontend/main.go — Wires up TestErrorCmd in the frontend command router
  • .github/workflows/ci.yml — Adds a Docker system prune before tests to clean up resources

// RunTests runs the package tests
// The returned reference is the solved container state
func (cfg *Config) RunTests(ctx context.Context, client gwclient.Client, spec *dalec.Spec, sOpt dalec.SourceOpts, ctr llb.State, targetKey string, opts ...llb.ConstraintsOpt) (gwclient.Reference, error) {
def, err := ctr.Marshal(ctx, opts...)
Contributor


Nice to see those duplicated bits refactored!

assert.ErrorContains(t, err, v)
dir := t.TempDir()
f, err := os.OpenFile(filepath.Join(dir, "out.txt"), os.O_CREATE|os.O_RDWR, 0o600)
assert.NilError(t, err)
Contributor


I'm not sure this is the right place to discuss this, but generally speaking, assert does not interrupt the execution of the function, which often leads to panics and false negatives (e.g. when f is nil). So maybe it would be better to use require instead, which gives clearer failure handling?

Collaborator Author


assert here is not testify, it's gotest.tools/v3/assert.
assert.NilError does halt the test.
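For reference, a minimal sketch of the behavior in question (the file name is illustrative):

```go
package example_test

import (
	"os"
	"path/filepath"
	"testing"

	"gotest.tools/v3/assert"
)

func TestAssertHalts(t *testing.T) {
	dir := t.TempDir()

	f, err := os.OpenFile(filepath.Join(dir, "out.txt"), os.O_CREATE|os.O_RDWR, 0o600)
	// gotest.tools' assert.NilError stops the test here on failure
	// (unlike testify's assert, which records the failure and keeps
	// going), so f is guaranteed usable below.
	assert.NilError(t, err)
	defer f.Close()

	// assert.Check is gotest.tools' non-fatal variant, for when
	// record-and-continue behavior is actually what's wanted.
	assert.Check(t, f != nil)
}
```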

Contributor

@invidian invidian Jan 6, 2026


Ah, that's confusing, TIL. Maybe we could remove testify assert completely to avoid this confusion. I'll have a look into that.

Collaborator Author


Indeed, I opened an issue yesterday to track this, except I suggested switching to testify since that is a better-known library (of course either way seems fine to me).

testEnv.RunTest(ctx, t, func(ctx context.Context, client gwclient.Client) {
sr := newSolveRequest(withSpec(ctx, t, spec), withBuildTarget(target))
_, err := client.Solve(ctx, sr)
assert.Assert(t, err != nil)
Contributor


require.NilError?

Collaborator Author


This is asserting that we have an error.


v := (&dalec.CheckOutputError{Path: "/non-existing-file", Kind: dalec.CheckFileNotExistsKind, Expected: "exists=true", Actual: "exists=false"}).Error()
assert.ErrorContains(t, err, v)
doTest := func(target string) func(t *testing.T) {
Contributor


Maybe?

Suggested change:
- doTest := func(target string) func(t *testing.T) {
+ testsForTarget := func(target string) func(t *testing.T) {

// RunTests runs the tests defined in the spec against the given input state.
// The result is either the provided `finalState` or a state that, when executed, fails with the errors produced by the tests.
// The input state should be a runnable container with all dependencies already installed.
func RunTests(ctx context.Context, client gwclient.Client, spec *dalec.Spec, finalState llb.State, target string, platform *ocispecs.Platform) llb.StateOption {
Contributor


If implementing llb.StateOption, do we actually need to take finalState llb.State as an argument? Wouldn't the `in llb.State` parameter always be the final state?

Collaborator Author


finalState is here because the input state is not always going to be the final state, e.g. for packaging targets such as jammy/deb and sysext targets.

Contributor


I think I still don't understand. Could you elaborate to help me grasp how this works? So the input state could be the worker with test dependencies, while the final state would be where the deb/sysext are? Do I get this right?

Collaborator Author


The input state must always be a state that is ready to have test assertions executed against it.
deb/rpm/sysext outputs are not that, but we still want to run tests when building those targets (actually, even the container state may have extra stuff in it due to test dependencies being installed).

In those cases we make the input state the container (i.e. package installed) plus the test dependencies.
The final state, which the test runner outputs, needs to be what we plan to output, OR we need to handle that at the call site, which could also be fair (maybe a separate state option called Reset or something).
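A rough sketch of that shape (everything here is illustrative — the install command, state names, and import path for dalec are assumptions; only the RunTests signature follows the PR):

```go
package frontend

import (
	"context"

	"github.com/moby/buildkit/client/llb"
	gwclient "github.com/moby/buildkit/frontend/gateway/client"
	ocispecs "github.com/opencontainers/image-spec/specs-go/v1"

	"github.com/project-dalec/dalec" // module path assumed
)

// buildDebWithTests sketches wiring a packaging target through the
// StateOption-based test runner.
func buildDebWithTests(ctx context.Context, client gwclient.Client, spec *dalec.Spec, ctr, debPackage llb.State, target string, platform *ocispecs.Platform) llb.State {
	// Input state: the container with the package installed plus test
	// dependencies -- something ready to execute test assertions against.
	testInput := ctr.Run(llb.Shlex("apt-get install -y some-test-deps")).Root()

	// Compose the runner: the assertions run against testInput, but the
	// state handed back is debPackage (what this target actually
	// outputs), with the test run kept in the dependency chain.
	return testInput.With(RunTests(ctx, client, spec, debPackage, target, platform))
}
```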

Collaborator Author


That said, I agree the code itself is complex.
I factored out a bunch of this in #898 but ran into some issues that need to be dug into further and may require fixes in buildkit (which would also delay actually being able to refactor since the test runner wouldn't be usable on older buildkits).

Collaborator Author


I factored out a few things to split this loop up into smaller chunks, hope this helps (all in the 2nd commit).

Contributor


Thanks for the explanation. It makes sense now :)

case testrunner.CheckFilesCmdName:
args := flag.Args()[2:]
testrunner.CheckFilesCmd(args)
case frontend.TestErrorCmdName:
Contributor


Why do we need a separate frontend command to handle printing errors? Couldn't this just run after we are finished running tests?

Collaborator Author


This isn't just printing the errors, it's exiting non-zero.
The reason we have this is so we can return an LLB state to buildkit that will error out and fail the build, rather than returning an error to buildkit, which messes up the dap debugger.

I actually have an alternative to this where each test assertion is run as an exec (state.Run(...)), but that seems to be triggering a bug in buildkit.
So I've settled for this approach, where we keep running the tests like we used to, which requires un-lazying the build, but still gives buildkit a valid LLB state that produces an error.
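For concreteness, a minimal sketch of the two halves (the command name, binary path, and wiring are illustrative assumptions; only the shape follows this PR):

```go
package frontend

import (
	"fmt"
	"os"
	"strings"

	"github.com/moby/buildkit/client/llb"
)

// TestErrorCmdName is the frontend subcommand used by the failing exec
// op; the value here is illustrative.
const TestErrorCmdName = "test-error"

// withTestError appends an exec op to the state instead of returning
// the test errors to buildkit: the build still fails, but the debugger
// has a valid LLB state to attach to. Assumes the frontend binary is
// available inside the state at /dalec-frontend.
func withTestError(errs []error) llb.StateOption {
	return func(in llb.State) llb.State {
		msgs := make([]string, len(errs))
		for i, err := range errs {
			msgs[i] = err.Error()
		}
		return in.Run(
			llb.Args([]string{"/dalec-frontend", TestErrorCmdName, strings.Join(msgs, "\n")}),
			llb.WithCustomName("report test errors"),
		).Root()
	}
}

// TestErrorCmd is what that exec op runs: print the joined errors to
// stderr and exit non-zero, failing the build at exec time.
func TestErrorCmd(args []string) {
	fmt.Fprintln(os.Stderr, strings.Join(args, "\n"))
	os.Exit(1)
}
```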

@cpuguy83 cpuguy83 added this to the v0.20 milestone Jan 5, 2026
})
}

err = group.Wait()
Contributor


I don't think it's a task for this PR, but I feel the code in RunTests would benefit from an extensive test suite covering all the different behaviours, since there is really a lot of code and, as a newcomer, it's hard to figure out what it all does.

Or maybe, alternatively, some refactoring to better organize the levels of abstraction in this function, to make it easier to understand and drive down the complexity, would be sufficient.

Collaborator Author


I think we have a large amount of coverage for the test runner within the integration test suite as it is.
It's possible an audit may come up with some case that's not tested, but we do cover both positive and negative cases in there to validate that the test runner is correct, and we also rely on the test runner for some of the rest of the test suite.

This is still not where I'd want it to be, but splits things out so
logic is not sitting in a massive loop with a bunch of other unrelated
things.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 cpuguy83 enabled auto-merge (rebase) January 7, 2026 15:32
@cpuguy83 cpuguy83 merged commit de9db1a into project-dalec:main Jan 7, 2026
25 checks passed