Conversation

@cpuguy83
Collaborator

Before this change, we were returning errors to buildkit when there are test errors, or for that matter when there is an error during evaluation of the stuff running up to the tests.
This causes issues with the debug workflow ("docker buildx dap") since there is no LLB state to attach to, so it can't set breakpoints properly.

With this change, even when we get a solve error while evaluating state, we return the bad state so that the debugger can attach to it. When a test fails, we add a new LLB exec op which prints the joined test errors and exits non-zero so that the build will fail.

There are also some minor changes to make it simpler to discard any filesystem changes from running tests, or, in the case of non-container outputs, to return the original state, while keeping the dependency chain intact.
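Roughly, the dependency-preserving reset works like the following minimal sketch (names and the mount path are illustrative, and it assumes the original state contains a no-op binary such as /bin/true; this is not the exact code in the PR):

```go
package frontend

import "github.com/moby/buildkit/client/llb"

// resetAfterTests sketches keeping the dependency chain intact while
// discarding filesystem changes: run a no-op exec on the original
// rootfs and mount the test state readonly, so buildkit must solve the
// tests first but none of their filesystem changes leak into the output.
func resetAfterTests(testState llb.State) llb.StateOption {
	return func(orig llb.State) llb.State {
		return orig.Run(
			llb.Args([]string{"/bin/true"}), // assumes a no-op binary exists in orig
			llb.AddMount("/run/dalec/tests", testState, llb.Readonly),
			llb.WithCustomName("discard test filesystem changes"),
		).Root()
	}
}
```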

This is a simpler alternative to #898.
#898 is hitting some race condition in buildkit (or at least that's what it looks like).
Additionally this can actually be backported to older release branches as the changes are minor.

@cpuguy83 cpuguy83 force-pushed the be_lazyish branch 9 times, most recently from 5cfdf4e to 6f64bda on December 23, 2025 at 20:23
@cpuguy83 cpuguy83 marked this pull request as ready for review December 23, 2025 21:07
Copilot AI review requested due to automatic review settings December 23, 2025 21:07
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the test execution flow to ensure that an LLB state is always returned, even when tests fail. This enables proper debugging support with docker buildx dap by ensuring breakpoints can attach to the state. Instead of returning errors directly to BuildKit when tests fail, the PR adds a new LLB exec operation that prints test errors to stderr and exits non-zero, maintaining the build failure behavior while preserving the state chain.

Key changes:

  • Modified RunTests to return llb.StateOption instead of (Reference, error), allowing tests to be composed into the state chain
  • Introduced withTestError to create an error-reporting state that outputs test failures before exiting
  • Added TestErrorCmd as a new frontend command to handle test error output

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Summary per file:

  • test/linux_target_test.go — Updates the test to consume logs via a SolveStatus callback and verify that test errors appear in the output rather than as direct function errors
  • targets/linux/rpm/distro/pkg.go — Changes the RunTests signature from returning (Reference, error) to StateOption, composing test dependencies and execution into the state chain
  • targets/linux/deb/distro/pkg.go — Matches the RPM implementation with the same RunTests signature change to the StateOption pattern
  • targets/linux/distro_handler.go — Updates handlers to use the new StateOption-based test execution and adds a getRef helper to simplify state marshalling and solving
  • frontend/test_runner.go — Core refactoring: RunTests now returns a StateOption; adds withTestError for error reporting, TestErrorCmd for outputting errors, and an evalState helper
  • cmd/frontend/main.go — Wires up TestErrorCmd in the frontend command router
  • .github/workflows/ci.yml — Adds a Docker system prune before tests to clean up resources

// RunTests runs the package tests
// The returned reference is the solved container state
func (cfg *Config) RunTests(ctx context.Context, client gwclient.Client, spec *dalec.Spec, sOpt dalec.SourceOpts, ctr llb.State, targetKey string, opts ...llb.ConstraintsOpt) (gwclient.Reference, error) {
def, err := ctr.Marshal(ctx, opts...)
Contributor


Nice to see those duplicated bits refactored!

assert.ErrorContains(t, err, v)
dir := t.TempDir()
f, err := os.OpenFile(filepath.Join(dir, "out.txt"), os.O_CREATE|os.O_RDWR, 0o600)
assert.NilError(t, err)
Contributor


I'm not sure this is the right place to discuss this, but generally speaking, assert does not interrupt the execution of the function, which often leads to panics and false negatives (e.g. when f is nil). So maybe it would be better to use require instead, which gives clearer failure handling?

Collaborator Author


assert here is not testify, it's gotest.tools/v3/assert.
assert.NilError does halt the test.
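For reference, a minimal sketch of the behavior in question (the file name is illustrative):

```go
package example_test

import (
	"os"
	"path/filepath"
	"testing"

	"gotest.tools/v3/assert"
)

func TestAssertHalts(t *testing.T) {
	dir := t.TempDir()

	f, err := os.OpenFile(filepath.Join(dir, "out.txt"), os.O_CREATE|os.O_RDWR, 0o600)
	// gotest.tools' assert.NilError stops the test here on failure
	// (unlike testify's assert, which records the failure and keeps
	// going), so f is guaranteed usable below.
	assert.NilError(t, err)
	defer f.Close()

	// assert.Check is gotest.tools' non-fatal variant, for when
	// record-and-continue behavior is actually what's wanted.
	assert.Check(t, f != nil)
}
```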

Contributor

@invidian invidian Jan 6, 2026


Ah, that's confusing, TIL. Maybe we could remove testify assert completely to avoid this confusion. I'll have a look into that.

Collaborator Author


Indeed, I opened an issue yesterday to track this, except I suggested switching to testify since that is a better-known library (of course either way seems fine to me).

testEnv.RunTest(ctx, t, func(ctx context.Context, client gwclient.Client) {
sr := newSolveRequest(withSpec(ctx, t, spec), withBuildTarget(target))
_, err := client.Solve(ctx, sr)
assert.Assert(t, err != nil)
Contributor


require.NilError?

Collaborator Author


This is asserting that we have an error.


v := (&dalec.CheckOutputError{Path: "/non-existing-file", Kind: dalec.CheckFileNotExistsKind, Expected: "exists=true", Actual: "exists=false"}).Error()
assert.ErrorContains(t, err, v)
doTest := func(target string) func(t *testing.T) {
Contributor


Maybe?

Suggested change:
- doTest := func(target string) func(t *testing.T) {
+ testsForTarget := func(target string) func(t *testing.T) {

// RunTests runs the tests defined in the spec against the given input state.
// The result is either the provided `finalState` or a state that, when executed, fails with the errors produced by the tests.
// The input state should be a runnable container with all dependencies already installed.
func RunTests(ctx context.Context, client gwclient.Client, spec *dalec.Spec, finalState llb.State, target string, platform *ocispecs.Platform) llb.StateOption {
Contributor


If implementing llb.StateOption, do we actually need to take finalState llb.State as an argument? Wouldn't the `in llb.State` parameter always be the final state?

Collaborator Author


finalState is here because the input state is not always going to be the final state, e.g. for packaging targets such as jammy/deb and sysext targets.

Contributor


I think I still don't understand. Could you elaborate to help me grasp how this works? So the input state could be the worker with test dependencies, while the final state would be where the deb/sysext are? Do I get this right?

Collaborator Author


The input state must always be a state that is ready to have test assertions executed against it.
deb/rpm/sysext outputs are not that, but we still want to run tests when building those targets (actually, even the container state may have extra stuff in it due to test dependencies being installed).

In those cases we make the input state the container (i.e. package installed) plus the test dependencies.
The final state, which the test runner outputs, needs to be what we plan to output, OR we need to handle that at the call site, which could also be fair (maybe a separate state option called Reset or something).
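A rough sketch of that shape (everything here is illustrative — the install command, state names, and import path for dalec are assumptions; only the RunTests signature follows the PR):

```go
package frontend

import (
	"context"

	"github.com/moby/buildkit/client/llb"
	gwclient "github.com/moby/buildkit/frontend/gateway/client"
	ocispecs "github.com/opencontainers/image-spec/specs-go/v1"

	"github.com/project-dalec/dalec" // module path assumed
)

// buildDebWithTests sketches wiring a packaging target through the
// StateOption-based test runner.
func buildDebWithTests(ctx context.Context, client gwclient.Client, spec *dalec.Spec, ctr, debPackage llb.State, target string, platform *ocispecs.Platform) llb.State {
	// Input state: the container with the package installed plus test
	// dependencies -- something ready to execute test assertions against.
	testInput := ctr.Run(llb.Shlex("apt-get install -y some-test-deps")).Root()

	// Compose the runner: the assertions run against testInput, but the
	// state handed back is debPackage (what this target actually
	// outputs), with the test run kept in the dependency chain.
	return testInput.With(RunTests(ctx, client, spec, debPackage, target, platform))
}
```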

Collaborator Author


That said, I agree the code itself is complex.
I factored out a bunch of this in #898 but ran into some issues that need to be dug into further and may require fixes in buildkit (which would also delay actually being able to refactor since the test runner wouldn't be usable on older buildkits).

Collaborator Author


I factored out a few things to split this loop up into smaller chunks, hope this helps (all in the 2nd commit).

Contributor


Thanks for the explanation. It makes sense now :)

case testrunner.CheckFilesCmdName:
args := flag.Args()[2:]
testrunner.CheckFilesCmd(args)
case frontend.TestErrorCmdName:
Contributor


Why do we need a separate frontend command to handle printing errors? Couldn't this just run after we are finished running tests?

Collaborator Author


This isn't just printing the errors, it's exiting non-zero.
The reason we have this is so we can return an LLB state to buildkit that will error out and fail the build, rather than returning an error to buildkit, which messes up the dap debugger.

I actually have an alternative to this where each test assertion is run as an exec (state.Run(...)), but that seems to be triggering a bug in buildkit.
So I've settled for this approach, where we keep running the tests like we used to, which requires un-lazying the build, but still gives buildkit a valid LLB state that produces an error.
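For concreteness, a minimal sketch of the two halves (the command name, binary path, and wiring are illustrative assumptions; only the shape follows this PR):

```go
package frontend

import (
	"fmt"
	"os"
	"strings"

	"github.com/moby/buildkit/client/llb"
)

// TestErrorCmdName is the frontend subcommand used by the failing exec
// op; the value here is illustrative.
const TestErrorCmdName = "test-error"

// withTestError appends an exec op to the state instead of returning
// the test errors to buildkit: the build still fails, but the debugger
// has a valid LLB state to attach to. Assumes the frontend binary is
// available inside the state at /dalec-frontend.
func withTestError(errs []error) llb.StateOption {
	return func(in llb.State) llb.State {
		msgs := make([]string, len(errs))
		for i, err := range errs {
			msgs[i] = err.Error()
		}
		return in.Run(
			llb.Args([]string{"/dalec-frontend", TestErrorCmdName, strings.Join(msgs, "\n")}),
			llb.WithCustomName("report test errors"),
		).Root()
	}
}

// TestErrorCmd is what that exec op runs: print the joined errors to
// stderr and exit non-zero, failing the build at exec time.
func TestErrorCmd(args []string) {
	fmt.Fprintln(os.Stderr, strings.Join(args, "\n"))
	os.Exit(1)
}
```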

@cpuguy83 cpuguy83 added this to the v0.20 milestone Jan 5, 2026
})
}

err = group.Wait()
Contributor


I don't think it's a task for this PR, but I feel the code in RunTests would benefit from an extensive test suite covering all the different behaviours, since there is really a lot of code and, as a newcomer, it's hard to figure out what it all does.

Or maybe, alternatively, some refactoring to better organize the levels of abstraction in this function, to make it easier to understand and drive down the complexity, would be sufficient.

Collaborator Author


I think we have a large amount of coverage for the test runner within the integration test suite as it is.
It's possible an audit may come up with some case that's not tested, but we do cover both positive and negative cases in there to validate that the test runner is correct, and we also rely on the test runner for some of the rest of the test suite.

This is still not where I'd want it to be, but splits things out so
logic is not sitting in a massive loop with a bunch of other unrelated
things.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83 cpuguy83 enabled auto-merge (rebase) January 7, 2026 15:32
@cpuguy83 cpuguy83 merged commit de9db1a into project-dalec:main Jan 7, 2026
25 checks passed