Improved debugging support #1472

Open

Description

addresses #1053
addresses #1470

One issue with the current build environment is that we often assume everyone can write a perfect Dockerfile from scratch, without any mistakes. In the real world, writing a complex Dockerfile involves a lot of trial and error. Users get errors, need to understand what is causing them, and react accordingly.

In the legacy builder, one way to deal with this situation was to use --rm=false, or to look up the image ID of the last image layer in the build output and start a docker run session from it to understand what was wrong. BuildKit does not create intermediate images, nor does it make the containers it runs visible to docker run (both for very good reasons). This makes debugging even more complicated now: it usually requires the user to set --target to do a partial build and then debug its output.

To improve this, we shouldn't try to bring back --rm=false, which makes all builds significantly slower and makes it impossible to manage storage for the build cache. Instead, we could provide a better solution with a new --debugger flag.

If a build run with --debugger fails, the user is taken into a debugger shell, similar to the interactive docker run experience. There the user can see the error and use control commands to find the actual cause.

If the error happened on a RUN command (ExecOp in LLB), the user can use the shell to rerun the command and keep tweaking it. This happens in an environment identical to the one where the ExecOp runs, which means access to the same secrets, ssh, cache mounts, etc. The user can also inspect the environment variables and files in the system that might be causing the issue. Using control commands, the user can switch between the broken state left behind by the failed command and the initial base state for that command. So if they try many possible fixes and end up in a bad state, they can simply restore the initial state and start again.
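To make the "switch between states" idea concrete, here is a minimal sketch in Go. It is only a toy model: `Snapshot`, `Session`, `State` and `Reset` are hypothetical names made up for illustration and are not existing BuildKit types, and a real implementation would work on snapshotter mounts rather than in-memory maps.

```go
package main

import "fmt"

// Snapshot is a hypothetical immutable view of a filesystem: path -> content.
type Snapshot map[string]string

// State selects which saved snapshot the debugger should switch to.
type State int

const (
	Initial State = iota // state before the failed command ran
	Failed               // state left behind by the failed command
)

// Session is a hypothetical debugger session for one failed step.
type Session struct {
	initial Snapshot
	failed  Snapshot
	work    Snapshot // mutable copy the user experiments on
}

// clone copies a snapshot so experiments never touch the saved states.
func clone(s Snapshot) Snapshot {
	c := make(Snapshot, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// NewSession starts the user in a copy of the broken state.
func NewSession(initial, failed Snapshot) *Session {
	return &Session{initial: initial, failed: failed, work: clone(failed)}
}

// Write simulates the user changing a file in the working copy.
func (s *Session) Write(path, data string) { s.work[path] = data }

// Reset discards all experiments and switches to a fresh copy of the chosen state.
func (s *Session) Reset(to State) {
	if to == Initial {
		s.work = clone(s.initial)
		return
	}
	s.work = clone(s.failed)
}

func main() {
	initial := Snapshot{"/app/package.json": "{}"}
	failed := Snapshot{"/app/package.json": "{}", "/app/npm-debug.log": "ERR!"}

	s := NewSession(initial, failed)
	s.Write("/app/.npmrc", "strict-ssl=false") // an attempted fix
	s.Reset(Initial)                            // give up and start over

	_, hasLog := s.work["/app/npm-debug.log"]
	fmt.Println("files:", len(s.work), "debug log present:", hasLog)
}
```

The point of the sketch is that both saved states stay untouched, so the user can switch back and forth as often as needed.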

If the error happened on a copy (or another file operation such as rm), they can run ls and similar tools to find out why the file path is incorrect.

For implementation, this depends on #749, which adds support for running processes on build mounts directly without going through the solver. We would start by modifying the Executor and ExecOp so that, instead of releasing the mounts after an error, they return them together with the error. I believe the typed errors support (#1454) can be reused for this. The mounts should be returned all the way up to the client's Solve method, which can then decide to call llb.Exec with them. If the mounts are left unhandled, they are released with the gateway API release.
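As a rough illustration of returning mounts together with the error, the sketch below uses a typed error that the client can detect with `errors.As`. Everything here is an assumption for illustration: `Mount`, `ExecError` and `runStep` are hypothetical names, not the types from #749 or #1454, and the real error would cross the gRPC boundary rather than stay in one process.

```go
package main

import (
	"errors"
	"fmt"
)

// Mount is a hypothetical handle to a build mount (rootfs, cache mount, ...).
type Mount struct {
	Dest     string
	released bool
}

// Release frees the mount. It is safe to call more than once.
func (m *Mount) Release() { m.released = true }

// ExecError is a hypothetical typed error that carries the mounts of the
// failed step instead of releasing them.
type ExecError struct {
	Err    error
	Mounts []*Mount
}

func (e *ExecError) Error() string { return "exec failed: " + e.Err.Error() }
func (e *ExecError) Unwrap() error { return e.Err }

// Release frees every mount still held by the error.
func (e *ExecError) Release() {
	for _, m := range e.Mounts {
		m.Release()
	}
}

// runStep stands in for the executor. On failure it hands the mounts back
// to the caller together with the error; on success it releases them.
func runStep(fail bool) error {
	mounts := []*Mount{{Dest: "/"}, {Dest: "/root/.cache"}}
	if fail {
		return &ExecError{Err: errors.New("exit code 1"), Mounts: mounts}
	}
	for _, m := range mounts {
		m.Release()
	}
	return nil
}

func main() {
	// Intermediate layers may wrap the error before it reaches the client.
	err := fmt.Errorf("solve: %w", runStep(true))

	var execErr *ExecError
	if errors.As(err, &execErr) {
		// A debugger client could start a process on these mounts here.
		for _, m := range execErr.Mounts {
			fmt.Println("mount kept for debugging:", m.Dest)
		}
		// Mounts that are not handled must still be released eventually.
		execErr.Release()
	}
}
```

Keeping the release logic on the error value means a client that ignores the mounts can still free them with a single call.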

Once debugging has completed and the user has made changes to the source files, it is easy to trigger a restart of the build with exactly the same settings. This is also useful if you think you might be hitting a temporary error. If the retry doesn't fix it, the user is brought back to the debugger.
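The retry flow could look roughly like the loop below. This is a sketch under assumptions: `buildFunc`, `debugFunc` and `buildWithDebugger` are hypothetical names, with the build and the interactive debugger reduced to plain function values.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a stand-in for a temporary build failure.
var errTransient = errors.New("temporary failure")

// buildFunc stands in for one build invocation with fixed settings.
type buildFunc func() error

// debugFunc stands in for the interactive debugger. It returns true if the
// user asks to restart the build and false if they give up.
type debugFunc func(err error) bool

// buildWithDebugger reruns the same build until it succeeds or the user
// leaves the debugger without asking for a retry.
func buildWithDebugger(build buildFunc, debug debugFunc) (attempts int, err error) {
	for {
		attempts++
		if err = build(); err == nil {
			return attempts, nil
		}
		if !debug(err) {
			return attempts, err
		}
	}
}

func main() {
	failuresLeft := 2
	build := func() error {
		if failuresLeft > 0 {
			failuresLeft--
			return errTransient
		}
		return nil
	}
	alwaysRetry := func(err error) bool {
		fmt.Println("debugger opened for:", err)
		return true
	}

	attempts, err := buildWithDebugger(build, alwaysRetry)
	fmt.Println("attempts:", attempts, "error:", err)
}
```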

It might make sense to introduce the concept of a "debugger image" that is used as the basis of the debugging environment. This would avoid hardcoding logic in an opinionated area.

Later this could be extended with a step-based debugger, and source mapping support could be used to make source code changes directly in the editor or to track dependencies in the build graph.

@hinshun
