runc: add multi-container support to 'runc delete' and update tests #4940

yzewei · 2025-10-16T03:24:21Z

Background

Currently, the runc delete command only supports deleting a single container. In high-frequency container creation and destruction environments, manually deleting stopped containers one by one is inefficient and may cause resource contention (e.g., cgroup locks or DBus delays).

The community has discussed the need for batch deletion of stopped containers in runc discussion #4935.

Changes in this PR

Modified delete.go to support deleting multiple containers at once.
Updated utils_linux.go to handle batch deletion safely.
Updated tests/integration/delete.bats to include tests for batch deletion.
Added delete multi test case to ensure the feature works as expected.

OCI Runtime Specification Alignment

Crucially, this change aligns with the updated OCI Runtime Specification. I have updated the delete operation in the specification to officially support multiple container IDs, clarifying that each container is deleted independently, and errors are reported per container without affecting others.

Specification Update: opencontainers/runtime-spec#1299

Comprehensive Batch Test Coverage

The integration test suite (tests/integration/delete.bats) has been significantly expanded to ensure the batch delete feature is robust across all supported container states and Cgroup environments:

Container States: Added tests for batch deletion of containers that are stopped, running, and paused (the latter two using --force).
Complex Cleanup Scenarios: Included tests for batch deletion of containers in the Host PID Namespace where the init process is already gone, ensuring all remaining processes are correctly killed and reaped.
Cgroup Recursion: Added tests for batch deletion and recursive cleanup of containers that have created their own sub-Cgroups in both Cgroup V1 and Cgroup V2 environments.
- Note on Stability: This required stabilizing the V2 cleanup tests by ensuring Cgroup paths are validated accurately on the host and that non-forced delete tests correctly kill running containers beforehand.

Notes

The new feature only affects containers in the "stopped" state for non-forced deletion.
Existing single-container delete behavior is unchanged.
This PR references community discussion for context: #4935

yzewei · 2025-10-16T03:24:32Z

@kolyshkin @cyphar @AkihiroSuda

lifubang · 2025-10-16T03:45:17Z

@yzewei Perhaps we should first update the description of the Delete operation in the runtime-spec: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete

yzewei · 2025-10-16T04:01:07Z

@yzewei Perhaps we should first update the description of the Delete operation in the runtime-spec: https://github.com/opencontainers/runtime-spec/blob/main/runtime.md#delete

Thanks! I’ve updated the runtime-spec delete operation to support multiple container IDs.
Each container is deleted independently, and errors are reported per container without affecting others.
1299

yzewei · 2025-10-16T07:26:15Z

Test Coverage Update

I have supplemented this PR with comprehensive multi-container (batch) integration tests for runc delete, ensuring the reliability and stability of the batch deletion feature across various scenarios.

Key areas of enhanced testing include:

Batch deletion of containers in running, paused, and stopped states.
Batch handling of complex scenarios involving Host PIDNS where the container's init process has already exited.
Batch recursive cleanup for containers with subordinate Cgroups (in both Cgroup V1 and V2 environments).

Two issues regarding test stability were resolved during this process:

Fixed an inaccuracy in the Cgroup V2 test where the host validation of the sub-Cgroup path was incorrect.
Corrected the logic for non-forced runc delete, which now executes runc kill on running containers before attempting deletion (as required by runc behavior).

cyphar · 2025-10-16T07:30:43Z

@lifubang I don't think we should do that -- the runtime-spec stuff is describing the more general operations, and supporting multiple arguments doesn't really belong there IMHO.

AkihiroSuda · 2025-10-16T07:35:21Z

In high-frequency container creation and destruction environments, manually deleting stopped containers one by one is inefficient and may cause resource contention (e.g., cgroup locks or DBus delays).

Do you have a benchmark result?

Signed-off-by: yzewei <yangzewei@loongson.cn>

cyphar

Out of interest, was this code LLM-generated?

cyphar · 2025-10-16T07:34:19Z

delete.go

+					path := filepath.Join(context.GlobalString("root"), id)
+					if e := os.RemoveAll(path); e != nil {
+						fmt.Fprintf(os.Stderr, "remove %s: %v\n", path, e)
+						errs = append(errs, e)


This removes the "--force ignores errors" case.

Yes, I used AI to help me get this done. I still made sure to review and test everything myself.

cyphar · 2025-10-16T07:35:14Z

delete.go

 				}
-				if force {
-					return nil
+				errs = append(errs, err)


cyphar · 2025-10-16T07:35:57Z

delete.go

-					fmt.Fprintf(os.Stderr, "remove %s: %v\n", path, e)
+		var errs []error
+
+		for _, id := range context.Args() {


I think the body of this loop should be a separate function and just have this loop do the error collection logic -- it's very easy to miss a continue and accidentally return an error.

cyphar · 2025-10-16T07:36:18Z

delete.go

-		// When --force is given, we kill all container processes and
-		// then destroy the container. This is done even for a stopped
-		// container, because (in case it does not have its own PID
-		// namespace) there may be some leftover processes in the
-		// container's cgroup.


This comment has been removed, please add it back.

cyphar · 2025-10-16T07:38:36Z

delete.go

+			switch s {
+			case libcontainer.Stopped:
+				if err := container.Destroy(); err != nil {
+					errs = append(errs, err)
+				}
+			case libcontainer.Created:
+				if err := killContainer(container); err != nil {
+					errs = append(errs, err)
+				}
+			default:
+				errs = append(errs, fmt.Errorf("cannot delete container %s that is not stopped: %s", id, s))
 			}


I think something more like

Suggested change

switch s {

case libcontainer.Stopped:

if err := container.Destroy(); err != nil {

errs = append(errs, err)

}

case libcontainer.Created:

if err := killContainer(container); err != nil {

errs = append(errs, err)

}

default:

errs = append(errs, fmt.Errorf("cannot delete container %s that is not stopped: %s", id, s))

}

switch s {

case libcontainer.Stopped:

err = container.Destroy()

case libcontainer.Created:

err = killContainer(container)

default:

err = fmt.Errorf("cannot delete container %s that is not stopped: %s", id, s)

}

if err != nil {

errs = append(errs, err)

}

would be nicer.

cyphar · 2025-10-16T07:39:10Z

delete.go

-				// if there was an aborted start or something of the sort then the container's directory could exist but
-				// libcontainer does not see it because the state.json file inside that directory was never created.


This comment has been changed, please just keep the old one if there isn't a good reason to change it.

cyphar · 2025-10-16T07:42:21Z

@AkihiroSuda I also find that claim implausible without seeing benchmarks -- the only thing this PR is really saving is the overhead of fork+exec. This whole PR feels like it was generated by an LLM.

yzewei · 2025-10-16T07:46:04Z

Hi @cyphar,

Thanks for the feedback. I might not be fully ready to address this yet. Please allow me some time to review the AI-generated code before I follow up.

yzewei force-pushed the multi-delete branch from 8a4e7fe to 4caa961 Compare October 16, 2025 07:23

runc: add multi-container support to 'runc delete' and update tests

981d5bb

Signed-off-by: yzewei <yangzewei@loongson.cn>

yzewei force-pushed the multi-delete branch from 4caa961 to 981d5bb Compare October 16, 2025 07:37

cyphar requested changes Oct 16, 2025

View reviewed changes

yzewei closed this Oct 16, 2025

cyphar mentioned this pull request Nov 9, 2025

[rfc] LLM policy? #4990

Open

github-actions bot mentioned this pull request Nov 30, 2025

[2025-11-10] Marble Fountain jiacai2050/mofish#1251

Open

		// if there was an aborted start or something of the sort then the container's directory could exist but
		// libcontainer does not see it because the state.json file inside that directory was never created.

runc: add multi-container support to 'runc delete' and update tests #4940

runc: add multi-container support to 'runc delete' and update tests #4940

Uh oh!

Conversation

yzewei commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Changes in this PR

OCI Runtime Specification Alignment

Comprehensive Batch Test Coverage

Notes

Uh oh!

yzewei commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lifubang commented Oct 16, 2025

Uh oh!

yzewei commented Oct 16, 2025

Uh oh!

yzewei commented Oct 16, 2025

Test Coverage Update

Uh oh!

cyphar commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AkihiroSuda commented Oct 16, 2025

Uh oh!

cyphar left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cyphar Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

yzewei Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

cyphar Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

cyphar Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

cyphar Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

cyphar Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

cyphar Oct 16, 2025

Choose a reason for hiding this comment

Uh oh!

cyphar commented Oct 16, 2025

Uh oh!

yzewei commented Oct 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yzewei commented Oct 16, 2025 •

edited

Loading

yzewei commented Oct 16, 2025 •

edited

Loading

cyphar commented Oct 16, 2025 •

edited

Loading

cyphar left a comment •

edited

Loading