Skip to content

Conversation

@YaoZengzeng
Copy link
Member

fixes: #30

Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @YaoZengzeng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends the system's capabilities by introducing a dedicated CodeInterpreter custom resource. This resource provides a higher-level abstraction for defining and managing secure, isolated code execution environments, specifically tailored for LLM-generated code. It integrates with the existing agent-sandbox framework to provision and maintain sandboxes, including the ability to pre-warm instances for improved performance. The changes also include the necessary infrastructure for API definition, code generation, and controller logic to make this new resource fully functional.

Highlights

  • New CodeInterpreter CRD: Introduced a new Custom Resource Definition (CRD) called CodeInterpreter under the runtime.agentcube.io/v1alpha1 API group. This CRD allows defining and managing isolated code execution environments.
  • CodeInterpreter Controller: Implemented a new controller, CodeInterpreterReconciler, responsible for reconciling CodeInterpreter resources. This controller creates and manages SandboxTemplate and SandboxWarmPool resources from the agent-sandbox project based on the CodeInterpreter's specification.
  • Warm Pool Management: The CodeInterpreter now supports specifying a warmPoolSize to maintain a pool of pre-warmed sandboxes, which helps reduce startup latency for new code interpretation sessions. The controller actively manages the corresponding SandboxWarmPool.
  • Automated Code Generation: Added a new hack/update-codegen.sh script and updated the Makefile to automate the generation of client-go code, deepcopy methods, and CRD manifests for the new CodeInterpreter API.
  • RBAC and API Server Updates: Updated the API server's main function to register the new runtimev1alpha1 scheme and integrated the CodeInterpreterReconciler. Corresponding RBAC rules and API server permissions were added to support the new resources.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new CodeInterpreter CRD and its associated controller. This is a significant feature addition that allows defining and managing code interpreter runtimes, including warm pools for faster startup. The changes include new API types, a controller, code generation scripts, and Kubernetes manifests.

Overall, the implementation follows standard practices for Kubernetes controllers. The use of controller-runtime and code generation is appropriate. The new CodeInterpreter CRD is well-defined.

I've identified a few areas for improvement:

  • There are some redundant or unnecessary files (crds/_.yaml, crds/codeinterpreter.yaml, k8s/codeinterpreter-rbac.yaml) that should be cleaned up.
  • The CRD definition has a minor issue with Required and default markers that could cause confusion.
  • The controller logic has opportunities for improvement in terms of correctness (status reporting, template comparison) and efficiency (listing resources).

My detailed comments provide specific suggestions for these points. Addressing them will make the implementation more robust and maintainable.

Comment on lines 46 to 47
// +kubebuilder:validation:Required
// +kubebuilder:default="15m"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The SessionTimeout field is marked with +kubebuilder:validation:Required but also has a +kubebuilder:default value. This combination can lead to confusing behavior, as client-side validation (like kubectl apply) may fail if this field is omitted, even though the API server would apply a default. To make this field truly optional with a default value, the Required validation marker should be removed.

Suggested change
// +kubebuilder:validation:Required
// +kubebuilder:default="15m"
// +kubebuilder:default="15m"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Comment on lines 53 to 54
// +kubebuilder:validation:Required
// +kubebuilder:default="8h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to SessionTimeout, the MaxSessionDuration field is marked as Required but also has a default. This can cause validation issues. The +kubebuilder:validation:Required marker should be removed to make it an optional field with a default value.

Suggested change
// +kubebuilder:validation:Required
// +kubebuilder:default="8h"
// +kubebuilder:default="8h"

Comment on lines 342 to 357
func (r *CodeInterpreterReconciler) podTemplateEqual(a, b sandboxv1alpha1.PodTemplate) bool {
// Simple comparison - in production, you might want a more sophisticated comparison
if len(a.Spec.Containers) == 0 || len(b.Spec.Containers) == 0 {
return len(a.Spec.Containers) == len(b.Spec.Containers)
}
if a.Spec.Containers[0].Image != b.Spec.Containers[0].Image {
return false
}
if (a.Spec.RuntimeClassName == nil) != (b.Spec.RuntimeClassName == nil) {
return false
}
if a.Spec.RuntimeClassName != nil && b.Spec.RuntimeClassName != nil {
return *a.Spec.RuntimeClassName == *b.Spec.RuntimeClassName
}
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The podTemplateEqual function performs a very basic comparison and, as the comment notes, is not robust. It only checks the container image and runtime class name. This will fail to detect changes to other important fields like command, args, environment variables, or resource requests/limits. This could lead to the SandboxTemplate not being updated when the CodeInterpreter spec changes. You should use a more comprehensive comparison, such as reflect.DeepEqual, to ensure all changes are detected. You will need to import the reflect package.

func (r *CodeInterpreterReconciler) podTemplateEqual(a, b sandboxv1alpha1.PodTemplate) bool {
	// Use reflect.DeepEqual for a comprehensive comparison.
	return reflect.DeepEqual(a.Spec, b.Spec)
}

crds/_.yaml Outdated
Comment on lines 1 to 13
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.17.2
spec:
group: ""
names:
kind: ""
plural: ""
scope: ""
versions: null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file appears to be an empty, placeholder Custom Resource Definition, likely generated by controller-gen by mistake. It should be removed to avoid clutter and confusion in the project's CRD directory.

Comment on lines 1 to 144
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
name: codeinterpreters.runtime.agentcube.io
labels:
app: agentcube
spec:
group: runtime.agentcube.io
versions:
- name: v1alpha1
served: true
storage: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
ports:
type: array
items:
type: object
properties:
pathPrefix:
type: string
name:
type: string
port:
type: integer
format: int32
protocol:
type: string
enum:
- HTTP
- HTTPS
default: HTTP
required:
- port
- protocol
template:
type: object
required:
- image
properties:
labels:
type: object
additionalProperties:
type: string
annotations:
type: object
additionalProperties:
type: string
runtimeClassName:
type: string
image:
type: string
environment:
type: array
items:
type: object
properties:
name:
type: string
value:
type: string
valueFrom:
type: object
required:
- name
command:
type: array
items:
type: string
args:
type: array
items:
type: string
resources:
type: object
properties:
limits:
type: object
additionalProperties:
type: string
requests:
type: object
additionalProperties:
type: string
sessionTimeout:
type: string
default: "15m"
maxSessionDuration:
type: string
default: "8h"
languages:
type: array
items:
type: string
warmPoolSize:
type: integer
format: int32
required:
- template
status:
type: object
properties:
conditions:
type: array
items:
type: object
properties:
type:
type: string
status:
type: string
reason:
type: string
message:
type: string
lastTransitionTime:
type: string
format: date-time
observedGeneration:
type: integer
format: int64
ready:
type: boolean
activeSessions:
type: integer
format: int32
warmPoolReady:
type: integer
format: int32
subresources:
status: {}
scope: Namespaced
names:
plural: codeinterpreters
singular: codeinterpreter
kind: CodeInterpreter
shortNames:
- ci

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This CRD definition for CodeInterpreter seems to be a duplicate of crds/runtime.agentcube.io_codeinterpreters.yaml. The other file is more detailed, includes descriptions, and seems to be the one generated by controller-gen. To avoid confusion and maintain a single source of truth for the CRD, this file should be removed in favor of the generated one.

Comment on lines 1 to 31
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: codeinterpreter-controller
rules:
- apiGroups: ["runtime.agentcube.io"]
resources: ["codeinterpreters"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["runtime.agentcube.io"]
resources: ["codeinterpreters/status"]
verbs: ["get", "update", "patch"]
- apiGroups: ["runtime.agentcube.io"]
resources: ["codeinterpreters/finalizers"]
verbs: ["update"]
- apiGroups: ["agents.x-k8s.io"]
resources: ["sandboxes"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: codeinterpreter-controller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: codeinterpreter-controller
subjects:
- kind: ServiceAccount
name: agentcube-apiserver
namespace: agentcube

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ClusterRole and ClusterRoleBinding defined in this file appear to be redundant. The permissions required by the code interpreter controller are already included in the agentcube-apiserver ClusterRole within k8s/agentcube-apiserver.yaml. To simplify RBAC management and avoid duplication, this file should be removed.

// Update status
ci.Status.ActiveSessions = activeSessions
ci.Status.WarmPoolReady = warmPoolReady
ci.Status.Ready = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Ready status is unconditionally set to true. This might not accurately reflect the state of the CodeInterpreter. For instance, if a warm pool is configured (WarmPoolSize > 0), the CodeInterpreter should probably only be considered ready when the warm pool has the desired number of ready sandboxes. Consider making this logic more robust by checking the status of dependent resources.

Comment on lines 128 to 154
func (r *CodeInterpreterReconciler) countActiveSandboxes(ctx context.Context, ci *runtimev1alpha1.CodeInterpreter) (int32, error) {
sandboxList := &sandboxv1alpha1.SandboxList{}
if err := r.List(ctx, sandboxList, client.InNamespace(ci.Namespace)); err != nil {
return 0, err
}

count := int32(0)
for _, sandbox := range sandboxList.Items {
// Check if sandbox is using this code interpreter runtime
// This is determined by labels or annotations
if sandbox.Labels != nil {
if runtimeName, ok := sandbox.Labels["codeinterpreter.runtime.agentcube.io/name"]; ok {
if runtimeName == ci.Name {
// Check if sandbox is running by checking Ready condition
for _, condition := range sandbox.Status.Conditions {
if condition.Type == string(sandboxv1alpha1.SandboxConditionReady) && condition.Status == metav1.ConditionTrue {
count++
break
}
}
}
}
}
}

return count, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Listing all sandboxes in the namespace and then filtering them on the client side is inefficient, especially in a cluster with many sandboxes. You can improve performance by using a label selector in the List call. This also simplifies the function body.

func (r *CodeInterpreterReconciler) countActiveSandboxes(ctx context.Context, ci *runtimev1alpha1.CodeInterpreter) (int32, error) {
	sandboxList := &sandboxv1alpha1.SandboxList{}
	listOpts := []client.ListOption{
		client.InNamespace(ci.Namespace),
		client.MatchingLabels(map[string]string{
			"codeinterpreter.runtime.agentcube.io/name": ci.Name,
		}),
	}
	if err := r.List(ctx, sandboxList, listOpts...); err != nil {
		return 0, err
	}

	var count int32
	for _, sandbox := range sandboxList.Items {
		// Check if sandbox is running by checking Ready condition
		for _, condition := range sandbox.Status.Conditions {
			if condition.Type == string(sandboxv1alpha1.SandboxConditionReady) && condition.Status == metav1.ConditionTrue {
				count++
				break
			}
		}
	}

	return count, nil
}

Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
@YaoZengzeng
Copy link
Member Author

@hzxuzhonghu ptal

@YaoZengzeng YaoZengzeng changed the title WIP: implement code interpreter and sandbox warmpool implement code interpreter and sandbox warmpool Nov 29, 2025
@hzxuzhonghu hzxuzhonghu changed the title implement code interpreter and sandbox warmpool implement code interpreter sandbox warmpool Nov 29, 2025
Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking issue should be api group

runtime.agentcube.volcano.sh

And in order to integrate with workload manager's sandbox manager module, you should also provide a cache to support template or warmpool query


// Setup CodeInterpreter controller
if err := ctrl.NewControllerManagedBy(mgr).
For(&runtimev1alpha1.CodeInterpreter{}).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in the workload manager later

// routers / UIs select a compatible runtime.
// +optional
Languages []string `json:"languages,omitempty" protobuf:"bytes,6,rep,name=languages"`
Languages []string `json:"languages,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this first. I am not thinking well about this


// Ready indicates whether the CodeInterpreter is ready to serve requests
// +optional
Ready bool `json:"ready,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems hard to set set it. When we have multi managers


// ActiveSessions is the number of active sandbox sessions using this runtime
// +optional
ActiveSessions int32 `json:"activeSessions,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove first, this may cause too frequently k8s update


// WarmPoolReady is the number of sandboxes ready in the warm pool
// +optional
WarmPoolReady int32 `json:"warmPoolReady,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

}

// Ensure SandboxTemplate exists
if err := r.ensureSandboxTemplate(ctx, codeInterpreter); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a must if nbo warmpool required

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

}

// Requeue periodically to check status
return ctrl.Result{RequeueAfter: 1 * time.Minute}, nil
Copy link
Member

@hzxuzhonghu hzxuzhonghu Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a good way? Can we also watch warmpool, i am also concerned without watching, you still need get List some resources from kube-apiserver

Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
@YaoZengzeng
Copy link
Member Author

@hzxuzhonghu updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support warming pool

3 participants