-
Notifications
You must be signed in to change notification settings - Fork 13
implement code interpreter sandbox warmpool #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary of ChangesHello @YaoZengzeng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly extends the system's capabilities by introducing a dedicated Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new CodeInterpreter CRD and its associated controller. This is a significant feature addition that allows defining and managing code interpreter runtimes, including warm pools for faster startup. The changes include new API types, a controller, code generation scripts, and Kubernetes manifests.
Overall, the implementation follows standard practices for Kubernetes controllers. The use of controller-runtime and code generation is appropriate. The new CodeInterpreter CRD is well-defined.
I've identified a few areas for improvement:
- There are some redundant or unnecessary files (
crds/_.yaml,crds/codeinterpreter.yaml,k8s/codeinterpreter-rbac.yaml) that should be cleaned up. - The CRD definition has a minor issue with
Requiredanddefaultmarkers that could cause confusion. - The controller logic has opportunities for improvement in terms of correctness (status reporting, template comparison) and efficiency (listing resources).
My detailed comments provide specific suggestions for these points. Addressing them will make the implementation more robust and maintainable.
| // +kubebuilder:validation:Required | ||
| // +kubebuilder:default="15m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SessionTimeout field is marked with +kubebuilder:validation:Required but also has a +kubebuilder:default value. This combination can lead to confusing behavior, as client-side validation (like kubectl apply) may fail if this field is omitted, even though the API server would apply a default. To make this field truly optional with a default value, the Required validation marker should be removed.
| // +kubebuilder:validation:Required | |
| // +kubebuilder:default="15m" | |
| // +kubebuilder:default="15m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
| // +kubebuilder:validation:Required | ||
| // +kubebuilder:default="8h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to SessionTimeout, the MaxSessionDuration field is marked as Required but also has a default. This can cause validation issues. The +kubebuilder:validation:Required marker should be removed to make it an optional field with a default value.
| // +kubebuilder:validation:Required | |
| // +kubebuilder:default="8h" | |
| // +kubebuilder:default="8h" |
| func (r *CodeInterpreterReconciler) podTemplateEqual(a, b sandboxv1alpha1.PodTemplate) bool { | ||
| // Simple comparison - in production, you might want a more sophisticated comparison | ||
| if len(a.Spec.Containers) == 0 || len(b.Spec.Containers) == 0 { | ||
| return len(a.Spec.Containers) == len(b.Spec.Containers) | ||
| } | ||
| if a.Spec.Containers[0].Image != b.Spec.Containers[0].Image { | ||
| return false | ||
| } | ||
| if (a.Spec.RuntimeClassName == nil) != (b.Spec.RuntimeClassName == nil) { | ||
| return false | ||
| } | ||
| if a.Spec.RuntimeClassName != nil && b.Spec.RuntimeClassName != nil { | ||
| return *a.Spec.RuntimeClassName == *b.Spec.RuntimeClassName | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The podTemplateEqual function performs a very basic comparison and, as the comment notes, is not robust. It only checks the container image and runtime class name. This will fail to detect changes to other important fields like command, args, environment variables, or resource requests/limits. This could lead to the SandboxTemplate not being updated when the CodeInterpreter spec changes. You should use a more comprehensive comparison, such as reflect.DeepEqual, to ensure all changes are detected. You will need to import the reflect package.
func (r *CodeInterpreterReconciler) podTemplateEqual(a, b sandboxv1alpha1.PodTemplate) bool {
// Use reflect.DeepEqual for a comprehensive comparison.
return reflect.DeepEqual(a.Spec, b.Spec)
}
crds/_.yaml
Outdated
| --- | ||
| apiVersion: apiextensions.k8s.io/v1 | ||
| kind: CustomResourceDefinition | ||
| metadata: | ||
| annotations: | ||
| controller-gen.kubebuilder.io/version: v0.17.2 | ||
| spec: | ||
| group: "" | ||
| names: | ||
| kind: "" | ||
| plural: "" | ||
| scope: "" | ||
| versions: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crds/codeinterpreter.yaml
Outdated
| apiVersion: apiextensions.k8s.io/v1 | ||
| kind: CustomResourceDefinition | ||
| metadata: | ||
| name: codeinterpreters.runtime.agentcube.io | ||
| labels: | ||
| app: agentcube | ||
| spec: | ||
| group: runtime.agentcube.io | ||
| versions: | ||
| - name: v1alpha1 | ||
| served: true | ||
| storage: true | ||
| schema: | ||
| openAPIV3Schema: | ||
| type: object | ||
| properties: | ||
| spec: | ||
| type: object | ||
| properties: | ||
| ports: | ||
| type: array | ||
| items: | ||
| type: object | ||
| properties: | ||
| pathPrefix: | ||
| type: string | ||
| name: | ||
| type: string | ||
| port: | ||
| type: integer | ||
| format: int32 | ||
| protocol: | ||
| type: string | ||
| enum: | ||
| - HTTP | ||
| - HTTPS | ||
| default: HTTP | ||
| required: | ||
| - port | ||
| - protocol | ||
| template: | ||
| type: object | ||
| required: | ||
| - image | ||
| properties: | ||
| labels: | ||
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| annotations: | ||
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| runtimeClassName: | ||
| type: string | ||
| image: | ||
| type: string | ||
| environment: | ||
| type: array | ||
| items: | ||
| type: object | ||
| properties: | ||
| name: | ||
| type: string | ||
| value: | ||
| type: string | ||
| valueFrom: | ||
| type: object | ||
| required: | ||
| - name | ||
| command: | ||
| type: array | ||
| items: | ||
| type: string | ||
| args: | ||
| type: array | ||
| items: | ||
| type: string | ||
| resources: | ||
| type: object | ||
| properties: | ||
| limits: | ||
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| requests: | ||
| type: object | ||
| additionalProperties: | ||
| type: string | ||
| sessionTimeout: | ||
| type: string | ||
| default: "15m" | ||
| maxSessionDuration: | ||
| type: string | ||
| default: "8h" | ||
| languages: | ||
| type: array | ||
| items: | ||
| type: string | ||
| warmPoolSize: | ||
| type: integer | ||
| format: int32 | ||
| required: | ||
| - template | ||
| status: | ||
| type: object | ||
| properties: | ||
| conditions: | ||
| type: array | ||
| items: | ||
| type: object | ||
| properties: | ||
| type: | ||
| type: string | ||
| status: | ||
| type: string | ||
| reason: | ||
| type: string | ||
| message: | ||
| type: string | ||
| lastTransitionTime: | ||
| type: string | ||
| format: date-time | ||
| observedGeneration: | ||
| type: integer | ||
| format: int64 | ||
| ready: | ||
| type: boolean | ||
| activeSessions: | ||
| type: integer | ||
| format: int32 | ||
| warmPoolReady: | ||
| type: integer | ||
| format: int32 | ||
| subresources: | ||
| status: {} | ||
| scope: Namespaced | ||
| names: | ||
| plural: codeinterpreters | ||
| singular: codeinterpreter | ||
| kind: CodeInterpreter | ||
| shortNames: | ||
| - ci | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This CRD definition for CodeInterpreter seems to be a duplicate of crds/runtime.agentcube.io_codeinterpreters.yaml. The other file is more detailed, includes descriptions, and seems to be the one generated by controller-gen. To avoid confusion and maintain a single source of truth for the CRD, this file should be removed in favor of the generated one.
k8s/codeinterpreter-rbac.yaml
Outdated
| apiVersion: rbac.authorization.k8s.io/v1 | ||
| kind: ClusterRole | ||
| metadata: | ||
| name: codeinterpreter-controller | ||
| rules: | ||
| - apiGroups: ["runtime.agentcube.io"] | ||
| resources: ["codeinterpreters"] | ||
| verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] | ||
| - apiGroups: ["runtime.agentcube.io"] | ||
| resources: ["codeinterpreters/status"] | ||
| verbs: ["get", "update", "patch"] | ||
| - apiGroups: ["runtime.agentcube.io"] | ||
| resources: ["codeinterpreters/finalizers"] | ||
| verbs: ["update"] | ||
| - apiGroups: ["agents.x-k8s.io"] | ||
| resources: ["sandboxes"] | ||
| verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] | ||
| --- | ||
| apiVersion: rbac.authorization.k8s.io/v1 | ||
| kind: ClusterRoleBinding | ||
| metadata: | ||
| name: codeinterpreter-controller | ||
| roleRef: | ||
| apiGroup: rbac.authorization.k8s.io | ||
| kind: ClusterRole | ||
| name: codeinterpreter-controller | ||
| subjects: | ||
| - kind: ServiceAccount | ||
| name: agentcube-apiserver | ||
| namespace: agentcube | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ClusterRole and ClusterRoleBinding defined in this file appear to be redundant. The permissions required by the code interpreter controller are already included in the agentcube-apiserver ClusterRole within k8s/agentcube-apiserver.yaml. To simplify RBAC management and avoid duplication, this file should be removed.
| // Update status | ||
| ci.Status.ActiveSessions = activeSessions | ||
| ci.Status.WarmPoolReady = warmPoolReady | ||
| ci.Status.Ready = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ready status is unconditionally set to true. This might not accurately reflect the state of the CodeInterpreter. For instance, if a warm pool is configured (WarmPoolSize > 0), the CodeInterpreter should probably only be considered ready when the warm pool has the desired number of ready sandboxes. Consider making this logic more robust by checking the status of dependent resources.
| func (r *CodeInterpreterReconciler) countActiveSandboxes(ctx context.Context, ci *runtimev1alpha1.CodeInterpreter) (int32, error) { | ||
| sandboxList := &sandboxv1alpha1.SandboxList{} | ||
| if err := r.List(ctx, sandboxList, client.InNamespace(ci.Namespace)); err != nil { | ||
| return 0, err | ||
| } | ||
|
|
||
| count := int32(0) | ||
| for _, sandbox := range sandboxList.Items { | ||
| // Check if sandbox is using this code interpreter runtime | ||
| // This is determined by labels or annotations | ||
| if sandbox.Labels != nil { | ||
| if runtimeName, ok := sandbox.Labels["codeinterpreter.runtime.agentcube.io/name"]; ok { | ||
| if runtimeName == ci.Name { | ||
| // Check if sandbox is running by checking Ready condition | ||
| for _, condition := range sandbox.Status.Conditions { | ||
| if condition.Type == string(sandboxv1alpha1.SandboxConditionReady) && condition.Status == metav1.ConditionTrue { | ||
| count++ | ||
| break | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return count, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Listing all sandboxes in the namespace and then filtering them on the client side is inefficient, especially in a cluster with many sandboxes. You can improve performance by using a label selector in the List call. This also simplifies the function body.
func (r *CodeInterpreterReconciler) countActiveSandboxes(ctx context.Context, ci *runtimev1alpha1.CodeInterpreter) (int32, error) {
sandboxList := &sandboxv1alpha1.SandboxList{}
listOpts := []client.ListOption{
client.InNamespace(ci.Namespace),
client.MatchingLabels(map[string]string{
"codeinterpreter.runtime.agentcube.io/name": ci.Name,
}),
}
if err := r.List(ctx, sandboxList, listOpts...); err != nil {
return 0, err
}
var count int32
for _, sandbox := range sandboxList.Items {
// Check if sandbox is running by checking Ready condition
for _, condition := range sandbox.Status.Conditions {
if condition.Type == string(sandboxv1alpha1.SandboxConditionReady) && condition.Status == metav1.ConditionTrue {
count++
break
}
}
}
return count, nil
}Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
|
@hzxuzhonghu ptal |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking issue should be api group
runtime.agentcube.volcano.sh
And in order to integrate with workload manager's sandbox manager module, you should also provide a cache to support template or warmpool query
|
|
||
| // Setup CodeInterpreter controller | ||
| if err := ctrl.NewControllerManagedBy(mgr). | ||
| For(&runtimev1alpha1.CodeInterpreter{}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in the workload manager later
| // routers / UIs select a compatible runtime. | ||
| // +optional | ||
| Languages []string `json:"languages,omitempty" protobuf:"bytes,6,rep,name=languages"` | ||
| Languages []string `json:"languages,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this first. I am not thinking well about this
|
|
||
| // Ready indicates whether the CodeInterpreter is ready to serve requests | ||
| // +optional | ||
| Ready bool `json:"ready,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems hard to set set it. When we have multi managers
|
|
||
| // ActiveSessions is the number of active sandbox sessions using this runtime | ||
| // +optional | ||
| ActiveSessions int32 `json:"activeSessions,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove first, this may cause too frequently k8s update
|
|
||
| // WarmPoolReady is the number of sandboxes ready in the warm pool | ||
| // +optional | ||
| WarmPoolReady int32 `json:"warmPoolReady,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| } | ||
|
|
||
| // Ensure SandboxTemplate exists | ||
| if err := r.ensureSandboxTemplate(ctx, codeInterpreter); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a must if nbo warmpool required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
| } | ||
|
|
||
| // Requeue periodically to check status | ||
| return ctrl.Result{RequeueAfter: 1 * time.Minute}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a good way? Can we also watch warmpool, i am also concerned without watching, you still need get List some resources from kube-apiserver
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
Signed-off-by: YaoZengzeng <yaozengzeng@huawei.com>
|
@hzxuzhonghu updated |
fixes: #30