[Feat][kubectl-plugin] Add shell completion for kubectl ray get [workergroups|nodes] #4291
Conversation
Force-pushed from 01d8732 to b7c557f (Compare)
```go
// createRayNodeLabelSelectors creates a map of K8s label selectors for Ray nodes
// TODO: duplicated function as in kubectl/pkg/cmd/get/get_nodes.go
func createRayNodeLabelSelectors(cluster string) map[string]string {
	labelSelectors := map[string]string{
		util.RayIsRayNodeLabelKey: "yes",
	}
	if cluster != "" {
		labelSelectors[util.RayClusterLabelKey] = cluster
	}
	return labelSelectors
}
```
As with joinLabelMap, I copied this function from kubectl-plugin/pkg/cmd/get/get_nodes.go.
For this one, I think it's okay to expose it as a public function so it can be used in different places. Maybe we can put it under pkg/util and import it from there? I'd like to know what others think about this one.
@seanlaii @troychiu @JiangJiaWei1103 @AndySung320
I agree that factoring this out into a shared helper makes sense.
The only open question might be what the most appropriate name / package would be, given that this is CLI-specific logic.
I noticed there are many createRayXxxLabelSelectors functions; maybe we can extract these into a shared utility, such as pkg/util/selector.go.
This can be a follow-up PR.
Yes, thank you for pointing this out. I agree this can be in a follow-up PR.
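For reference, a minimal sketch of what such a shared helper under pkg/util (e.g. a hypothetical selector.go) could look like, reusing the existing label-key constants in that package; the final name and location are up for discussion in the follow-up PR:

```go
package util

// RayNodeLabelSelectors returns the K8s label selectors used to identify Ray
// node pods, optionally scoped to a single RayCluster.
// Sketch only; name and placement are not decided yet.
func RayNodeLabelSelectors(cluster string) map[string]string {
	labelSelectors := map[string]string{
		RayIsRayNodeLabelKey: "yes",
	}
	if cluster != "" {
		labelSelectors[RayClusterLabelKey] = cluster
	}
	return labelSelectors
}
```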
Force-pushed from c202da3 to 6c671e9 (Compare)
cc @win5923 and @Future-Outlier to review or help redistribute review work.
Future-Outlier
left a comment
cc @seanlaii @troychiu @machichima @AndySung320 for help
Pull request overview
This PR adds shell completion support for kubectl ray get workergroup and kubectl ray get node commands. The completion functions respect namespace filtering (-n/--namespace), cluster filtering (--ray-cluster), and disable completion when --all-namespaces is set, falling back to file completion in that case.
Key changes:
- Implemented WorkerGroupCompletionFunc and NodeCompletionFunc to provide intelligent shell completion (general shape sketched below)
- Added comprehensive unit tests covering various scenarios including namespace filtering, cluster filtering, prefix matching, and edge cases
- Integrated the completion functions into the respective command definitions
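For context, a minimal sketch of the general shape such a cobra ValidArgsFunction can take, assuming a hypothetical listWorkerGroups callback; the PR's actual function signatures and client wiring may differ:

```go
package completion

import (
	"strings"

	"github.com/spf13/cobra"
)

// WorkerGroupCompletionFunc sketches a completion function that lists worker
// group names, falling back to default (file) completion on error or when
// --all-namespaces is set.
func WorkerGroupCompletionFunc(listWorkerGroups func(namespace, cluster string) ([]string, error)) func(*cobra.Command, []string, string) ([]string, cobra.ShellCompDirective) {
	return func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
		if allNS, _ := cmd.Flags().GetBool("all-namespaces"); allNS {
			return nil, cobra.ShellCompDirectiveDefault
		}
		namespace, _ := cmd.Flags().GetString("namespace")
		if namespace == "" {
			namespace = "default"
		}
		cluster, _ := cmd.Flags().GetString("ray-cluster")

		names, err := listWorkerGroups(namespace, cluster)
		if err != nil {
			return nil, cobra.ShellCompDirectiveDefault
		}
		var completions []string
		for _, name := range names {
			if strings.HasPrefix(name, toComplete) {
				completions = append(completions, name)
			}
		}
		return completions, cobra.ShellCompDirectiveNoFileComp
	}
}
```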
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| kubectl-plugin/pkg/util/completion/completion.go | Implements worker group and node completion functions with namespace and cluster filtering support |
| kubectl-plugin/pkg/util/completion/completion_test.go | Adds comprehensive unit tests for both completion functions covering multiple scenarios |
| kubectl-plugin/pkg/cmd/get/get_workergroup.go | Integrates WorkerGroupCompletionFunc into the workergroup command |
| kubectl-plugin/pkg/cmd/get/get_nodes.go | Integrates NodeCompletionFunc into the node command |
```go
		namespace = "default"
	}

	rayClusterList, err := k8sClient.RayClient().RayV1().RayClusters(namespace).List(context.Background(), v1.ListOptions{})
```
Suggested change:

```diff
-	rayClusterList, err := k8sClient.RayClient().RayV1().RayClusters(namespace).List(context.Background(), v1.ListOptions{})
+	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
+	defer cancel()
+	rayClusterList, err := k8sClient.RayClient().RayV1().RayClusters(namespace).List(ctx, v1.ListOptions{})
```
Since this API call runs during shell completion, would it be safer to guard it with a context timeout?
If the API server is slow or unreachable, blocking here could freeze the terminal, which would be a pretty rough UX.
I think this is a solid suggestion, but the common pattern in kubectl-plugin/ seems to be using cmd.Context(), which is actually context.Background(). I have yet to figure out why that is. I am open to this change but would like to hear others' thoughts on this one first.
I think kubectl defaults to no timeout, and we can set one through --request-timeout (see docs). Can we just follow the kubectl setting here?
That's a good point, but I think shell completion is a special case here.
When a user types kubectl ray get workergroup <TAB>, the completion code runs in the background before they've finished typing the command. This means:
- The user can't pass --request-timeout because they haven't completed the command yet.
- Users generally expect tab completion to be instant, or to fail quickly, especially if the API server is slow or unreachable.
Because of that, it seems safer for the completion path to guard itself with a short timeout, independent of the normal kubectl --request-timeout behavior for command execution.
Happy to hear your thoughts if there's an established pattern I'm missing!
Makes sense! Normal kubectl commands have no timeout, so auto completion can sometimes block the terminal for a while, and I think having a timeout here would be great. But I have a few concerns:
- The timeout set here would only apply to querying the RayCluster. There would still be no timeout when auto-completing namespaces (since that is provided by native kubectl), so there's a bit of inconsistency.
- After a timeout, what should we show the user? With a normal kubectl command, if nothing shows up during auto completion, it can mean either 1) the completion plugin isn't installed, or 2) the resource wasn't found. I'm just wondering whether this could confuse users.
Generally I think adding a timeout here is a good idea; I just want to prevent any confusion users might have.
cc @seanlaii @troychiu @JiangJiaWei1103 for some ideas here
Nice, I think we can follow the kubectl-plugin pattern.
Great find! However, I think the reason AttachablePodForObjectFn has a timeout is that it watches the pods until a specific condition is met (code path: AttachablePodForObjectFn -> attachablePodForObject -> GetFirstPod -> GetPodList here).
For commands like kubectl get pods, kubectl calls resourceTypeAndNameCompletionFunc here, which does not set a timeout.
In this case, I think having no timeout here aligns with the behavior of the kubectl get commands.
Thanks for the follow-up! This is a great find, and it made me dig a little bit deeper.
After tracing the code path, I think the two scenarios are different (ref: file).
The code path that I traced:
ResourceAndPortCompletionFunc | ContainerCompletionFunc
-> convertResourceNameToPodName (100ms timeout)
-> AttachablePodForObjectFn
Here's why: AttachablePodForObjectFn is used because convertResourceNameToPodName needs to convert a resource reference into a single running pod name. However, after the conversion, kubectl always calls the actual resource-listing functions, such as CompGetContainers and CompGetPodContainerPorts, with no timeout.
For example:
ContainerCompletionFunc
-> convertResourceNameToPodName (100ms timeout) finds the pod
-> CompGetContainers (no timeout) lists containers in that pod
In our case, we're directly listing all resources, just as CompGetResource does, so following what they do seems more reasonable to me. Also, for larger clusters with many resources, this might be slow, and we should avoid false negatives that would confuse users.
Would love to discuss further if you see it differently.
Thank you all for the deep dive and tracing the code paths. This provides great clarity on why kubectl applies timeouts selectively.
I agree that staying consistent with the native CompGetResource behavior makes sense here to avoid false negatives and maintain a predictable user experience.
I think we could proceed without the timeout for now. If we receive feedback regarding terminal hangs in high-latency clusters in the future, we can revisit more advanced completion strategies then.
Thanks again for the great discussion~
Likewise, thanks for the thorough review and discussion!
```go
	}

	labelSelectors := createRayNodeLabelSelectors(cluster)
	pods, err := k8sClient.KubernetesClient().CoreV1().Pods(namespace).List(
```
Same concern applies here as well regarding using a context timeout for this API call during shell completion.
cc @JiangJiaWei1103 to take a look, thank you!
```go
		namespace = "default"
	}

	rayClusterList, err := k8sClient.RayClient().RayV1().RayClusters(namespace).List(context.Background(), v1.ListOptions{})
```
When listing RayClusters, I think we can provide the cluster name in the list options, so that in the following loop we do not need to use:

```go
if cluster != "" && rayCluster.Name != cluster {
	continue
}
```
Thanks for the suggestion! I'll update it to use FieldSelector in the list option.
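For reference, a minimal sketch of what that could look like as a fragment of the existing List call, assuming the k8s.io/apimachinery/pkg/fields package; the exact wiring in the final change may differ:

```go
	// Restrict the List call to a single RayCluster by name via a field
	// selector instead of filtering client-side.
	listOptions := v1.ListOptions{}
	if cluster != "" {
		listOptions.FieldSelector = fields.OneTermEqualSelector("metadata.name", cluster).String()
	}
	rayClusterList, err := k8sClient.RayClient().RayV1().RayClusters(namespace).List(context.Background(), listOptions)
```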
During the fix, I found an issue with the fake client implementation.
NewSimpleClientset and its successor NewClientset do not support FieldSelector filtering in List operations. To make the unit tests pass, the client-side rayCluster.Name check is still needed as a fallback.
Also, NewSimpleClientset is deprecated in favor of NewClientset for better server-side apply testing. A future PR could migrate all usages.
Found this related issue in kubernetes: kubernetes/kubernetes#78824. They suggest using reactors. We can do it in the follow-up PR.
Thank you!
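For reference, a rough sketch of the reactor approach on the fake clientset, assuming the generated Ray fake clientset (rayfake), k8s.io/client-go/testing (k8stesting), k8s.io/apimachinery/pkg/fields, and k8s.io/apimachinery/pkg/runtime; names and wiring are illustrative only:

```go
	clusters := []*rayv1.RayCluster{
		{ObjectMeta: metav1.ObjectMeta{Name: "cluster-a", Namespace: "default"}},
		{ObjectMeta: metav1.ObjectMeta{Name: "cluster-b", Namespace: "default"}},
	}
	fakeClient := rayfake.NewSimpleClientset(clusters[0], clusters[1])

	// Emulate server-side FieldSelector filtering, which the fake tracker ignores.
	fakeClient.PrependReactor("list", "rayclusters", func(action k8stesting.Action) (bool, runtime.Object, error) {
		restrictions := action.(k8stesting.ListAction).GetListRestrictions()
		if restrictions.Fields == nil || restrictions.Fields.Empty() {
			return false, nil, nil // no field selector: fall through to the default tracker
		}
		filtered := &rayv1.RayClusterList{}
		for _, rc := range clusters {
			if restrictions.Fields.Matches(fields.Set{"metadata.name": rc.Name}) {
				filtered.Items = append(filtered.Items, *rc)
			}
		}
		return true, filtered, nil
	})
```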
Awesome find! Thanks for pointing this out.
I will add an issue for this follow-up. Thanks!
Force-pushed from 5e1e1b0 to 9a85485 (Compare)
Future-Outlier
left a comment
cc @win5923 to take a look, thank you!
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
…nFunc to accept client.Client parameter for testability. Add initial unit tests. Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: JustinYeh <justinyeh1995@gmail.com>
…workergroup deduplication Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Force-pushed from 83d6ccd to 9ac0ee3 (Compare)
Signed-off-by: justinyeh1995 <justinyeh1995@gmail.com>
Future-Outlier
left a comment
Hi, @justinyeh1995
I can't reproduce your result in my zsh; can you give me a full guide so that I can just copy-paste and test this PR on my laptop?
Sure. I'll check again and write a follow-up here. Thanks for reviewing.
Hi @Future-Outlier, would you mind trying this out in zsh?

```sh
pushd kubectl-plugin > /dev/null
go build cmd/kubectl-ray.go
mv kubectl-ray ~/.krew/bin
popd > /dev/null

kubectl krew install --manifest-url https://raw.githubusercontent.com/marckhouzam/kubectl-plugin_completion/v0.1.0/plugin-completion.yaml
kubectl plugin-completion generate
export PATH=$PATH:$HOME/.kubectl-plugin-completion
source ~/.zshrc
```

You should be able to verify it via:

```sh
kubectl ray get workergroup <TAB>
kubectl ray get node <TAB>
```

Thanks!
Future-Outlier
left a comment
machichima
left a comment
LGTM! Thank you
Why are these changes needed?
Add shell completion for two specific commands: kubectl ray get workergroup and kubectl ray get node. The expected behaviors are as follows:
If no resource is fetched, kubectl falls back to the default completion behavior, showing directories.
The same goes for kubectl ray get node.
Demo video
Screen.Recording.2025-12-20.at.4.12.28.PM.mov
Manual Testing Steps
From the KubeRay repository root:
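Presumably the same steps shared earlier in the thread apply; a sketch, assuming zsh, krew, and the kubectl plugin-completion plugin:

```sh
pushd kubectl-plugin > /dev/null
go build cmd/kubectl-ray.go
mv kubectl-ray ~/.krew/bin
popd > /dev/null

kubectl krew install --manifest-url https://raw.githubusercontent.com/marckhouzam/kubectl-plugin_completion/v0.1.0/plugin-completion.yaml
kubectl plugin-completion generate
export PATH=$PATH:$HOME/.kubectl-plugin-completion
source ~/.zshrc

# Verify:
kubectl ray get workergroup <TAB>
kubectl ray get node <TAB>
```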
Related issue number
Closes #3051
Checks