add sagemaker-hyperpod compute type to resolve its pods via VPC ENI #3886
Welcome @amber-liu-amzn!
Hi @amber-liu-amzn. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
/test pull-aws-load-balancer-controller-e2e-test
/lgtm
Were you able to validate in an environment that has EC2, Fargate, and sagemaker-hyperpod compute to ensure everything works?
- // classifyPodsByComputeType classifies in to ec2 and fargate groups
- func (r *defaultPodENIInfoResolver) classifyPodsByComputeType(ctx context.Context, pods []k8s.PodInfo) ([]k8s.PodInfo, []k8s.PodInfo, error) {
+ // classifyPodsByComputeType classifies in to ec2, fargate and sagemaker-hyperpod groups
+ func (r *defaultPodENIInfoResolver) classifyPodsByComputeType(ctx context.Context, pods []k8s.PodInfo) ([]k8s.PodInfo, []k8s.PodInfo, []k8s.PodInfo, error) {
Can you refactor this to be more type friendly? E.g. it's non-trivial to remember arg1 is ec2, arg2 is fargate, arg3 is hyperpod.
Can you introduce a new type that is basically
struct {
    ec2Pods      []k8s.PodInfo
    fargatePods  []k8s.PodInfo
    hyperpodPods []k8s.PodInfo
}
This will make the code cleaner, and it will be more extensible as we add more compute types.
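For illustration, the grouped return type suggested above could look roughly like the sketch below. This is not the code from the PR: `PodInfo` is a local stand-in for `k8s.PodInfo`, and `podsByComputeType`, `computeType`, and `groupPods` are hypothetical names chosen for this example.

```go
package main

// PodInfo is a local stand-in for k8s.PodInfo so the sketch compiles on its own.
type PodInfo struct {
	Name     string
	NodeName string
}

// podsByComputeType (hypothetical name) groups pods by the compute type of
// the node they run on, replacing three positional slice return values.
type podsByComputeType struct {
	ec2Pods      []PodInfo
	fargatePods  []PodInfo
	hyperpodPods []PodInfo
}

// computeType (hypothetical) enumerates the groups used in this sketch.
type computeType int

const (
	computeTypeEC2 computeType = iota
	computeTypeFargate
	computeTypeHyperPod
)

// groupPods places each pod into the group selected by classify.
// Any value other than Fargate or HyperPod falls back to the EC2 group.
func groupPods(pods []PodInfo, classify func(PodInfo) computeType) podsByComputeType {
	var out podsByComputeType
	for _, pod := range pods {
		switch classify(pod) {
		case computeTypeFargate:
			out.fargatePods = append(out.fargatePods, pod)
		case computeTypeHyperPod:
			out.hyperpodPods = append(out.hyperpodPods, pod)
		default:
			out.ec2Pods = append(out.ec2Pods, pod)
		}
	}
	return out
}
```

With named fields, a caller reads `result.hyperpodPods` instead of remembering which positional return value holds which group, and a further compute type would only add a field and a `case`.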
yup. refactored in the last commit. thanks!
Good question! I will try to set up such a mixed cluster and verify. Will get back to you on this.
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: amber-liu-amzn, shraddhabang, zac-nixon. The full list of commands accepted by this bot can be found here.
Issue
SageMaker HyperPod offers service-managed Kubernetes nodes that are accessible from customer accounts. Using aws-load-balancer-controller (LBC) in HyperPod EKS clusters is not supported today, because the nodes are in the SageMaker VPC while the load balancers are in the customer VPC.
Why does routing traffic directly to a HyperPod pod IP not work today?
SageMaker HyperPod pods are in a different VPC, but LBC incorrectly treats them as EC2 pods in the customer VPC, which leads to incorrect ENI info retrieval and missing security group permissions.
Description
To enable IP target mode for pods running on SageMaker HyperPod, this PR adds sagemaker-hyperpod as a new compute type so that its pods are resolved via VPC ENI.
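As a rough sketch of what treating sagemaker-hyperpod as its own compute type could mean, the function below picks a compute type from a node's labels. This is an assumption-laden illustration, not the PR's implementation: the label keys and values, in particular the SageMaker one, are placeholders that this page does not confirm, and `computeTypeForNode` is a hypothetical helper.

```go
package main

const (
	// ASSUMPTION: label key and value taken to mark EKS Fargate nodes.
	labelEKSComputeType = "eks.amazonaws.com/compute-type"
	valueFargate        = "fargate"

	// ASSUMPTION: placeholder label key and value for HyperPod nodes;
	// the real marker used by the PR is not shown on this page.
	labelSageMakerComputeType = "sagemaker.amazonaws.com/compute-type"
	valueHyperPod             = "hyperpod"
)

// computeTypeForNode (hypothetical helper) returns "fargate",
// "sagemaker-hyperpod", or "ec2" based on the node's labels.
// Nodes without a recognized label are treated as EC2, so HyperPod nodes
// must be detected explicitly or they fall into the EC2 path.
func computeTypeForNode(labels map[string]string) string {
	if labels[labelEKSComputeType] == valueFargate {
		return "fargate"
	}
	if labels[labelSageMakerComputeType] == valueHyperPod {
		return "sagemaker-hyperpod"
	}
	return "ec2"
}
```

The EC2 fallback mirrors the failure described under Issue: without an explicit check, a HyperPod node is indistinguishable from an EC2 node, and its pods are looked up as if their ENIs belonged to EC2 instances in the customer VPC.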