[Train][Data] Add heterogeneous cluster configurations for training ingest benchmarks #60458
base: master
Add fixed-size and autoscaling configurations with CPU worker nodes for data loading alongside GPU worker nodes for training.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review
The pull request introduces new benchmark configurations for heterogeneous clusters, including fixed-size and autoscaling variants. These new configurations will be valuable for benchmarking Ray Data operations alongside GPU worker nodes for training. However, there is a mismatch between the number of workers requested in the benchmark script and the maximum number of workers defined in the new cluster compute configurations.
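The kind of mismatch described above can be caught with a small sanity check before launching the benchmark. The following is only a sketch under stated assumptions: the config layout (`worker_node_types`, `instance_type`, `max_workers`), the instance types, the GPU counts, and the helper names are illustrative and are not taken from this PR's files.

```python
from typing import Any

# Hypothetical compute config, loosely shaped like a cluster compute YAML
# after parsing. All values below are illustrative assumptions.
EXAMPLE_COMPUTE_CONFIG: dict[str, Any] = {
    "worker_node_types": [
        {"name": "gpu_worker", "instance_type": "g4dn.12xlarge", "max_workers": 4},
        {"name": "cpu_worker", "instance_type": "m5.4xlarge", "max_workers": 4},
    ]
}

# Assumed GPUs per instance type; CPU-only types are simply absent.
GPUS_PER_INSTANCE: dict[str, int] = {"g4dn.12xlarge": 4}


def max_gpu_workers(config: dict[str, Any], gpus_per_instance: dict[str, int]) -> int:
    """Return the largest number of one-GPU training workers the config can host."""
    total = 0
    for node_type in config.get("worker_node_types", []):
        gpus = gpus_per_instance.get(node_type["instance_type"], 0)
        total += node_type["max_workers"] * gpus
    return total


def check_worker_request(
    requested: int, config: dict[str, Any], gpus_per_instance: dict[str, int]
) -> None:
    """Raise ValueError if the script asks for more workers than the cluster allows."""
    available = max_gpu_workers(config, gpus_per_instance)
    if requested > available:
        raise ValueError(
            f"Requested {requested} training workers, but the compute config "
            f"allows at most {available}."
        )


if __name__ == "__main__":
    # Passes silently: the request fits within the configured maximum.
    check_worker_request(16, EXAMPLE_COMPUTE_CONFIG, GPUS_PER_INSTANCE)
```

A check like this assumes one GPU per training worker; if the benchmark script requests several GPUs per worker, the comparison would need to divide accordingly.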
justinvyu
left a comment
Thanks, need to update these since this landed: #60414
Co-authored-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Signed-off-by: Xinyuan <43737116+xinyuangui2@users.noreply.github.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Summary
Cluster Configurations
Fixed-size (heterogenous_fixed_size_gpu_4x4_aws.yaml):
Autoscaling (heterogenous_autoscaling_gpu_4x4_aws.yaml):
Release test