-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Resources] Add cpus
in resource specification
#1622
Conversation
cpu
in resourcescpu
in resource specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome @WoosukKwon! Had a quick scan and the approach makes sense to me.
Quick question: are we deferring adding CPU as a scheduling constraint?
E.g., what happens if we launch a node with 8 CPUs, then keep sky launch -c existing --cpu 4
for many times? Or, for --cpu 16
do we correctly throw an error?
(I think it's okay to defer, just checking.)
@concretevitamin Thanks for the quick feedback. Yes, I'd like to defer it. |
Love it!! Two questions:
|
Hey @dongreenberg, thanks for your questions!
Yes. It will work on the Python API in the same way.
They all have recent Intel x86-64 CPUs (i.e., either Ice Lake or Cascade Lake CPUs), and they all have 4 GB RAM per 1 vCPU. More specifically,
The OS might be different across the clouds and the VM images you select. |
Amazing, that's perfect. You guys are the best! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice @WoosukKwon. Did a pass and left some comments.
-
Tests: consider spelling out the manual tests. Also, should we add some dryrun unit tests to make sure
sky.optimize()
works for things likesky.Resources(..., cpu=...)
? -
Should we warn + ignore
--cpu
insky launch -c existing --cpu x
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @WoosukKwon. Some final comments & reminder to add some tests to maybe test_optimizer_dryruns.py
.
# 2. Change the __setstate__ method to handle the new fields. | ||
# 3. Modify the to_config method to handle the new fields. | ||
# If any fields changed, increment the version. For backward compatibility, | ||
# modify the __setstate__ method to handle the old version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Was thinking this comment is a guide for "all the places we need to change when we add a new field". For just back compat, it seems __setstate__
is the only required change. cc @Michaelvll to double check
return (_make([AWS.get_default_instance_type()]), | ||
fuzzy_candidate_list) | ||
# Return a default instance type with the given number of vCPUs. | ||
default_instance_type = AWS.get_default_instance_type( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind clarifying what you meant?
Currently, the Cloud interface includes a shallow method, which does not show the default type:
@classmethod
def get_default_instance_type(cls,
cpu: Optional[str] = None) -> Optional[str]:
return service_catalog.get_default_instance_type(cpu=cpu, clouds='aws')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the awesome contribution @WoosukKwon. A major update to our Resources in a long time. LGTM.
PS: consider some basic smoke tests (e.g., minimal) before merging?
@concretevitamin This PR has passed all the smoke tests. |
Missed this before: shall we add the new field to https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html? |
Good catch. Added. |
Woo!!! Congrats and thanks gang!! Can't wait to test it out! |
Closes #387
This PR introduces the ability for users to specify resource requirements in terms of the number of vCPUs in the
Resources
class and theresources
section in the SkyPilot yaml.Semantics
cpus
parameter to select an appropriate VM size from the cloud's default VM family (i.e., m6i for AWS, Dv5 for Azure, and n2-standard for GCP).For example,
sky launch --cpus 8
orResources(cpus=8)
: SkyPilot will find the VMs that have exactly 8 vCPUs. Namely,sky launch --cpus 12+
orResources(cpus='12+')
: SkyPilot will consider the VMs that have 12 or more vCPUs. Namely,(Note: The selected three instance types have 16 vCPUs instead of 12 vCPUs, because the default instance families of the clouds do not include any VM with 12-15 vCPUs.)
cpus
parameter to choose the VM of the right size in the instance family. For example,sky launch --gpus T4 --cpus 8+
orResources(accelerators='T4', cpus='8+')
: SkyPilot will choose VMs that have 1 T4 GPU along with 8 or more vCPUs. Namely,The same semantics applies to
cpunode
,gpunode
,tpunode
, andspot launch
.Tested (run the relevant ones):
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh