Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[UI] Now show-gpus shows the combined price for GCP GPU instance #2946

Merged
merged 17 commits into from
Jan 11, 2024

Conversation

Vaibhav2001
Copy link
Contributor

@Vaibhav2001 Vaibhav2001 commented Jan 5, 2024

This is in relation to #2039.

Added functionality to show the combined price for GCP GPU instance.

GPU   QTY  CLOUD   INSTANCE_TYPE       DEVICE_MEM  vCPUs  HOST_MEM  HOURLY_PRICE  HOURLY_SPOT_PRICE  REGION
V100  1    AWS     p3.2xlarge          16GB        8      61GB      $ 3.060       $ 0.476            ap-northeast-1
V100  4    AWS     p3.8xlarge          16GB        32     244GB     $ 12.240      $ 1.596            ap-northeast-1
V100  8    AWS     p3.16xlarge         16GB        64     488GB     $ 24.480      $ 2.678            ap-northeast-1
V100  1    Azure   Standard_NC6s_v3    -           6      112GB     $ 3.060       $ 0.450            centralus
V100  2    Azure   Standard_NC12s_v3   -           12     224GB     $ 6.120       $ 0.899            centralus
V100  4    Azure   Standard_NC24rs_v3  -           24     448GB     $ 13.460      $ 1.978            centralus
V100  4    Azure   Standard_NC24s_v3   -           24     448GB     $ 12.240      $ 1.798            centralus
V100  1    GCP     n1-highmem-8        -           8      52GB      $ 2.953       $ 0.811            asia-east1
V100  2    GCP     n1-highmem-16       -           16     104GB     $ 5.906       $ 1.622            asia-east1
V100  4    GCP     n1-highmem-32       -           32     208GB     $ 11.813      $ 3.244            asia-east1
V100  8    GCP     n1-highmem-64       -           64     416GB     $ 23.626      $ 6.487            asia-east1
V100  1    IBM     gx2-8x64x1v100      -           8      64GB      $ 2.497       -                  au-syd
V100  8    Lambda  gpu_8x_v100         16GB        92     448GB     $ 4.400       -                  asia-northeast-1
V100  1    OCI     VM.GPU3.1           16GB        12     90GB      $ 2.950       -                  af-johannesburg-1
V100  2    OCI     VM.GPU3.2           16GB        24     180GB     $ 5.900       -                  af-johannesburg-1
V100  4    OCI     VM.GPU3.4           16GB        48     360GB     $ 11.800      -                  af-johannesburg-1
V100  8    OCI     BM.GPU3.8           16GB        104    768GB     $ 23.600      -                  af-johannesburg-1

To test:

  • shows the correct added values for various gpus. (manually checked for 7 gpus)

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Vaibhav2001 for this amazing feature! 🚀 It will be really helpful to compare the prices across clouds. Left some comments and discussions 🫡

sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
@Vaibhav2001
Copy link
Contributor Author

Vaibhav2001 commented Jan 6, 2024

Also currently I am failing https://github.com/skypilot-org/skypilot/actions/runs/7429948668/job/20219105846?pr=2946. I do not understand why it fails. Essentially from my understanding I am assigning an instance type to a TPU VM and that is wrong, but why? Do we not want to attach an instance to the tpu accelerator types?

Also @cblmemo if you can advice on what to do for the mypy issue. I can just add that check I think?

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt fix!! Left some more comments 🫡

sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Show resolved Hide resolved
@cblmemo
Copy link
Collaborator

cblmemo commented Jan 8, 2024

Also currently I am failing https://github.com/skypilot-org/skypilot/actions/runs/7429948668/job/20219105846?pr=2946. I do not understand why it fails. Essentially from my understanding I am assigning an instance type to a TPU VM and that is wrong, but why? Do we not want to attach an instance to the tpu accelerator types?

Also @cblmemo if you can advice on what to do for the mypy issue. I can just add that check I think?

Hi @Vaibhav2001 ! Thanks for the discussion 🙂

For the first question, that is because TPU has two types in GCP: TPU VMs and TPU Nodes. You can refer to this doc for more information. A TPU Node will be attached to an instance, whereas a TPU VM is a stand-alone instance so it cannot have an instance type. For the pricing here, a TPU VM's price doesn't need to add the price for an instance.

For the second mypy question, from the error message shown in the link, it is because vm_types, which is the first return value of get_instance_type_for_accelerator, is Optional[List[str]], and you are iterating it in the following code. You can see it in the docstr of the get_instance_type_for_accelerator

def get_instance_type_for_accelerator(
acc_name: str,
acc_count: int,
cpus: Optional[str] = None,
memory: Optional[str] = None,
use_spot: bool = False,
region: Optional[str] = None,
zone: Optional[str] = None) -> Tuple[Optional[List[str]], List[str]]:
"""Fetch instance types with similar CPU count for given accelerator.
Return: a list with a single matched instance type and a list of candidates
with fuzzy search (should be empty as it must have already been generated in
caller).
"""

In this case, we can investigate in what cases will the return value be None, and assert vm_types is not None if you can confirm that there is no chance that it will be None (remember to add a comment for why it is true 😉).

@Vaibhav2001
Copy link
Contributor Author

Thanks for the prompt fix!! Left some more comments 🫡

@cblmemo Thanks for these comments. I addressed them and pushed the new version.

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt fix!!! 🚀 Left some discussions ;)

sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
Vaibhav2001 and others added 3 commits January 9, 2024 13:56
fixing comment

Co-authored-by: Tian Xia <cblmemo@gmail.com>
Co-authored-by: Tian Xia <cblmemo@gmail.com>
@Vaibhav2001
Copy link
Contributor Author

@cblmemo. Addressed all comments! :)

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt fix!! Some final comments 🚀 It looks mostly ready to me!

sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Show resolved Hide resolved
sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
Vaibhav2001 and others added 2 commits January 10, 2024 14:20
Co-authored-by: Tian Xia <cblmemo@gmail.com>
@Vaibhav2001
Copy link
Contributor Author

Thanks for the prompt fix!! Some final comments 🚀 It looks mostly ready to me!

Addressed!

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double-check that all tests still works fine; otherwise LGTM!

sky/clouds/service_catalog/gcp_catalog.py Outdated Show resolved Hide resolved
@Vaibhav2001
Copy link
Contributor Author

@cblmemo All the tests seem to be working fine! Should be good to ship now!

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding this exciting feature again!! 🚀🙌🏻

@cblmemo cblmemo merged commit 1a711db into skypilot-org:master Jan 11, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants