feat(catalog): Add RunPod data fetcher #5930

pirtleshell · 2025-06-09T21:39:34Z

Adds a new data fetcher script to automatically generate RunPod instance catalog data. The script queries the RunPod API to fetch GPU types, pricing, and availability information.

Verified all previously existing Accelerator Names are in the newly generated CSV.
Additionally, includes support for previously missing GPUs:
- L40S (manually added in feat: add L40S to available RunPod GPUs skypilot-catalog#131)
- RTX 2000 Ada (RTX2000-Ada)
- RTX 5090 (RTX5090)
- RTX PRO 6000 (RTXPRO6000)
Includes all available quantities of GPUs (previously did not include all permutations)

The script requires a RunPod API key with read access and generates a CSV file compatible with SkyPilot's catalog system.

For each GPU, a CSV catalog entry is created for every region (from a hardcoded list of regions) and every available quantity. For example, the API returns availableGpuCounts like [1,2,3,4] to indicate you can request an instance with up to 4 GPUs. Thus, the GPU is listed in the catalog 100 times: GPU quantity options (4) * number of regions (25).
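The row count described above can be sketched as a Cartesian product of quantities and regions. This is only an illustration, not the script's actual code: the function name `expand_catalog_rows` and the placeholder region names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def expand_catalog_rows(gpu_name: str, available_counts: List[int],
                        regions: List[str]) -> List[Tuple[str, int, str]]:
    """Return one (gpu, count, region) tuple per quantity/region pair."""
    return [(gpu_name, count, region)
            for count, region in product(available_counts, regions)]


# Placeholder region names; the real script uses its own hardcoded list.
regions = [f'REGION-{i}' for i in range(25)]
rows = expand_catalog_rows('L40', [1, 2, 3, 4], regions)
print(len(rows))  # 4 quantities * 25 regions = 100
```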

Verification

For testing, I locally generated the catalog CSV:

$ RUNPOD_API_KEY=read-only-api-key python sky/catalog/data_fetchers/fetch_runpod.py --output-dir temp
RunPod Service Catalog saved to temp/vms.csv

Then I compared all of the generated CSV's accelerator names to the ones in the existing v7 catalog CSV:

$ diff <(awk -F, '{print $2}' ../skypilot-catalog/catalogs/v7/runpod/vms.csv | sort | uniq) <(awk -F, '{print $2}' temp/vms.csv | sort | uniq)

13a14
> RTX2000-Ada
16a18
> RTX5090
21a24
> RTXPRO6000

Thus, all originally available accelerators are available (with an unchanged name) plus three more (RTX2000-Ada, RTX5090, RTXPRO6000).
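The same comparison could be done in Python with set differences. The sketch below is a rough equivalent of the `awk`/`diff` pipeline, assuming the accelerator name is the second CSV column; the helper name `accelerator_names` and the inline sample data are made up for illustration.

```python
import csv
import io
from typing import Set, TextIO


def accelerator_names(csv_file: TextIO) -> Set[str]:
    """Collect distinct values of the second column, skipping the header."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return {row[1] for row in reader if len(row) > 1}


# Tiny made-up samples standing in for the old and new vms.csv files.
old_csv = io.StringIO('InstanceType,AcceleratorName\n'
                      '1x_L40_SECURE,L40\n')
new_csv = io.StringIO('InstanceType,AcceleratorName\n'
                      '1x_L40_SECURE,L40\n'
                      '1x_RTX5090_SECURE,RTX5090\n')

old_names = accelerator_names(old_csv)
new_names = accelerator_names(new_csv)
missing = sorted(old_names - new_names)  # names dropped by the new CSV
added = sorted(new_names - old_names)    # names only in the new CSV
print(missing, added)
```

An empty `missing` list corresponds to the `diff` output above containing only `>` lines.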

For a final gut check, I manually compared lines for some specific quantity-GPU-region tuples.

Besides expected fluctuation of prices, there are some differences in the vCPU and MemoryGiB columns. It's unclear to me why they are different. To the best of my knowledge, the values coming from the API and used by this script are correct.

Example difference:

# 4x NVIDIA L40 in US-TX-3
before: 4x_L40_SECURE,L40,4.0,64.0,192.0,L40,US,2.76,4.56,US-TX-3
 after: 4x_L40_SECURE,L40,4.0,32.0,376.0,L40,US,3.96,2.0,US-TX-3

Previously, a 4x L40 instance was listed as having 64 vCPUs and 192 GiB RAM. I confirmed in the API & UI of RunPod that the vCPUs and memory for an instance with 4x L40s match the newly generated values:

# from runpod's deploy UI
4x L40 (192 GB VRAM)
376 GB RAM • 32 vCPU

Similar differences can be seen in other GPUs, like the A100.
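To spot-check such differences programmatically, the two example rows can be parsed by position. This is a small sketch, assuming the vCPU and MemoryGiB values are the fourth and fifth fields, as the example rows suggest; the helper name `vcpus_and_memory` is hypothetical.

```python
from typing import Tuple

# The two example rows quoted above.
BEFORE = '4x_L40_SECURE,L40,4.0,64.0,192.0,L40,US,2.76,4.56,US-TX-3'
AFTER = '4x_L40_SECURE,L40,4.0,32.0,376.0,L40,US,3.96,2.0,US-TX-3'


def vcpus_and_memory(row: str) -> Tuple[float, float]:
    """Return (vCPUs, MemoryGiB), assumed to be the 4th and 5th fields."""
    fields = row.split(',')
    return float(fields[3]), float(fields[4])


print(vcpus_and_memory(BEFORE))  # (64.0, 192.0)
print(vcpus_and_memory(AFTER))   # (32.0, 376.0)
```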

Tested (run the relevant ones):

Code formatting: install pre-commit (auto-check on commit) or bash format.sh
Any manual or new tests for this PR (please specify below)
All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)


pirtleshell · 2025-06-09T21:48:56Z

I've uploaded a copy of the full csv generated by this script here: https://gist.github.com/pirtleshell/41079b4c9752a16a60c3dbc45164e7c6

Adds a workflow, based on the one for AWS, that automates updating RunPod GPU pricing and availability. Depends on skypilot-org/skypilot#5930.

adocherty · 2025-06-11T06:53:48Z

sky/catalog/data_fetchers/fetch_runpod.py

+
+# Mapping of regions to their availability zones
+REGION_ZONES = {
+    'CA': ['CA-MTL-1', 'CA-MTL-2', 'CA-MTL-3'],


Is there a reason to use SkyPilot to cycle through all availability zones rather than not specifying it in the RunPod API and letting RunPod assign the zone automatically?

To add community cloud support it seems cycling through the zones doesn't work whereas letting RunPod choose them does:
#3441 (comment)

Also, there seems to be another PR with a different version of the data fetcher that does include community cloud support:
#5929

pirtleshell · 2025-06-16

hello @adocherty 👋

i'm not sure what serendipity of the universe caused these two PRs for the same feature to appear at once! haha, but it definitely shows it's time for automated availability & price updates for RunPod instances 😄

no hard feelings if #5929 is merged instead of this. i'll state my original intention: generate the existing catalog CSV as closely as possible. i believe it's easier to alter or add functionality after the existing functionality is automated. i don't have a complete and deep understanding of the workings of skypilot so i did my best to create the existing CSV with minimal new functionality (no community clouds, no changes to how regions are managed, etc). if we want to take on those changes with this script, i'm happy to assist.

as for region management, it was again just a focused attempt at creating a near-identical CSV to the one that currently exists without diving into how the CSV is used by SkyPilot to discover instances.

adocherty · 2025-06-17

Hi @pirtleshell

It is quite serendipitous! I had actually drafted some code to do the same thing, then came here to find not one but two PRs with the same functionality!

It sounds like a sensible strategy to get existing functionality in place first, then add features. I have an interest in getting community cloud pods enabled, but have no horse in the race between this and #5929. They are both a good step forward.

I'm also new to SkyPilot, and my questions were all about understanding more about how things work under the hood. That being said, I'm happy to help you get this PR over the line if you need a hand, although I don't have any magic reviewing powers!

If this gets in, I'll be happy to put up a PR to enable the community cloud. Your help (and that of anyone who knows what they're doing) would be appreciated.

pirtleshell · 2025-06-09T22:22:41Z

sky/catalog/data_fetchers/fetch_runpod.py

+        # only add secure clouds. skipping community cloud instances.
+        if not detailed_gpu['secureCloud']:
+            continue


i attempted to limit the changeset in this PR. to the best of my knowledge, this code could be easily updated to accommodate the request in #3441 for community instance types.

it requires checking detailed_gpu['communityCloud'] bool and setting InstanceType below to have a _COMMUNITY suffix where _SECURE is currently hardcoded.
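a rough sketch of that change, assuming the `secureCloud` and `communityCloud` flags from the API response and the `4x_L40_SECURE` naming seen in the catalog. the helper names `instance_type_name` and `instance_types_for` are hypothetical and not part of this PR:

```python
from typing import Any, Dict, List


def instance_type_name(gpu_count: int, gpu_name: str, secure: bool) -> str:
    """Build an InstanceType such as '4x_L40_SECURE' or '4x_L40_COMMUNITY'."""
    suffix = 'SECURE' if secure else 'COMMUNITY'
    return f'{gpu_count}x_{gpu_name}_{suffix}'


def instance_types_for(detailed_gpu: Dict[str, Any], gpu_name: str,
                       gpu_count: int) -> List[str]:
    """List one InstanceType per cloud kind that the GPU is offered on."""
    names = []
    if detailed_gpu.get('secureCloud'):
        names.append(instance_type_name(gpu_count, gpu_name, secure=True))
    if detailed_gpu.get('communityCloud'):
        names.append(instance_type_name(gpu_count, gpu_name, secure=False))
    return names


print(instance_types_for({'secureCloud': True, 'communityCloud': True},
                         'L40', 4))
```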

andylizf

Left some nits here.

andylizf · 2025-07-25T07:46:52Z

sky/catalog/data_fetchers/fetch_runpod.py

+    return round(price, 2)
+
+
+def get_gpu_counts(max_count: int) -> List[int]:


is this function really used?

andylizf · 2025-07-25T07:58:27Z

sky/catalog/data_fetchers/fetch_runpod.py

+    if 'errors' in result:
+        raise ValueError(f'GraphQL errors: {result["errors"]}')
+
+    return result['data']['gpuTypes'][0]


should have some assertions here?
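One possible shape for such checks, as a sketch only: the function name `extract_single_gpu_type` is hypothetical, and it raises `ValueError` rather than using `assert` so the checks survive `python -O`.

```python
from typing import Any, Dict


def extract_single_gpu_type(result: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a GraphQL response and return its first gpuTypes entry."""
    if 'errors' in result:
        raise ValueError(f'GraphQL errors: {result["errors"]}')
    # GraphQL responses may carry "data": null, so fall back to an empty dict.
    data = result.get('data') or {}
    gpu_types = data.get('gpuTypes')
    if not isinstance(gpu_types, list) or not gpu_types:
        raise ValueError(
            f'Expected a non-empty gpuTypes list, got: {gpu_types!r}')
    return gpu_types[0]


print(extract_single_gpu_type({'data': {'gpuTypes': [{'id': 'example'}]}}))
```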

andylizf · 2025-07-25T07:58:48Z

sky/catalog/data_fetchers/fetch_runpod.py

+    return sorted(counts)
+
+
+def format_gpu_name(gpu_type: Dict) -> str:


Suggested change

def format_gpu_name(gpu_type: Dict) -> str:

def format_gpu_name(gpu_type: Dict[str, Any]) -> str:

andylizf · 2025-07-25T18:00:30Z

sky/catalog/data_fetchers/fetch_runpod.py

+
+    # Fall back to defaults if values are None
+    # scale default value by gpu_count
+    vcpus = DEFAULT_VCPUS * gpu_count if vcpus is None else vcpus


let's have more assertions for vcpus here? e.g. that it is a positive integer
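A sketch of what that validation might look like. The value of `DEFAULT_VCPUS` below is a placeholder and the helper name `resolve_vcpus` is hypothetical; it also assumes the API reports vCPUs as an integer.

```python
from typing import Optional

DEFAULT_VCPUS = 8  # placeholder; the script defines its own default


def resolve_vcpus(vcpus: Optional[int], gpu_count: int) -> int:
    """Fall back to a scaled default, then check for a positive integer."""
    if gpu_count <= 0:
        raise ValueError(f'gpu_count must be positive, got {gpu_count}')
    resolved = DEFAULT_VCPUS * gpu_count if vcpus is None else vcpus
    # bool is a subclass of int, so reject it explicitly.
    if (isinstance(resolved, bool) or not isinstance(resolved, int) or
            resolved <= 0):
        raise ValueError(f'vcpus must be a positive integer, got {resolved!r}')
    return resolved


print(resolve_vcpus(None, 4))  # DEFAULT_VCPUS * 4
print(resolve_vcpus(32, 4))    # the API value is kept as-is
```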

andylizf · 2025-07-25T18:04:08Z

sky/catalog/data_fetchers/fetch_runpod.py

+        # Generate instances from GPU types
+        instances = []
+        for gpu in gpus:
+            # initial gpu details. later, request specific quantity details


is it necessary? seems it's quite redundant

kevinmingtarja · 2025-07-31T01:12:15Z

Hi @pirtleshell, thanks for writing this PR! FYI We're also planning on adding CPU instances to the Runpod catalog: skypilot-org/skypilot-catalog#140

I generated the CPU instances list by modifying the script from this PR. Would you mind adding it to this PR as well? Or we could do it in a follow-up PR too. I can share the diffs I made to your script to support fetching CPU instances.

In any case, we should do that before merging skypilot-org/skypilot-catalog#133, so as not to override the CPU instances we will be adding.

kevinmingtarja · 2025-07-31T06:35:42Z

sky/catalog/data_fetchers/fetch_runpod.py

+    'Price',
+    'SpotPrice',


I think the position of Price and SpotPrice is reversed here, looking at https://github.com/skypilot-org/skypilot-catalog/blob/master/catalogs/v7/runpod/vms.csv

Suggested change

-    'Price',
-    'SpotPrice',
+    'SpotPrice',
+    'Price',
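To illustrate why the order of this list matters: if the rows are written with something like `csv.DictWriter` (an assumption; the script's actual writing code is not shown in this excerpt), the field-name list alone decides the column order, regardless of the key order in each row. The abbreviated `FIELDNAMES` list below is hypothetical.

```python
import csv
import io

# Abbreviated, hypothetical field list; SpotPrice precedes Price,
# as the comment above says the existing catalog does.
FIELDNAMES = ['InstanceType', 'SpotPrice', 'Price']

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator='\n')
writer.writeheader()
# Key order in the row dict does not affect the column order.
writer.writerow({
    'Price': 4.56,
    'InstanceType': '4x_L40_SECURE',
    'SpotPrice': 2.76
})
output = buf.getvalue()
print(output)
```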

adocherty · 2025-08-08T07:21:13Z

Hi @pirtleshell & @kevinmingtarja
I'd really like to get this functionality into the codebase and I'm happy to address the reviews on this PR and work with you to add CPU functionality.

If you're happy for me to help, can I get write access to this branch?

Run yapf and pylint

6bdb99f

pirtleshell mentioned this pull request Jun 9, 2025

feat(runpod): Add RunPod catalog update cron skypilot-org/skypilot-catalog#133

Draft

adocherty reviewed Jun 11, 2025


adocherty mentioned this pull request Jun 11, 2025

Add Runpod Data Fetcher for Community and Secure Pods #5929

Open

pirtleshell commented Jun 16, 2025


pirtleshell mentioned this pull request Jun 17, 2025

feat: add L40S to available RunPod GPUs skypilot-org/skypilot-catalog#131

Merged

Michaelvll requested a review from andylizf July 25, 2025 06:40

andylizf requested changes Jul 25, 2025


kevinmingtarja mentioned this pull request Jul 30, 2025

[Runpod] Add CPU instances skypilot-org/skypilot-catalog#140

Merged


kevinmingtarja reviewed Jul 31, 2025

