PoC: dynamic SUTs #1013

Open: wants to merge 28 commits into main

@rogthefrog (Contributor) commented May 1, 2025

Story

As an operator, I want to be able to pass a sensible name for a SUT that isn't registered with modelgauge or modelbench.
Modelgauge and modelbench should then try to create a new SUT from that name and run the requested job with it,
so that the code doesn't need to be updated whenever we want to add a new SUT that behaves like known SUTs.

Important

This only works with Hugging Face proxied inference endpoints for now. Dynamic SUT names must have the form

hf/<provider>/<vendor>/<model>

e.g.

hf/nebius/google/gemma-3-27b-it
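A minimal sketch of how a name in this shape could be split and validated. The `DynamicSUTName` class and `parse_dynamic_sut_name` function are hypothetical, not the PR's actual code; the only rule assumed is the four-part `hf/<provider>/<vendor>/<model>` layout described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynamicSUTName:
    """Hypothetical container for the parts of hf/<provider>/<vendor>/<model>."""
    provider: str
    vendor: str
    model: str


def parse_dynamic_sut_name(name: str) -> DynamicSUTName:
    """Split a dynamic SUT name into its parts, rejecting anything malformed."""
    parts = name.split("/")
    # Expect exactly four non-empty segments, the first of which is "hf".
    if len(parts) != 4 or parts[0] != "hf" or not all(parts):
        raise ValueError(
            f"{name!r} is not a dynamic SUT name; expected hf/<provider>/<vendor>/<model>"
        )
    _, provider, vendor, model = parts
    return DynamicSUTName(provider=provider, vendor=vendor, model=model)


print(parse_dynamic_sut_name("hf/nebius/google/gemma-3-27b-it"))
# DynamicSUTName(provider='nebius', vendor='google', model='gemma-3-27b-it')
```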

How To Test

deepseek-ai/DeepSeek-V3 is available from several providers (replicate, together, etc.).

You can test by requesting it from any one of those providers:

poetry run modelbench benchmark -m 1 -s hf/novita/deepseek-ai/DeepSeek-V3
poetry run modelbench benchmark -m 1 -s hf/together/deepseek-ai/DeepSeek-V3

Additional Information

SUT IDs And Names Behave The Same Way

You can pass a dynamic SUT name the same way you can pass a known SUT ID.

poetry run modelbench benchmark -m 1 -s hf/nebius/google/gemma-3-27b-it -s llama-3-70b-chat
Starting run for ['general_purpose_ai_chat_benchmark-1.0-en_us-demo-default'] over ['llama-3-70b-chat', 'google-gemma-3-27b-it-hf-nebius']
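The run output shows hf/nebius/google/gemma-3-27b-it becoming the UID google-gemma-3-27b-it-hf-nebius. A sketch of that translation follows; `dynamic_name_to_uid` is a hypothetical helper, and the `<vendor>-<model>-hf-<provider>` layout is inferred from this one example, so any case or character normalization the real code may apply is not reproduced.

```python
def dynamic_name_to_uid(name: str) -> str:
    """Map hf/<provider>/<vendor>/<model> to <vendor>-<model>-hf-<provider>.

    Layout inferred from a single observed example; no normalization applied.
    """
    # Unpacking raises ValueError unless there are exactly four segments.
    prefix, provider, vendor, model = name.split("/")
    if prefix != "hf":
        raise ValueError(f"unsupported dynamic SUT prefix: {prefix!r}")
    return f"{vendor}-{model}-{prefix}-{provider}"


print(dynamic_name_to_uid("hf/nebius/google/gemma-3-27b-it"))
# google-gemma-3-27b-it-hf-nebius
```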

If you pass in a dynamic SUT name that translates to a known SUT, that's fine: it works, and the known SUT is used.

poetry run modelbench benchmark -m 1 -n hf/nebius/google/gemma-3-27b-it # this is a known SUT
Starting run for ['general_purpose_ai_chat_benchmark-1.0-en_us-demo-default'] over ['google-gemma-3-27b-it-hf-nebius']

How Do I Find Models With Serverless Providers?

import huggingface_hub as hfh
name = "deepseek-ai/DeepSeek-V3"
# Ask the Hub which inference providers serve this model.
info = hfh.model_info(name, expand="inferenceProviderMapping")
print(info.inference_provider_mapping)
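The mapping printed above lists the providers that serve a model, but its exact Python type has changed across huggingface_hub releases. This sketch therefore takes a plain list of provider names (which you would extract from that mapping yourself) and turns each into a dynamic SUT name ready for modelbench. `dynamic_sut_names` is a hypothetical helper.

```python
def dynamic_sut_names(model_id: str, providers: list[str]) -> list[str]:
    """Build one hf/<provider>/<vendor>/<model> name per serverless provider."""
    vendor, model = model_id.split("/")  # Hub model IDs look like <vendor>/<model>
    return [f"hf/{provider}/{vendor}/{model}" for provider in sorted(providers)]


for sut in dynamic_sut_names("deepseek-ai/DeepSeek-V3", ["together", "novita"]):
    print(f"poetry run modelbench benchmark -m 1 -s {sut}")
# poetry run modelbench benchmark -m 1 -s hf/novita/deepseek-ai/DeepSeek-V3
# poetry run modelbench benchmark -m 1 -s hf/together/deepseek-ai/DeepSeek-V3
```

Sorting the providers is only there to make the output order predictable.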

Alternate Providers

If you set find_alternative to True in huggingface_sut_maker.HuggingFaceChatCompletionServerlessSUTMaker.find, the program will automatically look for the same model on a different serverless provider when the provider you requested isn't available.
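A rough sketch of that fallback decision, assuming the list of providers serving the model is already known. `choose_provider` is hypothetical, and picking the alphabetically first alternative is an arbitrary choice made for determinism; the PR's `find` may pick differently.

```python
from typing import Optional


def choose_provider(
    requested: str, available: list[str], find_alternative: bool = False
) -> Optional[str]:
    """Pick a serverless provider for a model.

    Returns the requested provider if it serves the model. Otherwise, when
    find_alternative is True, returns the alphabetically first available
    provider; else None (the caller reports that the SUT can't be made).
    """
    if requested in available:
        return requested
    if find_alternative and available:
        return sorted(available)[0]
    return None


# Illustrative inputs only.
print(choose_provider("together", ["novita", "together"]))  # together
print(choose_provider("nebius", ["together", "novita"], find_alternative=True))  # novita
```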

Dedicated Endpoints

Dedicated endpoints are TBD. The code only turns an endpoint on if it already exists and is asleep; it will not create a new one. That's easy to add.
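In huggingface_hub, an existing endpoint can be fetched with `get_inference_endpoint(name)` and a paused one restarted with `InferenceEndpoint.resume()`. The sketch below only models the decision described above, without any network calls; `endpoint_action` and its return values are hypothetical, and which statuses count as "asleep" is an assumption.

```python
from typing import Optional

# Assumed status strings, modeled on Hugging Face Inference Endpoints statuses.
# Treating both "paused" and "scaledToZero" as asleep is an assumption.
READY = {"running"}
ASLEEP = {"paused", "scaledToZero"}
FAILED = {"failed", "updateFailed"}


def endpoint_action(status: Optional[str]) -> str:
    """Decide what to do with a dedicated endpoint; never creates one.

    `status` is None when no endpoint with the requested name exists.
    """
    if status is None:
        return "missing"  # creating endpoints is not implemented
    if status in READY:
        return "use"
    if status in ASLEEP:
        return "wake"  # e.g. resume the endpoint, then wait until it is running
    if status in FAILED:
        return "failed"
    return "wait"  # pending, initializing, updating: already starting up


print(endpoint_action("paused"))  # wake
```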

rogthefrog requested a review from a team as a code owner May 1, 2025 01:15
rogthefrog had a problem deploying to Scheduled Testing three times on May 1, 2025 01:15 via GitHub Actions (Failure)

github-actions (bot) commented May 1, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

rogthefrog temporarily deployed to Scheduled Testing three times on May 2, 2025 01:07 via GitHub Actions (Inactive)
rogthefrog temporarily deployed to Scheduled Testing three times on May 6, 2025 19:20 via GitHub Actions (Inactive)
rogthefrog temporarily deployed to Scheduled Testing three times on May 6, 2025 19:47 via GitHub Actions (Inactive)
rogthefrog force-pushed the feat/948/dynamic-sut branch from 6b6352e to f7d66b0 May 6, 2025 19:49
rogthefrog temporarily deployed to Scheduled Testing three times on May 6, 2025 19:49 via GitHub Actions (Inactive)
@rogthefrog (Contributor Author) commented

> This looks great! One thing I'm a little concerned about is the conflation between SUT UIDs and names. I think we should either a) go back to when you had a separate argument to pass in dynamic SUT names or b) request dynamic SUTs at the CLI using a structured UID (with hyphens) instead of a name.
>
> I also don't like that typos in SUT UIDs will be identified as dynamic SUT names and then result in an inaccurate error message (i.e. "couldn't make this dynamic SUT" instead of "invalid SUT UID, please use one of the following...").

One reason I like using the same argument for known SUT IDs and named dynamic SUTs is that the user doesn't need to know whether a SUT already exists. If you pass in a dynamic SUT name that can be parsed into an existing SUT ID, then the existing SUT will be used.

If we constrain what a dynamic SUT name can look like, e.g. a SUT is dynamic if and only if the identifier passed to modelbench includes a slash, would that solve the typo issue? We could make the code try to create a dynamic SUT only if the string includes a slash, and SUT IDs without slashes but with a typo would then be rejected with the right error.
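The slash rule proposed here could look roughly like the following. `resolve_sut` is hypothetical: it assumes the registry is just a set of UIDs, reuses the `<vendor>-<model>-hf-<provider>` naming seen in the example run output in the PR description, and returns whether a new dynamic SUT would need to be created.

```python
def resolve_sut(identifier: str, known_uids: set[str]) -> tuple[str, bool]:
    """Return (uid, needs_new_sut) for a SUT identifier given on the command line."""
    if "/" not in identifier:
        # No slash: only registered UIDs are accepted, so typos get a clear error.
        if identifier not in known_uids:
            valid = ", ".join(sorted(known_uids))
            raise ValueError(f"Invalid SUT UID {identifier!r}; use one of: {valid}")
        return identifier, False
    # Slash present: must be a dynamic name of the form hf/<provider>/<vendor>/<model>.
    parts = identifier.split("/")
    if len(parts) != 4 or parts[0] != "hf" or not all(parts):
        raise ValueError(f"Cannot parse dynamic SUT name {identifier!r}")
    prefix, provider, vendor, model = parts
    # UID layout inferred from the example run output (<vendor>-<model>-hf-<provider>).
    uid = f"{vendor}-{model}-{prefix}-{provider}"
    # If the dynamic name maps onto a registered SUT, reuse it instead of creating one.
    return uid, uid not in known_uids


known = {"llama-3-70b-chat", "google-gemma-3-27b-it-hf-nebius"}
print(resolve_sut("hf/nebius/google/gemma-3-27b-it", known))
# ('google-gemma-3-27b-it-hf-nebius', False)
print(resolve_sut("hf/together/deepseek-ai/DeepSeek-V3", known))
# ('deepseek-ai-DeepSeek-V3-hf-together', True)
```

With this split, a mistyped UID such as llama-3-70b-chta never reaches the dynamic SUT path and fails with the list of valid UIDs instead.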

rogthefrog temporarily deployed to Scheduled Testing three times on May 7, 2025 00:00 via GitHub Actions (Inactive)
rogthefrog temporarily deployed to Scheduled Testing three times on May 7, 2025 00:05 via GitHub Actions (Inactive)
rogthefrog temporarily deployed to Scheduled Testing three times on May 7, 2025 01:21 via GitHub Actions (Inactive)
4 participants