hanouticelina
Contributor

This PR implements a CLI to manage Inference Endpoints. It provides one-liners to deploy, delete, update, etc. endpoints, which could be handy in many cases. The DX intentionally mirrors the UI rather than the API. To quote @ErikKaum:

we're renaming things in the UI quite fast to adapt and make things make more sense. And in many cases in the UI things are configured with slightly different names/groupings than in the API. Just because it's faster than in the API.

I explored a few layouts (e.g. a single deploy command with --catalog), but the cleanest UX ended up being two explicit paths:

  • hf inference-endpoints deploy hub ... – minimal set of hardware/task configs for Hub models.
  • hf inference-endpoints deploy catalog ... – one-liner using optimized configs from the model catalog.

Separately, delete and update currently live under the "Settings" group in the UI, but it feels more natural to keep them top-level in the CLI 🤷‍♀️

> hf inference-endpoints --help
Usage: hf inference-endpoints [OPTIONS] COMMAND [ARGS]...

  Manage Hugging Face Inference Endpoints.

Options:
  --help  Show this message and exit.

Commands:
  delete         Delete an Inference Endpoint permanently.
  deploy         Deploy Inference Endpoints from the Hub or the Catalog.
  describe       Get information about an Inference Endpoint.
  list           Lists all inference endpoints for the given namespace.
  list-catalog   List available Catalog models.
  pause          Pause an Inference Endpoint.
  resume         Resume an Inference Endpoint.
  scale-to-zero  Scale an Inference Endpoint to zero.
  update         Update an existing endpoint.

Happy to iterate more if there are suggestions to make the DX better (and simpler?).



@deploy_app.command(name="hub", help="Deploy an Inference Endpoint from a Hub repository.")
def deploy_from_hub(
Contributor Author

Split the deploy command into two subcommands instead of using a flag to deploy from the Model Catalog, because Typer doesn't easily allow conditional requirements (i.e., "these parameters are required unless that flag (e.g. --from-catalog) is set"), which makes validation and type hints messy. Using subcommands lets Typer enforce required options cleanly for each case.
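To make the trade-off concrete, here is a minimal sketch of the two-subcommand layout. The option names (`--repo`, `--instance-type`) and the echoed output are assumptions for illustration, not the options this PR actually defines. Each subcommand declares its own required parameters, so Typer rejects a `hub` call that lacks hardware options while a `catalog` call needs only the model.

```python
from typing import Annotated, Optional

import typer

app = typer.Typer()
deploy_app = typer.Typer(help="Deploy a new Inference Endpoint.")
app.add_typer(deploy_app, name="deploy")


@deploy_app.command(name="hub")
def deploy_from_hub(
    name: Annotated[str, typer.Argument(help="Endpoint name.")],
    # Hypothetical options: no default value, so Typer marks them as required.
    repo: Annotated[str, typer.Option(help="Hub repository to deploy.")],
    instance_type: Annotated[str, typer.Option(help="Hardware instance type.")],
) -> None:
    """Deploy from a Hub repository (hardware options are required)."""
    typer.echo(f"hub: {name} <- {repo} on {instance_type}")


@deploy_app.command(name="catalog")
def deploy_from_catalog(
    repo: Annotated[str, typer.Option(help="Catalog model to deploy.")],
    # Optional here, because the catalog is assumed to supply the configuration.
    name: Annotated[Optional[str], typer.Option(help="Endpoint name.")] = None,
) -> None:
    """Deploy from the model catalog (only the model is required)."""
    typer.echo(f"catalog: {name or '<auto>'} <- {repo}")
```

With a single command and a `--from-catalog` flag, `--instance-type` would have to be declared optional and then validated by hand inside the function body.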

Contributor

what do you think of having

# deploy from Hub
hf endpoints deploy

# list catalog
hf endpoints catalog ls

# deploy from catalog
hf endpoints catalog deploy

?
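For reference, the layout proposed above could be wired up roughly as follows. This is only a sketch: the function bodies, the positional `repo` argument, and the help strings are placeholders, and the nesting uses plain `typer.Typer()` apps rather than the project's own `typer_factory` helper.

```python
import typer

app = typer.Typer(help="Root CLI (sketch).")
endpoints_app = typer.Typer(help="Manage Hugging Face Inference Endpoints.")
catalog_app = typer.Typer(help="Browse and deploy models from the catalog.")

# hf endpoints ...  and  hf endpoints catalog ...
app.add_typer(endpoints_app, name="endpoints")
endpoints_app.add_typer(catalog_app, name="catalog")


@endpoints_app.command("deploy")
def deploy(repo: str) -> None:
    """Deploy an Inference Endpoint from a Hub repository."""
    typer.echo(f"deploy hub repo {repo}")


@catalog_app.command("ls")
def catalog_ls() -> None:
    """List available Catalog models."""
    typer.echo("listing catalog models")


@catalog_app.command("deploy")
def catalog_deploy(repo: str) -> None:
    """Deploy an Inference Endpoint from the catalog."""
    typer.echo(f"deploy catalog model {repo}")
```

Grouping by resource (`catalog ls`, `catalog deploy`) keeps the Hub path as the short default, while everything catalog-related stays discoverable under one `--help` page.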

Contributor

Otherwise, if keeping the same structure, I'd be more inclined toward

hf endpoints deploy from-repo
hf endpoints deploy from-catalog

(since both are deployed from Hub + the from- makes it more explicit)

(but still slight preference for the one above)

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@Wauplin left a comment

Thanks for working on this! Left some high level feedback mostly geared towards the CLI syntax. Let me know what you think :)

NamespaceOpt = Annotated[
    Optional[str],
    typer.Option(
        help="The namespace where the Inference Endpoint will be created. Defaults to the current user's namespace.",
Contributor

Suggested change
- help="The namespace where the Inference Endpoint will be created. Defaults to the current user's namespace.",
+ help="The namespace associated with the Inference Endpoint. Defaults to the current user's namespace.",

less specific to endpoint creation

Comment on lines +33 to +38
@app.command(help="Lists all Inference Endpoints for the given namespace.")
def list(
    namespace: NamespaceOpt = None,
    token: TokenOpt = None,
) -> None:
    api = get_hf_api(token=token)
Contributor

Suggested change
- @app.command(help="Lists all Inference Endpoints for the given namespace.")
- def list(
-     namespace: NamespaceOpt = None,
-     token: TokenOpt = None,
- ) -> None:
-     api = get_hf_api(token=token)
+ @app.command()
+ def list(
+     namespace: NamespaceOpt = None,
+     token: TokenOpt = None,
+ ) -> None:
+     """Lists all Inference Endpoints for the given namespace."""
+     api = get_hf_api(token=token)

(nit) slight preference for docstring help. It makes it easier if we want to extend it to multiline in the future

(same for other commands)
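A small sketch of the difference, using a plain `typer.Typer()` app and placeholder command bodies (both are assumptions, not the PR's code). Typer uses the function docstring as the command help when no `help=` is passed, so the text can later grow into several paragraphs without touching the decorator.

```python
import typer

app = typer.Typer()


@app.command(name="list")
def list_endpoints() -> None:
    """Lists all Inference Endpoints for the given namespace.

    Extra paragraphs can be added here later without touching the decorator.
    """
    typer.echo("(listing endpoints)")


# Same effect for a one-liner, but the text lives in the decorator instead.
@app.command(help="Get information about an Inference Endpoint.")
def describe(name: str) -> None:
    typer.echo(f"(describing {name})")
```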

Contributor

@Wauplin, Oct 10, 2025

potentially rename to ls ?
EDIT: naah, it's good like this. We are not listing a filesystem

)


deploy_app = typer_factory(help="Deploy Inference Endpoints from the Hub or the Catalog.")
Contributor

Suggested change
- deploy_app = typer_factory(help="Deploy Inference Endpoints from the Hub or the Catalog.")
+ deploy_app = typer_factory(help="Deploy a new Inference Endpoint.")



Comment on lines +275 to +277
typer.Option(
    help="Skip confirmation prompts.",
),
Contributor

Suggested change
- typer.Option(
-     help="Skip confirmation prompts.",
- ),
+ typer.Option("--yes", help="Skip confirmation prompts."),

An explicit --yes avoids having a --no-yes flag defined: https://typer.tiangolo.com/tutorial/parameter-types/bool/
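As a quick illustration of that behaviour, here is a sketch with a hypothetical `delete` command (the body and the no-op root callback are placeholders; the callback is only there so `delete` stays addressable as a subcommand in a one-command app). Passing the flag name explicitly makes Typer register `--yes` only, so `--no-yes` is rejected as an unknown option.

```python
from typing import Annotated

import typer

app = typer.Typer()


@app.callback()
def main() -> None:
    """Hypothetical root callback; keeps `delete` as a named subcommand."""


@app.command()
def delete(
    name: str,
    # Naming the flag explicitly disables the auto-generated --no-yes counterpart.
    yes: Annotated[bool, typer.Option("--yes", help="Skip confirmation prompts.")] = False,
) -> None:
    """Delete an Inference Endpoint permanently."""
    if not yes:
        # Aborts with a non-zero exit code unless the user confirms.
        typer.confirm(f"Delete '{name}'?", abort=True)
    typer.echo(f"deleted {name}")
```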



@app.command(help="List available Catalog models.")
def list_catalog(
Contributor

Slight preference for

hf endpoints catalog list

app.add_typer(repo_cli, name="repo")
app.add_typer(repo_files_cli, name="repo-files")
app.add_typer(jobs_cli, name="jobs")
app.add_typer(inference_endpoints_cli, name="inference-endpoints")
Contributor

Suggested change
- app.add_typer(inference_endpoints_cli, name="inference-endpoints")
+ app.add_typer(inference_endpoints_cli, name="endpoints")

What do you think of renaming everything to hf endpoints? It is slightly less explicit but much simpler to type and copy-paste IMO.

Comment on lines +317 to +323
running_ok: Annotated[
    bool,
    typer.Option(
        help="If `True`, the method will not raise an error if the Inference Endpoint is already running."
    ),
] = True,
token: TokenOpt = None,
Contributor

This one feels weird in the CLI --help (--running-ok / --no-running-ok). I would either remove it or rename it to --fail-if-already-running (with default to False)

(in any case I genuinely don't see a case someone would want to fail^^)
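If the option is kept, the rename could look roughly like the sketch below. `_resume` is a hypothetical stand-in for the real API call, and the root callback is a placeholder that keeps `resume` a named subcommand. The CLI exposes a single opt-in flag that defaults to False and is inverted into `running_ok` internally, so `--help` no longer shows a `--running-ok / --no-running-ok` pair.

```python
from typing import Annotated

import typer

app = typer.Typer()


@app.callback()
def main() -> None:
    """Hypothetical root callback; keeps `resume` as a named subcommand."""


def _resume(name: str, running_ok: bool) -> str:
    # Hypothetical stand-in for the real client call; it only reports its inputs.
    return f"resume {name} (running_ok={running_ok})"


@app.command()
def resume(
    name: str,
    fail_if_already_running: Annotated[
        bool,
        typer.Option(
            "--fail-if-already-running",
            help="Raise an error if the Inference Endpoint is already running.",
        ),
    ] = False,
) -> None:
    """Resume an Inference Endpoint."""
    typer.echo(_resume(name, running_ok=not fail_if_already_running))
```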


@app.command(help="Update an existing endpoint.")
def update(
endpoint_name: NameArg,
Contributor

Suggested change
- endpoint_name: NameArg,
+ name: NameArg,
+ namespace: NamespaceOpt = None,

For consistency, as all other commands (pause, resume, etc.) have both name and namespace defined.
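A sketch of what that consistency buys, assuming `NameArg` is an `Annotated` alias in the same style as `NamespaceOpt` (its help text and the command bodies below are made up for illustration). With shared aliases, every command takes the same `NAME` argument and `--namespace` option with identical help.

```python
from typing import Annotated, Optional

import typer

# Assumed shape of the shared aliases; the NameArg help text is illustrative.
NameArg = Annotated[str, typer.Argument(help="Endpoint name.")]
NamespaceOpt = Annotated[
    Optional[str],
    typer.Option(
        help="The namespace associated with the Inference Endpoint. "
        "Defaults to the current user's namespace.",
    ),
]

app = typer.Typer()


@app.command()
def pause(name: NameArg, namespace: NamespaceOpt = None) -> None:
    """Pause an Inference Endpoint."""
    typer.echo(f"pause {name} in {namespace or 'default'}")


@app.command()
def update(name: NameArg, namespace: NamespaceOpt = None) -> None:
    """Update an existing endpoint."""
    typer.echo(f"update {name} in {namespace or 'default'}")
```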

3 participants