feat: use cyclopts to enforce CLI/YAML parity by viraatc · Pull Request #193 · mlcommons/endpoints

viraatc · 2026-03-20T04:46:55Z

What does this PR do?

Replace the argparse CLI with cyclopts. The Pydantic models in config/schema.py are the single source of truth: CLI flags are auto-generated from schema fields. Adding a shorthand takes one cyclopts.Parameter(alias=...) annotation and no CLI code changes.

Architecture

Discriminated union: OfflineBenchmarkConfig/OnlineBenchmarkConfig subclasses in config/schema.py. Both CLI and YAML auto-select the right subclass via Pydantic TypeAdapter with Discriminator on type field.
Dataset string format: --dataset [perf|acc:]<path>[,key=value...] — TOML-style dotted paths for samples, format, column remaps, and accuracy config.
Sub-model validation: RuntimeConfig validates durations, LoadPattern validates completeness (poisson needs qps, concurrency needs target). BenchmarkConfig._resolve_and_validate only handles cross-model checks.
with_updates(): Unified config reconstruction helper that re-runs all validators.
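
As a rough illustration of the discriminated-union selection described above, here is a minimal sketch assuming Pydantic v2. The two models are heavily simplified stand-ins: the fields `model` and `target_qps` and the helper `load_config` are chosen for illustration and are not copied from config/schema.py. It uses `Field(discriminator=...)` where the PR mentions `Discriminator`, which should behave the same for a plain field name.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class OfflineBenchmarkConfig(BaseModel):
    # Simplified stand-in: the real model has many more fields.
    type: Literal["offline"] = "offline"
    model: str


class OnlineBenchmarkConfig(BaseModel):
    # Simplified stand-in; target_qps is an illustrative field.
    type: Literal["online"] = "online"
    model: str
    target_qps: float


# The "type" field decides which subclass validates the payload.
AnyBenchmarkConfig = Annotated[
    Union[OfflineBenchmarkConfig, OnlineBenchmarkConfig],
    Field(discriminator="type"),
]
_ADAPTER = TypeAdapter(AnyBenchmarkConfig)


def load_config(raw: dict) -> Union[OfflineBenchmarkConfig, OnlineBenchmarkConfig]:
    """Validate a dict (for example parsed YAML) into the matching subclass."""
    return _ADAPTER.validate_python(raw)
```

Because the same adapter can serve both the CLI and the YAML path, a payload with `type: online` always lands in the online subclass regardless of where it came from.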

Usage

# Offline benchmark (max throughput)
inference-endpoint benchmark offline \
  --endpoints http://localhost:8000 \
  --model Qwen/Qwen3-8B \
  --dataset tests/datasets/dummy_1k.pkl

# Online benchmark (sustained QPS)
inference-endpoint benchmark online \
  --endpoints http://localhost:8000 \
  --model Qwen/Qwen3-8B \
  --dataset tests/datasets/dummy_1k.pkl \
  --load-pattern poisson --target-qps 100

# Dataset with column remapping and samples limit
inference-endpoint benchmark offline \
  --endpoints http://localhost:8000 \
  --model Qwen/Qwen3-8B \
  --dataset data.csv,samples=500,parser.prompt=article

# Accuracy dataset with eval config (full CLI/YAML parity)
inference-endpoint benchmark offline \
  --endpoints http://localhost:8000 \
  --model M \
  --dataset perf:perf.pkl \
  --dataset acc:eval.pkl,accuracy_config.eval_method=pass_at_1,accuracy_config.ground_truth=answer \
  --mode both

# From YAML config (auto-selects subclass from type field)
inference-endpoint benchmark from-config -c config.yaml

# Debug mode — dumps full BenchmarkConfig JSON
inference-endpoint -vv benchmark offline --endpoints x --model M --dataset D

Changes

Config Architecture

Discriminated union — from_yaml_file() uses Pydantic TypeAdapter to auto-select OfflineBenchmarkConfig/OnlineBenchmarkConfig from type field
CLI wrapper classes deleted from cli.py — subclasses now live in schema.py
OnlineSettings is a real class (was an alias)
show=False on submission_ref/benchmark_mode moved to base class
create_default_config delegates to subclasses (no hardcoded workers=4)
with_updates() helper replaces manual model_validate(dump|dict) pattern
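
To make the `with_updates()` idea concrete, here is a minimal sketch assuming Pydantic v2. The field defaults and the method body are assumptions written for illustration, not the code from config/schema.py; the point is only that rebuilding through `model_validate` re-runs validators, whereas `model_copy(update=...)` would not.

```python
from pydantic import BaseModel, model_validator


class RuntimeConfig(BaseModel):
    # Illustrative defaults; the real values may differ.
    min_duration_ms: int = 0
    max_duration_ms: int = 60_000

    @model_validator(mode="after")
    def _check_durations(self) -> "RuntimeConfig":
        if self.max_duration_ms < self.min_duration_ms:
            raise ValueError("max_duration_ms must be >= min_duration_ms")
        return self

    def with_updates(self, **updates) -> "RuntimeConfig":
        # Rebuild from a dump so every validator runs again on the new values.
        return type(self).model_validate(self.model_dump() | updates)
```

An update that breaks an invariant, such as `RuntimeConfig().with_updates(min_duration_ms=120_000)`, then fails at construction time instead of producing a silently inconsistent config.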

Dataset String Format

Grammar: [perf|acc:]<path>[,key=value...] (TOML-style dotted paths)
Dotted paths build nested dicts: parser.prompt=article, accuracy_config.eval_method=pass_at_1
Pydantic validates all fields — extra="forbid" on Dataset/AccuracyConfig catches typos
Parser remap targets validated against MakeAdapterCompatible (no manual list)
BeforeValidator on datasets field coerces strings — no post-construction rebuild
Parser dict direction is {target: source} — flipped in parse_dataset_string for ColumnRemap
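
The grammar above can be sketched in plain Python as follows. This is an illustrative parser, not the PR's `parse_dataset_string`: the function name `parse_dataset_spec`, the output keys `type` and `path`, and the `perf` default are assumptions, and values are left as strings on the assumption that Pydantic coerces them later.

```python
def parse_dataset_spec(spec: str) -> dict:
    """Parse '[perf|acc:]<path>[,key=value...]' into a nested dict."""
    head, *pairs = spec.split(",")

    # Optional type prefix; defaulting to "perf" is an assumption.
    kind, path = "perf", head
    for prefix in ("perf", "acc"):
        if head.startswith(prefix + ":"):
            kind, path = prefix, head[len(prefix) + 1 :]
            break
    if not path:
        raise ValueError(f"missing dataset path in {spec!r}")

    result: dict = {"type": kind, "path": path}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        # Walk the dotted key, creating intermediate dicts as needed.
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} is already a scalar in {spec!r}")
        node[leaf] = value
    return result
```

Under these assumptions, `data.csv,samples=500,parser.prompt=article` yields a dict with `samples` at the top level and `prompt` nested under `parser`, which a model with extra="forbid" can then validate.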

Validation

RuntimeConfig self-validates max_duration_ms >= min_duration_ms
LoadPattern self-validates completeness (poisson needs qps, concurrency needs target)
ColumnRemap raises KeyError on missing source columns (was errors="ignore")
MakeAdapterCompatible uses tuple key for system_prompt (optional, not strict)
DatasetValidationError for clean --dataset error messages
-vv flag dumps full BenchmarkConfig as JSON before execution
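
The stricter ColumnRemap behavior can be illustrated with a small sketch. It works on a plain dict row rather than whatever data structure the real transform uses, and the function name `remap_columns` is hypothetical; only the fail-on-missing-source rule is taken from the list above.

```python
def remap_columns(row: dict, remap: dict) -> dict:
    """Rename keys of `row` per {source: target}; fail on missing sources."""
    missing = [source for source in remap if source not in row]
    if missing:
        # Previously a missing source would have been ignored silently.
        raise KeyError(f"missing source column(s): {missing}")
    return {remap.get(key, key): value for key, value in row.items()}
```

With this rule, a typo in a source column name surfaces as an error at transform time instead of producing rows that lack the expected target column.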

File Structure

config/schema_utils.py — extracted env var interpolation + dataset string parsing
commands/benchmark/cli.py — simplified (no wrapper classes, no _parse_dataset_args)
docs/CLI_DESIGN.md — fully rewritten for unified config, dataset format, validation layers
docs/CLI_QUICK_REFERENCE.md — updated dataset format, accuracy CLI examples
AGENTS.md — updated class hierarchy and construction flow

Tests

Consolidated overlapping tests into parametrized cases (fewer methods, more cases)
Dataset string coercion tests cover type prefix, samples, format, column remap, accuracy config
ColumnRemap test updated for strict source column validation
536 unit tests passing

Type of change

New feature
Bug fix (ColumnRemap silent ignore)
Refactor/cleanup
Documentation update

Testing

536 unit tests passing
Pre-commit hooks pass (ruff, mypy, prettier, template validation)
Manual CLI verification (offline, online, from-config, -vv, error cases)

github-actions · 2026-03-20T04:47:04Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist · 2026-03-20T04:47:50Z

Summary of Changes

This pull request introduces a fundamental architectural shift in how the CLI and configuration are handled, migrating from a custom argparse-based system to pydantic-settings. This change significantly improves the consistency, maintainability, and extensibility of the tool's configuration and command-line interface. It streamlines the process of defining and validating parameters, ensuring that both direct CLI arguments and YAML files adhere to the same schema and benefit from automatic validation and environment variable resolution.

Highlights

CLI Architecture Overhaul: The command-line interface (CLI) has been completely refactored to leverage pydantic-settings and CliApp. This change ensures full parity between CLI flags and YAML configuration files, making config/schema.py the single source of truth for both.
Enhanced Configuration Management: Introduced robust environment variable interpolation (e.g., ${VAR} and ${VAR:-default}) directly within YAML configuration files. The BenchmarkConfig model now includes a model_validator for comprehensive default resolution and validation, eliminating the need for separate validation functions.
Simplified CLI Usage: CLI commands now support flat aliases for frequently used fields (e.g., -e for --endpoints, -m for --model, -d for --dataset). Command names and flag formats have been standardized to use underscores (_) instead of hyphens (-) for consistency with Python naming conventions.
Codebase Refactoring and Cleanup: Significant portions of the old argparse-based CLI (src/inference_endpoint/cli.py, src/inference_endpoint/commands/benchmark.py, src/inference_endpoint/commands/eval.py, src/inference_endpoint/commands/utils.py, src/inference_endpoint/config/yaml_loader.py) have been removed and replaced with a more modular and maintainable structure using pydantic-settings models and dedicated command execution modules.
Asynchronous Utility: A new asynchronous runner utility (src/inference_endpoint/async_utils/runner.py) has been added, utilizing uvloop and eager_task_factory to efficiently execute async logic from synchronous CLI command handlers.
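
The `${VAR}` and `${VAR:-default}` interpolation mentioned in the highlights could be sketched as below. This is a stand-alone illustration, not the project's implementation: the function name `interpolate_env`, the choice to raise on an unset variable without a default, and the shell-like rule that an empty value also falls back to the default are all assumptions.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} and ${VAR:-default} references in `text`."""
    values = os.environ if env is None else env

    def _replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        value = values.get(name)
        if default is not None:
            # Shell-like: fall back when the variable is unset or empty.
            return value if value else default
        if value is None:
            raise KeyError(f"environment variable {name!r} is not set")
        return value

    return _ENV_PATTERN.sub(_replace, text)
```

Running the substitution on the raw YAML text before parsing keeps the schema models unaware of environment handling.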

gemini-code-assist

Code Review

This is an excellent and substantial refactoring of the CLI and configuration system. Migrating from argparse to pydantic-settings provides a much more robust, maintainable, and user-friendly foundation. Key improvements include a single source of truth for configuration via Pydantic models, self-validating configuration models, support for environment variable interpolation in YAML files, and greatly improved error reporting. The overall changes significantly enhance the quality and architecture of the project. I have a couple of minor suggestions for improvement.

src/inference_endpoint/config/cli.py

src/inference_endpoint/commands/benchmark/execute.py

Copilot

Pull request overview

This PR migrates the project’s CLI and configuration handling to pydantic-settings so the same Pydantic schema drives YAML configs, CLI flags, and validation, improving CLI/YAML parity while removing the legacy argparse-based implementation.

Changes:

Replace the legacy argparse CLI with a pydantic-settings CliApp tree and flat subcommand models that build a nested BenchmarkConfig.
Move YAML loading/validation responsibilities into BenchmarkConfig (including ${VAR} interpolation) and add a ConfigResolver for YAML + override merging.
Refactor commands and tests to use new typed configs and new benchmark execution module layout.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.

File	Description
tests/unit/config/test_yaml_loader.py	Updates YAML load tests to `BenchmarkConfig.from_yaml_file()`, adds env-var interpolation + roundtrip serialization coverage.
tests/unit/config/test_resolver.py	Adds unit tests for `ConfigResolver` merging, CLI defaults, streaming resolution, and model-name fallback.
tests/unit/config/test_cli_app.py	Adds unit tests for `pydantic-settings` CLI parsing across subcommands.
tests/unit/commands/test_utils.py	Ports utils command tests to `execute_info/init/validate` functions and sync test style.
tests/unit/commands/test_eval.py	Ports eval tests to new `EvalConfig` CLI model and stub behavior.
tests/unit/commands/test_benchmark.py	Reworks benchmark tests around flat CLI models and schema-level validation.
tests/integration/commands/test_probe_command.py	Ports probe integration tests to `ProbeConfig` + `execute_probe`.
tests/integration/commands/test_benchmark_command.py	Ports benchmark integration tests to `run_benchmark(config, mode)` with minimal configs.
src/inference_endpoint/main.py	Switches entrypoint to `CliApp.run()` with structured validation error reporting.
src/inference_endpoint/config/yaml_loader.py	Removes legacy YAML loader wrapper and custom `ConfigError`.
src/inference_endpoint/config/schema.py	Makes schema the source of truth; adds env-var interpolation, model validators, YAML IO, and consolidated validation logic.
src/inference_endpoint/config/runtime_settings.py	Updates docstring reference from removed `yaml_loader`.
src/inference_endpoint/config/resolver.py	Adds `ConfigResolver` for YAML load + deep merge of explicit overrides.
src/inference_endpoint/config/cli.py	Adds new `CliApp` model tree and flat benchmark/utility subcommand models.
src/inference_endpoint/commands/validate.py	Adds typed validate command wrapper using `BenchmarkConfig.from_yaml_file()`.
src/inference_endpoint/commands/utils.py	Removes legacy combined “utils” command module.
src/inference_endpoint/commands/probe.py	Refactors probe into sync `execute_probe()` wrapper + async implementation executed via `run_async`.
src/inference_endpoint/commands/init.py	Adds typed init command wrapper for generating/copying config templates.
src/inference_endpoint/commands/info.py	Adds typed info command wrapper.
src/inference_endpoint/commands/eval.py	Removes legacy eval implementation (now a stub CLI config raising NotImplementedError).
src/inference_endpoint/commands/benchmark/execute.py	Introduces new benchmark execution module with setup/run/finalize phases.
src/inference_endpoint/commands/benchmark/__init__.py	Exports `run_benchmark` from the new benchmark package.
src/inference_endpoint/commands/benchmark.py	Removes legacy monolithic benchmark command implementation.
src/inference_endpoint/commands/__init__.py	Simplifies commands package exports after refactor.
src/inference_endpoint/cli.py	Removes legacy argparse CLI entrypoint.
src/inference_endpoint/async_utils/runner.py	Adds shared `run_async()` helper (uvloop + eager task factory) for sync command wrappers.
pyproject.toml	Adds `pydantic-settings>=2.7` dependency for new CLI framework.
docs/CLI_QUICK_REFERENCE.md	Updates CLI docs to the new pydantic-settings-based interface and features.

docs/CLI_QUICK_REFERENCE.md

Copilot

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 9 comments.

docs/CLI_QUICK_REFERENCE.md

tests/unit/endpoint_client/test_utils.py

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 8 comments.

docs/CLI_QUICK_REFERENCE.md

src/inference_endpoint/commands/benchmark/execute.py

src/inference_endpoint/commands/benchmark/cli.py

- Add @pytest.mark.unit markers to new test_utils.py
- Fix resolver.py docstring: OfflineBenchmark/OnlineBenchmark -> OfflineConfig/OnlineConfig
- Fix runner.py docstring: Typer -> synchronous CLI
- Fix AGENTS.md CLI path: config/cli.py -> cli.py, commands/benchmark/cli.py
- Fix schema.py standalone triple-quoted string -> comment
- Add yaml.YAMLError to validate.py except clause
- Replace obscure SIGINT lambda with plain function in execute.py
- Fix CLI_QUICK_REFERENCE.md uppercase enum values
- Add SampleEventHandler.clear_hooks before register in execute.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

docs/CLI_QUICK_REFERENCE.md

Copilot

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.

src/inference_endpoint/config/schema.py

src/inference_endpoint/config/resolver.py

src/inference_endpoint/commands/benchmark/execute.py

nvzhihanj

.

src/inference_endpoint/commands/benchmark/execute.py

nvzhihanj

.

src/inference_endpoint/commands/benchmark/cli.py

nvzhihanj

.

src/inference_endpoint/main.py

nvzhihanj

.

src/inference_endpoint/cli.py

Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

src/inference_endpoint/commands/benchmark/execute.py

Copilot

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 3 comments.

src/inference_endpoint/config/schema.py

src/inference_endpoint/commands/benchmark/execute.py

AGENTS.md

Copilot

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.

docs/CLI_QUICK_REFERENCE.md

Copilot

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 2 comments.

docs/CLI_QUICK_REFERENCE.md

src/inference_endpoint/commands/benchmark/execute.py

Copilot

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.

src/inference_endpoint/dataset_manager/transforms.py

src/inference_endpoint/config/schema_utils.py

src/inference_endpoint/config/utils.py

Copilot

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 7 comments.

src/inference_endpoint/config/schema.py

docs/CLI_QUICK_REFERENCE.md

tests/unit/dataset_manager/test_transforms.py

src/inference_endpoint/config/utils.py

docs/CLI_QUICK_REFERENCE.md

docs/CLI_DESIGN.md

src/inference_endpoint/config/schema.py

Copilot

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.

Copilot

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/inference_endpoint/config/schema_utils.py:138

parse_dataset_string() flips parser from {target: source} to {source: target} via a dict comprehension, which will silently drop entries if the user maps multiple targets to the same source column (e.g. parser.prompt=text,parser.system=text). It would be better to detect duplicate source values before flipping and raise a clear ValueError so user intent isn't lost.

    # Validate parser remap targets (CLI only — YAML validated in factory)
    if "parser" in result and isinstance(result["parser"], dict):
        # Lazy import to avoid circular dep: schema_utils → dataset_manager → schema
        from inference_endpoint.dataset_manager.transforms import (
            MakeAdapterCompatible,
        )

        valid = set(MakeAdapterCompatible().remap.values())
        invalid = set(result["parser"].keys()) - valid
        if invalid:
            raise ValueError(
                f"Unknown parser remap target(s): {invalid}. "
                f"Valid targets: {sorted(valid)}"
            )

    return result
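
The duplicate check suggested in this comment might look like the following sketch. It is not code from the PR: the helper name `flip_parser_mapping` is hypothetical, and it only demonstrates detecting collisions before inverting a {target: source} dict.

```python
from collections import defaultdict


def flip_parser_mapping(parser: dict) -> dict:
    """Turn {target: source} into {source: target}, rejecting collisions."""
    targets_by_source = defaultdict(list)
    for target, source in parser.items():
        targets_by_source[source].append(target)

    # A source mapped to several targets would lose entries when flipped.
    dupes = {s: t for s, t in targets_by_source.items() if len(t) > 1}
    if dupes:
        raise ValueError(f"source column(s) mapped to multiple targets: {dupes}")
    return {source: targets[0] for source, targets in targets_by_source.items()}
```

For the example in the comment, `parser.prompt=text,parser.system=text` would raise a ValueError naming `text` instead of silently keeping only one of the two targets.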

Copilot

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.

src/inference_endpoint/dataset_manager/factory.py

src/inference_endpoint/commands/validate.py

Copilot

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.

src/inference_endpoint/dataset_manager/factory.py

src/inference_endpoint/dataset_manager/transforms.py

Copilot

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 5 comments.

Copilot

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

…SIGN.md

Copilot

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.

viraatc requested a review from a team as a code owner March 20, 2026 04:46

Copilot AI review requested due to automatic review settings March 20, 2026 04:46

github-actions bot requested review from arekay-nv and nvzhihanj March 20, 2026 04:47

Copilot started reviewing on behalf of viraatc March 20, 2026 04:47

gemini-code-assist bot reviewed Mar 20, 2026

src/inference_endpoint/config/cli.py (outdated, resolved)

src/inference_endpoint/commands/benchmark/execute.py (outdated, resolved)

Copilot AI reviewed Mar 20, 2026

docs/CLI_QUICK_REFERENCE.md (outdated, resolved)

docs/CLI_QUICK_REFERENCE.md (outdated, resolved)

docs/CLI_QUICK_REFERENCE.md (outdated, resolved)

viraatc changed the title from "feat: use pydantic-settings to enforce CLI/YAML parity" to "feat: use cyclopts to enforce CLI/YAML parity" Mar 20, 2026

Copilot AI review requested due to automatic review settings March 20, 2026 06:28

Copilot started reviewing on behalf of viraatc March 20, 2026 06:29

viraatc requested a review from nv-alicheng March 20, 2026 06:31

Copilot AI reviewed Mar 20, 2026

docs/CLI_QUICK_REFERENCE.md (outdated, resolved)

docs/CLI_QUICK_REFERENCE.md (resolved)

docs/CLI_QUICK_REFERENCE.md (resolved)

github-code-quality bot found potential problems Mar 20, 2026

tests/unit/endpoint_client/test_utils.py (fixed)

tests/unit/endpoint_client/test_utils.py (fixed)

Copilot AI review requested due to automatic review settings March 20, 2026 23:33

Copilot started reviewing on behalf of viraatc March 20, 2026 23:34

Copilot AI reviewed Mar 20, 2026

docs/CLI_QUICK_REFERENCE.md (outdated, resolved)

src/inference_endpoint/commands/benchmark/execute.py (resolved)

src/inference_endpoint/commands/benchmark/cli.py (outdated, resolved)

Copilot AI review requested due to automatic review settings March 20, 2026 23:54

Copilot started reviewing on behalf of viraatc March 20, 2026 23:55

Copilot AI reviewed Mar 21, 2026

docs/CLI_QUICK_REFERENCE.md (outdated, resolved)

Copilot AI review requested due to automatic review settings March 21, 2026 00:02

Copilot started reviewing on behalf of viraatc March 21, 2026 00:03

Copilot AI reviewed Mar 21, 2026

nvzhihanj reviewed Mar 21, 2026

src/inference_endpoint/commands/benchmark/execute.py (resolved)

This comment was marked as spam.

nvzhihanj reviewed Mar 21, 2026

src/inference_endpoint/commands/benchmark/cli.py (resolved)

nvzhihanj reviewed Mar 21, 2026

src/inference_endpoint/main.py (resolved)

nvzhihanj reviewed Mar 21, 2026

src/inference_endpoint/cli.py (outdated, resolved)

Copilot started reviewing on behalf of viraatc March 23, 2026 22:57

Copilot AI reviewed Mar 23, 2026

src/inference_endpoint/commands/benchmark/execute.py (resolved)

mlcommons deleted a comment from Copilot AI Mar 23, 2026

Copilot AI reviewed Mar 23, 2026

src/inference_endpoint/config/schema.py (outdated, resolved)

src/inference_endpoint/commands/benchmark/execute.py (resolved)

AGENTS.md (outdated, resolved)