feat: use cyclopts to enforce CLI/YAML parity#193

Merged
viraatc merged 5 commits intomainfrom
feat/viraatc-cli
Mar 24, 2026

@viraatc
Collaborator

@viraatc viraatc commented Mar 20, 2026

What does this PR do?

Replace the argparse CLI with cyclopts. The Pydantic models in config/schema.py are the single source of truth: CLI flags are auto-generated from schema fields. Adding a shorthand takes one cyclopts.Parameter(alias=...) annotation and zero CLI code changes.

Architecture

  • Discriminated union: OfflineBenchmarkConfig/OnlineBenchmarkConfig subclasses in config/schema.py. Both CLI and YAML auto-select the right subclass via a Pydantic TypeAdapter with a Discriminator on the type field.
  • Dataset string format: --dataset [perf|acc:]<path>[,key=value...] — TOML-style dotted paths for samples, format, column remaps, and accuracy config.
  • Sub-model validation: RuntimeConfig validates durations, LoadPattern validates completeness (poisson needs qps, concurrency needs target). BenchmarkConfig._resolve_and_validate only handles cross-model checks.
  • with_updates(): Unified config reconstruction helper that re-runs all validators.
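The type-field dispatch described above can be illustrated without Pydantic. The PR uses a Pydantic TypeAdapter with a Discriminator; the stdlib sketch below only mimics that selection step, and the class names, fields, and the "offline"/"online" type values are hypothetical rather than taken from config/schema.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Type


@dataclass
class BenchmarkSketch:
    """Hypothetical stand-in for the shared BenchmarkConfig base class."""

    type: str
    model: str


@dataclass
class OfflineSketch(BenchmarkSketch):
    """Offline (max throughput) variant: no extra fields in this sketch."""


@dataclass
class OnlineSketch(BenchmarkSketch):
    """Online variant with a hypothetical target_qps field."""

    target_qps: Optional[float] = None


# The value of the "type" field (the discriminator) picks the concrete class.
_BY_TYPE: Dict[str, Type[BenchmarkSketch]] = {
    "offline": OfflineSketch,
    "online": OnlineSketch,
}


def load_benchmark(data: Dict[str, Any]) -> BenchmarkSketch:
    """Select the subclass from data["type"], then construct it."""
    cls = _BY_TYPE.get(data.get("type", ""))
    if cls is None:
        raise ValueError(f"unknown or missing benchmark type: {data.get('type')!r}")
    return cls(**data)
```

A real discriminated union also reports which variant failed validation; this sketch just raises ValueError for an unknown tag and lets the dataclass constructor reject unexpected keys (loosely analogous to extra="forbid").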

Usage

# Offline benchmark (max throughput)
inference-endpoint benchmark offline \
  --endpoints http://localhost:8000 \
  --model Qwen/Qwen3-8B \
  --dataset tests/datasets/dummy_1k.pkl

# Online benchmark (sustained QPS)
inference-endpoint benchmark online \
  --endpoints http://localhost:8000 \
  --model Qwen/Qwen3-8B \
  --dataset tests/datasets/dummy_1k.pkl \
  --load-pattern poisson --target-qps 100

# Dataset with column remapping and samples limit
inference-endpoint benchmark offline \
  --endpoints http://localhost:8000 \
  --model Qwen/Qwen3-8B \
  --dataset data.csv,samples=500,parser.prompt=article

# Accuracy dataset with eval config (full CLI/YAML parity)
inference-endpoint benchmark offline \
  --endpoints http://localhost:8000 \
  --model M \
  --dataset perf:perf.pkl \
  --dataset acc:eval.pkl,accuracy_config.eval_method=pass_at_1,accuracy_config.ground_truth=answer \
  --mode both

# From YAML config (auto-selects subclass from type field)
inference-endpoint benchmark from-config -c config.yaml

# Debug mode — dumps full BenchmarkConfig JSON
inference-endpoint -vv benchmark offline --endpoints x --model M --dataset D

Changes

Config Architecture

  • Discriminated union — from_yaml_file() uses Pydantic TypeAdapter to auto-select OfflineBenchmarkConfig/OnlineBenchmarkConfig from type field
  • CLI wrapper classes deleted from cli.py — subclasses now live in schema.py
  • OnlineSettings is a real class (was an alias)
  • show=False on submission_ref/benchmark_mode moved to base class
  • create_default_config delegates to subclasses (no hardcoded workers=4)
  • with_updates() helper replaces manual model_validate(dump|dict) pattern

Dataset String Format

  • Grammar: [perf|acc:]<path>[,key=value...] (TOML-style dotted paths)
  • Dotted paths build nested dicts: parser.prompt=article, accuracy_config.eval_method=pass_at_1
  • Pydantic validates all fields — extra="forbid" on Dataset/AccuracyConfig catches typos
  • Parser remap targets validated against MakeAdapterCompatible (no manual list)
  • BeforeValidator on datasets field coerces strings — no post-construction rebuild
  • Parser dict direction is {target: source} — flipped in parse_dataset_string for ColumnRemap

Validation

  • RuntimeConfig self-validates max_duration_ms >= min_duration_ms
  • LoadPattern self-validates completeness (poisson needs qps, concurrency needs target)
  • ColumnRemap raises KeyError on missing source columns (was errors="ignore")
  • MakeAdapterCompatible uses tuple key for system_prompt (optional, not strict)
  • DatasetValidationError for clean --dataset error messages
  • -vv flag dumps full BenchmarkConfig as JSON before execution
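The self-validation pattern can be sketched with frozen dataclasses, where __post_init__ plays the role of a Pydantic model validator. Field names such as target_concurrency and the "max_throughput" default are assumptions; with_updates_sketch only illustrates the with_updates() idea that rebuilding a config re-runs every validator.

```python
from dataclasses import dataclass, replace
from typing import Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RuntimeSketch:
    """Hypothetical runtime settings that check their own invariants."""

    min_duration_ms: int = 0
    max_duration_ms: int = 0

    def __post_init__(self) -> None:
        if self.max_duration_ms < self.min_duration_ms:
            raise ValueError("max_duration_ms must be >= min_duration_ms")


@dataclass(frozen=True)
class LoadPatternSketch:
    """Hypothetical load pattern: each type requires its own target field."""

    type: str = "max_throughput"
    target_qps: Optional[float] = None
    target_concurrency: Optional[int] = None

    def __post_init__(self) -> None:
        if self.type == "poisson" and self.target_qps is None:
            raise ValueError("poisson load pattern needs target_qps")
        if self.type == "concurrency" and self.target_concurrency is None:
            raise ValueError("concurrency load pattern needs target_concurrency")


def with_updates_sketch(obj: T, **changes: object) -> T:
    """Return a modified copy; replace() calls __init__, so validators re-run."""
    return replace(obj, **changes)
```

Because dataclasses.replace builds the new instance through __init__, __post_init__ runs again and an update that breaks an invariant fails immediately instead of producing a half-valid config.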

File Structure

  • config/schema_utils.py — extracted env var interpolation + dataset string parsing
  • commands/benchmark/cli.py — simplified (no wrapper classes, no _parse_dataset_args)
  • docs/CLI_DESIGN.md — fully rewritten for unified config, dataset format, validation layers
  • docs/CLI_QUICK_REFERENCE.md — updated dataset format, accuracy CLI examples
  • AGENTS.md — updated class hierarchy and construction flow

Tests

  • Consolidated overlapping tests into parametrized cases (fewer methods, more cases)
  • Dataset string coercion tests cover type prefix, samples, format, column remap, accuracy config
  • ColumnRemap test updated for strict source column validation
  • 536 unit tests passing

Type of change

  • New feature
  • Bug fix (ColumnRemap silent ignore)
  • Refactor/cleanup
  • Documentation update

Testing

  • 536 unit tests passing
  • Pre-commit hooks pass (ruff, mypy, prettier, template validation)
  • Manual CLI verification (offline, online, from-config, -vv, error cases)

@viraatc viraatc requested a review from a team as a code owner March 20, 2026 04:46
Copilot AI review requested due to automatic review settings March 20, 2026 04:46
@github-actions

github-actions bot commented Mar 20, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a fundamental architectural shift in how the CLI and configuration are handled, migrating from a custom argparse-based system to pydantic-settings. This change significantly improves the consistency, maintainability, and extensibility of the tool's configuration and command-line interface. It streamlines the process of defining and validating parameters, ensuring that both direct CLI arguments and YAML files adhere to the same schema and benefit from automatic validation and environment variable resolution.

Highlights

  • CLI Architecture Overhaul: The command-line interface (CLI) has been completely refactored to leverage pydantic-settings and CliApp. This change ensures full parity between CLI flags and YAML configuration files, making config/schema.py the single source of truth for both.
  • Enhanced Configuration Management: Introduced robust environment variable interpolation (e.g., ${VAR} and ${VAR:-default}) directly within YAML configuration files. The BenchmarkConfig model now includes a model_validator for comprehensive default resolution and validation, eliminating the need for separate validation functions.
  • Simplified CLI Usage: CLI commands now support flat aliases for frequently used fields (e.g., -e for --endpoints, -m for --model, -d for --dataset). Command names and flag formats have been standardized to use underscores (_) instead of hyphens (-) for consistency with Python naming conventions.
  • Codebase Refactoring and Cleanup: Significant portions of the old argparse-based CLI (src/inference_endpoint/cli.py, src/inference_endpoint/commands/benchmark.py, src/inference_endpoint/commands/eval.py, src/inference_endpoint/commands/utils.py, src/inference_endpoint/config/yaml_loader.py) have been removed and replaced with a more modular and maintainable structure using pydantic-settings models and dedicated command execution modules.
  • Asynchronous Utility: A new asynchronous runner utility (src/inference_endpoint/async_utils/runner.py) has been added, utilizing uvloop and eager_task_factory to efficiently execute async logic from synchronous CLI command handlers.
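A minimal sketch of ${VAR} and ${VAR:-default} interpolation, assuming shell-style semantics (the default applies when the variable is unset or empty) and assuming an unset variable without a default is an error. interpolate_env is a hypothetical name; this is not the schema.py implementation.

```python
import os
import re
from typing import Mapping, Optional

# ${NAME} or ${NAME:-default}; the default may not contain '}'.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} / ${VAR:-default} references in text."""
    env = os.environ if env is None else env

    def _sub(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        value = env.get(name)
        if default is not None:
            # ':-' falls back when the variable is unset or empty.
            return value if value else default
        if value is None:
            raise KeyError(f"environment variable {name!r} is not set")
        return value

    return _ENV_PATTERN.sub(_sub, text)
```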

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is an excellent and substantial refactoring of the CLI and configuration system. Migrating from argparse to pydantic-settings provides a much more robust, maintainable, and user-friendly foundation. Key improvements include a single source of truth for configuration via Pydantic models, self-validating configuration models, support for environment variable interpolation in YAML files, and greatly improved error reporting. The overall changes significantly enhance the quality and architecture of the project. I have a couple of minor suggestions for improvement.

Copilot AI left a comment

Pull request overview

This PR migrates the project’s CLI and configuration handling to pydantic-settings so the same Pydantic schema drives YAML configs, CLI flags, and validation, improving CLI/YAML parity while removing the legacy argparse-based implementation.

Changes:

  • Replace the legacy argparse CLI with a pydantic-settings CliApp tree and flat subcommand models that build a nested BenchmarkConfig.
  • Move YAML loading/validation responsibilities into BenchmarkConfig (including ${VAR} interpolation) and add a ConfigResolver for YAML + override merging.
  • Refactor commands and tests to use new typed configs and new benchmark execution module layout.
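A minimal sketch of the YAML-plus-overrides merge, assuming nested mappings merge key by key while scalars and lists from the overrides replace the YAML value outright. deep_merge is a hypothetical helper; ConfigResolver's actual merge rules are not shown in this summary.

```python
from typing import Any, Dict, Mapping


def deep_merge(base: Mapping[str, Any], overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested mappings merge recursively, other values are replaced."""
    merged: Dict[str, Any] = dict(base)
    for key, value in overrides.items():
        current = merged.get(key)
        if isinstance(current, Mapping) and isinstance(value, Mapping):
            merged[key] = deep_merge(current, value)
        else:
            merged[key] = value
    return merged
```

Returning a fresh dict at every level that changes keeps the loaded YAML data untouched, so the same base can be reused with different override sets.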

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.

Per-file summary:
  • tests/unit/config/test_yaml_loader.py: Updates YAML load tests to BenchmarkConfig.from_yaml_file(), adds env-var interpolation + roundtrip serialization coverage.
  • tests/unit/config/test_resolver.py: Adds unit tests for ConfigResolver merging, CLI defaults, streaming resolution, and model-name fallback.
  • tests/unit/config/test_cli_app.py: Adds unit tests for pydantic-settings CLI parsing across subcommands.
  • tests/unit/commands/test_utils.py: Ports utils command tests to execute_info/init/validate functions and sync test style.
  • tests/unit/commands/test_eval.py: Ports eval tests to new EvalConfig CLI model and stub behavior.
  • tests/unit/commands/test_benchmark.py: Reworks benchmark tests around flat CLI models and schema-level validation.
  • tests/integration/commands/test_probe_command.py: Ports probe integration tests to ProbeConfig + execute_probe.
  • tests/integration/commands/test_benchmark_command.py: Ports benchmark integration tests to run_benchmark(config, mode) with minimal configs.
  • src/inference_endpoint/main.py: Switches entrypoint to CliApp.run() with structured validation error reporting.
  • src/inference_endpoint/config/yaml_loader.py: Removes legacy YAML loader wrapper and custom ConfigError.
  • src/inference_endpoint/config/schema.py: Makes schema the source of truth; adds env-var interpolation, model validators, YAML IO, and consolidated validation logic.
  • src/inference_endpoint/config/runtime_settings.py: Updates docstring reference from removed yaml_loader.
  • src/inference_endpoint/config/resolver.py: Adds ConfigResolver for YAML load + deep merge of explicit overrides.
  • src/inference_endpoint/config/cli.py: Adds new CliApp model tree and flat benchmark/utility subcommand models.
  • src/inference_endpoint/commands/validate.py: Adds typed validate command wrapper using BenchmarkConfig.from_yaml_file().
  • src/inference_endpoint/commands/utils.py: Removes legacy combined “utils” command module.
  • src/inference_endpoint/commands/probe.py: Refactors probe into sync execute_probe() wrapper + async implementation executed via run_async.
  • src/inference_endpoint/commands/init.py: Adds typed init command wrapper for generating/copying config templates.
  • src/inference_endpoint/commands/info.py: Adds typed info command wrapper.
  • src/inference_endpoint/commands/eval.py: Removes legacy eval implementation (now a stub CLI config raising NotImplementedError).
  • src/inference_endpoint/commands/benchmark/execute.py: Introduces new benchmark execution module with setup/run/finalize phases.
  • src/inference_endpoint/commands/benchmark/__init__.py: Exports run_benchmark from the new benchmark package.
  • src/inference_endpoint/commands/benchmark.py: Removes legacy monolithic benchmark command implementation.
  • src/inference_endpoint/commands/__init__.py: Simplifies commands package exports after refactor.
  • src/inference_endpoint/cli.py: Removes legacy argparse CLI entrypoint.
  • src/inference_endpoint/async_utils/runner.py: Adds shared run_async() helper (uvloop + eager task factory) for sync command wrappers.
  • pyproject.toml: Adds pydantic-settings>=2.7 dependency for new CLI framework.
  • docs/CLI_QUICK_REFERENCE.md: Updates CLI docs to the new pydantic-settings-based interface and features.
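A sketch of a synchronous wrapper in the spirit of the run_async() helper listed above, assuming uvloop is optional and that asyncio.eager_task_factory (Python 3.12+) is applied only when available. run_async_sketch is a hypothetical name and its body is an assumption, not runner.py.

```python
import asyncio
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def _new_loop() -> asyncio.AbstractEventLoop:
    """Create an event loop, preferring uvloop when it is installed."""
    try:
        import uvloop  # optional third-party dependency

        loop = uvloop.new_event_loop()
    except ImportError:
        loop = asyncio.new_event_loop()
    # asyncio.eager_task_factory only exists on Python 3.12+.
    eager = getattr(asyncio, "eager_task_factory", None)
    if eager is not None:
        loop.set_task_factory(eager)
    return loop


def run_async_sketch(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous CLI code."""
    loop = _new_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()
```

With the eager factory, a new task starts executing synchronously until its first real suspension point, which avoids a scheduling round-trip for coroutines that often finish without blocking.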

@viraatc viraatc changed the title feat: use pydantic-settings to enforce CLI/YAML parity feat: use cyclopts to enforce CLI/YAML parity Mar 20, 2026
Copilot AI review requested due to automatic review settings March 20, 2026 06:28
@viraatc viraatc requested a review from nv-alicheng March 20, 2026 06:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 27 out of 27 changed files in this pull request and generated 9 comments.


Copilot AI review requested due to automatic review settings March 20, 2026 23:33

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 8 comments.


Copilot AI review requested due to automatic review settings March 20, 2026 23:54
viraatc added a commit that referenced this pull request Mar 21, 2026
- Add @pytest.mark.unit markers to new test_utils.py
- Fix resolver.py docstring: OfflineBenchmark/OnlineBenchmark -> OfflineConfig/OnlineConfig
- Fix runner.py docstring: Typer -> synchronous CLI
- Fix AGENTS.md CLI path: config/cli.py -> cli.py, commands/benchmark/cli.py
- Fix schema.py standalone triple-quoted string -> comment
- Add yaml.YAMLError to validate.py except clause
- Replace obscure SIGINT lambda with plain function in execute.py
- Fix CLI_QUICK_REFERENCE.md uppercase enum values
- Add SampleEventHandler.clear_hooks before register in execute.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.


Copilot AI review requested due to automatic review settings March 21, 2026 00:02

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.


Collaborator

@nvzhihanj nvzhihanj left a comment

.

nvzhihanj

This comment was marked as spam.

Collaborator

@nvzhihanj nvzhihanj left a comment

.

Collaborator

@nvzhihanj nvzhihanj left a comment

.

Collaborator

@nvzhihanj nvzhihanj left a comment

.

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.


@mlcommons mlcommons deleted 9 comments from Copilot AI Mar 23, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 2 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 7 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/inference_endpoint/config/schema_utils.py:138

  • parse_dataset_string() flips parser from {target: source} to {source: target} via a dict comprehension, which will silently drop entries if the user maps multiple targets to the same source column (e.g. parser.prompt=text,parser.system=text). It would be better to detect duplicate source values before flipping and raise a clear ValueError so user intent isn't lost.
    # Validate parser remap targets (CLI only — YAML validated in factory)
    if "parser" in result and isinstance(result["parser"], dict):
        # Lazy import to avoid circular dep: schema_utils → dataset_manager → schema
        from inference_endpoint.dataset_manager.transforms import (
            MakeAdapterCompatible,
        )

        valid = set(MakeAdapterCompatible().remap.values())
        invalid = set(result["parser"].keys()) - valid
        if invalid:
            raise ValueError(
                f"Unknown parser remap target(s): {invalid}. "
                f"Valid targets: {sorted(valid)}"
            )

    return result
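The duplicate-source check suggested in the comment above can be sketched as an explicit loop that refuses to flip when two targets share one source column. flip_parser_remap is a hypothetical helper, not code from schema_utils.py.

```python
from typing import Dict


def flip_parser_remap(parser: Dict[str, str]) -> Dict[str, str]:
    """Turn {target: source} into {source: target}, rejecting shared sources."""
    flipped: Dict[str, str] = {}
    for target, source in parser.items():
        if source in flipped:
            # A dict comprehension would silently keep only the last target here.
            raise ValueError(
                f"Source column {source!r} is mapped to both "
                f"{flipped[source]!r} and {target!r}"
            )
        flipped[source] = target
    return flipped
```

With this check, parser.prompt=text,parser.system=text fails with a clear message instead of quietly dropping one of the two mappings.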


Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 5 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.


Copilot AI left a comment

Pull request overview

Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.


5 participants