feat: use cyclopts to enforce CLI/YAML parity#193
Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a fundamental architectural shift in how the CLI and configuration are handled, migrating from a custom Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This is an excellent and substantial refactoring of the CLI and configuration system. Migrating from argparse to pydantic-settings provides a much more robust, maintainable, and user-friendly foundation. Key improvements include a single source of truth for configuration via Pydantic models, self-validating configuration models, support for environment variable interpolation in YAML files, and greatly improved error reporting. The overall changes significantly enhance the quality and architecture of the project. I have a couple of minor suggestions for improvement.
There was a problem hiding this comment.
Pull request overview
This PR migrates the project’s CLI and configuration handling to pydantic-settings so the same Pydantic schema drives YAML configs, CLI flags, and validation, improving CLI/YAML parity while removing the legacy argparse-based implementation.
Changes:
- Replace the legacy argparse CLI with a
pydantic-settingsCliApptree and flat subcommand models that build a nestedBenchmarkConfig. - Move YAML loading/validation responsibilities into
BenchmarkConfig(including${VAR}interpolation) and add aConfigResolverfor YAML + override merging. - Refactor commands and tests to use new typed configs and new benchmark execution module layout.
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/config/test_yaml_loader.py | Updates YAML load tests to BenchmarkConfig.from_yaml_file(), adds env-var interpolation + roundtrip serialization coverage. |
| tests/unit/config/test_resolver.py | Adds unit tests for ConfigResolver merging, CLI defaults, streaming resolution, and model-name fallback. |
| tests/unit/config/test_cli_app.py | Adds unit tests for pydantic-settings CLI parsing across subcommands. |
| tests/unit/commands/test_utils.py | Ports utils command tests to execute_info/init/validate functions and sync test style. |
| tests/unit/commands/test_eval.py | Ports eval tests to new EvalConfig CLI model and stub behavior. |
| tests/unit/commands/test_benchmark.py | Reworks benchmark tests around flat CLI models and schema-level validation. |
| tests/integration/commands/test_probe_command.py | Ports probe integration tests to ProbeConfig + execute_probe. |
| tests/integration/commands/test_benchmark_command.py | Ports benchmark integration tests to run_benchmark(config, mode) with minimal configs. |
| src/inference_endpoint/main.py | Switches entrypoint to CliApp.run() with structured validation error reporting. |
| src/inference_endpoint/config/yaml_loader.py | Removes legacy YAML loader wrapper and custom ConfigError. |
| src/inference_endpoint/config/schema.py | Makes schema the source of truth; adds env-var interpolation, model validators, YAML IO, and consolidated validation logic. |
| src/inference_endpoint/config/runtime_settings.py | Updates docstring reference from removed yaml_loader. |
| src/inference_endpoint/config/resolver.py | Adds ConfigResolver for YAML load + deep merge of explicit overrides. |
| src/inference_endpoint/config/cli.py | Adds new CliApp model tree and flat benchmark/utility subcommand models. |
| src/inference_endpoint/commands/validate.py | Adds typed validate command wrapper using BenchmarkConfig.from_yaml_file(). |
| src/inference_endpoint/commands/utils.py | Removes legacy combined “utils” command module. |
| src/inference_endpoint/commands/probe.py | Refactors probe into sync execute_probe() wrapper + async implementation executed via run_async. |
| src/inference_endpoint/commands/init.py | Adds typed init command wrapper for generating/copying config templates. |
| src/inference_endpoint/commands/info.py | Adds typed info command wrapper. |
| src/inference_endpoint/commands/eval.py | Removes legacy eval implementation (now a stub CLI config raising NotImplementedError). |
| src/inference_endpoint/commands/benchmark/execute.py | Introduces new benchmark execution module with setup/run/finalize phases. |
| src/inference_endpoint/commands/benchmark/init.py | Exports run_benchmark from the new benchmark package. |
| src/inference_endpoint/commands/benchmark.py | Removes legacy monolithic benchmark command implementation. |
| src/inference_endpoint/commands/init.py | Simplifies commands package exports after refactor. |
| src/inference_endpoint/cli.py | Removes legacy argparse CLI entrypoint. |
| src/inference_endpoint/async_utils/runner.py | Adds shared run_async() helper (uvloop + eager task factory) for sync command wrappers. |
| pyproject.toml | Adds pydantic-settings>=2.7 dependency for new CLI framework. |
| docs/CLI_QUICK_REFERENCE.md | Updates CLI docs to the new pydantic-settings-based interface and features. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add @pytest.mark.unit markers to new test_utils.py - Fix resolver.py docstring: OfflineBenchmark/OnlineBenchmark -> OfflineConfig/OnlineConfig - Fix runner.py docstring: Typer -> synchronous CLI - Fix AGENTS.md CLI path: config/cli.py -> cli.py, commands/benchmark/cli.py - Fix schema.py standalone triple-quoted string -> comment - Add yaml.YAMLError to validate.py except clause - Replace obscure SIGINT lambda with plain function in execute.py - Fix CLI_QUICK_REFERENCE.md uppercase enum values - Add SampleEventHandler.clear_hooks before register in execute.py Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 34 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 36 out of 36 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/inference_endpoint/config/schema_utils.py:138
parse_dataset_string()flipsparserfrom{target: source}to{source: target}via a dict comprehension, which will silently drop entries if the user maps multiple targets to the same source column (e.g.parser.prompt=text,parser.system=text). It would be better to detect duplicate source values before flipping and raise a clearValueErrorso user intent isn't lost.
# Validate parser remap targets (CLI only — YAML validated in factory)
if "parser" in result and isinstance(result["parser"], dict):
# Lazy import to avoid circular dep: schema_utils → dataset_manager → schema
from inference_endpoint.dataset_manager.transforms import (
MakeAdapterCompatible,
)
valid = set(MakeAdapterCompatible().remap.values())
invalid = set(result["parser"].keys()) - valid
if invalid:
raise ValueError(
f"Unknown parser remap target(s): {invalid}. "
f"Valid targets: {sorted(valid)}"
)
return result
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 40 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What does this PR do?
Replace argparse CLI with cyclopts.
config/schema.pyPydantic models are the single source of truth — CLI flags are auto-generated from schema fields. Add a shorthand by adding onecyclopts.Parameter(alias=...)annotation, zero CLI code changes.Architecture
OfflineBenchmarkConfig/OnlineBenchmarkConfigsubclasses inconfig/schema.py. Both CLI and YAML auto-select the right subclass via PydanticTypeAdapterwithDiscriminatorontypefield.--dataset [perf|acc:]<path>[,key=value...]— TOML-style dotted paths for samples, format, column remaps, and accuracy config.RuntimeConfigvalidates durations,LoadPatternvalidates completeness (poisson needs qps, concurrency needs target).BenchmarkConfig._resolve_and_validateonly handles cross-model checks.with_updates(): Unified config reconstruction helper that re-runs all validators.Usage
Changes
Config Architecture
from_yaml_file()uses PydanticTypeAdapterto auto-selectOfflineBenchmarkConfig/OnlineBenchmarkConfigfromtypefieldcli.py— subclasses now live inschema.pyOnlineSettingsis a real class (was an alias)show=Falseonsubmission_ref/benchmark_modemoved to base classcreate_default_configdelegates to subclasses (no hardcodedworkers=4)with_updates()helper replaces manualmodel_validate(dump|dict)patternDataset String Format
[perf|acc:]<path>[,key=value...](TOML-style dotted paths)parser.prompt=article,accuracy_config.eval_method=pass_at_1extra="forbid"on Dataset/AccuracyConfig catches typosMakeAdapterCompatible(no manual list)BeforeValidatorondatasetsfield coerces strings — no post-construction rebuild{target: source}— flipped inparse_dataset_stringforColumnRemapValidation
RuntimeConfigself-validatesmax_duration_ms >= min_duration_msLoadPatternself-validates completeness (poisson needs qps, concurrency needs target)ColumnRemapraisesKeyErroron missing source columns (waserrors="ignore")MakeAdapterCompatibleuses tuple key forsystem_prompt(optional, not strict)DatasetValidationErrorfor clean --dataset error messages-vvflag dumps fullBenchmarkConfigas JSON before executionFile Structure
config/schema_utils.py— extracted env var interpolation + dataset string parsingcommands/benchmark/cli.py— simplified (no wrapper classes, no_parse_dataset_args)docs/CLI_DESIGN.md— fully rewritten for unified config, dataset format, validation layersdocs/CLI_QUICK_REFERENCE.md— updated dataset format, accuracy CLI examplesAGENTS.md— updated class hierarchy and construction flowTests
ColumnRemaptest updated for strict source column validationType of change
Testing