Skip to content

Add JSON schema for test runner arguments #169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 17, 2025
Merged

Conversation

natebosch
Copy link
Member

Add strong language to always use the run_tests tool.

Add a JSON schema for the CLI arguments to the test runner. The schema
was dumped using dart-lang/core#897

Add a test for the conversion of boolean, string, and list of string
arguments, as well as overriding the reporter argument.

Add strong language to always use the `run_tests` tool.

Add a JSON schema for the CLI arguments to the test runner. The schema
was dumped using dart-lang/core#897

Add a test for the conversion of boolean, string, and list of string
arguments, as well as overriding the reporter argument.
Copy link

PR Health

Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

@natebosch
Copy link
Member Author

I haven't gotten an eval framework set up, but my manual testing was pretty positive with this. The agent was able to use the test runner more effectively, and I was unsuccessful in attempts to coach usage in other ways like a getToolInfo tool.

}

const _dartTestCliSchema = '''
{"type":"object","properties":{"help":{"type":"boolean","description":"Show this usage information.\\ndefaults to \\\"false\\\""},"version":{"type":"boolean","description":"Show the package:test version.\\ndefaults to \\\"false\\\""},"name":{"type":"array","description":"A substring of the name of the test to run.\\nRegular expression syntax is supported.\\nIf passed multiple times, tests must match all substrings.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"plain-name":{"type":"array","description":"A plain-text substring of the name of the test to run.\\nIf passed multiple times, tests must match all substrings.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"tags":{"type":"array","description":"Run only tests with all of the specified tags.\\nSupports boolean selector syntax.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"exclude-tags":{"type":"array","description":"Don't run tests with any of the specified tags.\\nSupports boolean selector syntax.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"run-skipped":{"type":"boolean","description":"Run skipped tests instead of skipping them.\\ndefaults to \\\"false\\\""},"platform":{"type":"array","description":"The platform(s) on which to run the tests.\\n[vm (default), chrome, firefox, edge, node].\\nEach platform supports the following compilers:\\n[vm]: kernel (default), source, exe\\n[chrome]: dart2js (default), dart2wasm\\n[firefox]: dart2js (default), dart2wasm\\n[edge]: dart2js (default)\\n[node]: dart2js (default), dart2wasm\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"compiler":{"type":"array","description":"The compiler(s) to use to run tests, supported compilers are [dart2js, dart2wasm, exe, kernel, source].\\nEach platform has a default compiler but may support other compilers.\\nYou can target a compiler to a specific platform using arguments of the following form [<platform-selector>:]<compiler>.\\nIf a platform is specified but no given compiler is supported for that platform, then it will use its default compiler.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"preset":{"type":"array","description":"The configuration preset(s) to use.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"concurrency":{"type":"string","description":"The number of concurrent test suites run.\\ndefaults to \\\"8\\\""},"total-shards":{"type":"string","description":"The total number of invocations of the test runner being run."},"shard-index":{"type":"string","description":"The index of this test runner invocation (of --total-shards)."},"timeout":{"type":"string","description":"The default test timeout. For example: 15s, 2x, none\\ndefaults to \\\"30s\\\""},"ignore-timeouts":{"type":"boolean","description":"Ignore all timeouts (useful if debugging)\\ndefaults to \\\"false\\\""},"pause-after-load":{"type":"boolean","description":"Pause for debugging before any tests execute.\\nImplies --concurrency=1, --debug, and --ignore-timeouts.\\nCurrently only supported for browser tests.\\ndefaults to \\\"false\\\""},"debug":{"type":"boolean","description":"Run the VM and Chrome tests in debug mode.\\ndefaults to \\\"false\\\""},"coverage":{"type":"string","description":"Gather coverage and output it to the specified directory.\\nImplies --debug."},"chain-stack-traces":{"type":"boolean","description":"Use chained stack traces to provide greater exception details\\nespecially for asynchronous code. It may be useful to disable\\nto provide improved test performance but at the cost of\\ndebuggability.\\ndefaults to \\\"false\\\""},"no-retry":{"type":"boolean","description":"Don't rerun tests that have retry set.\\ndefaults to \\\"false\\\""},"test-randomize-ordering-seed":{"type":"string","description":"Use the specified seed to randomize the execution order of test cases.\\nMust be a 32bit unsigned integer or \\\"random\\\".\\nIf \\\"random\\\", pick a random seed to use.\\nIf not passed, do not randomize test case execution order."},"fail-fast":{"type":"boolean","description":"Stop running tests after the first failure.\\n\\ndefaults to \\\"false\\\""},"reporter":{"type":"string","description":"Set how to print test results.\\ndefaults to \\\"compact\\\"\\nallowed values: compact, expanded, failures-only, github, json, silent"},"file-reporter":{"type":"string","description":"Enable an additional reporter writing test results to a file.\\nShould be in the form <reporter>:<filepath>, Example: \\\"json:reports/tests.json\\\""},"verbose-trace":{"type":"boolean","description":"Emit stack traces with core library frames.\\ndefaults to \\\"false\\\""},"js-trace":{"type":"boolean","description":"Emit raw JavaScript stack traces for browser tests.\\ndefaults to \\\"false\\\""},"color":{"type":"boolean","description":"Use terminal colors.\\n(auto-detected by default)\\ndefaults to \\\"false\\\""}},"required":[]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a String, how are we intended to update it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure many of these make sense to expose and are just a waste of tokens and will make overall calls less reliable?

For instance color, help, file-reporter, pause-after-load, etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a String, how are we intended to update it?

We'd dump it from the test runner binary again and replace the entire String. We can change this away from a String, it will add steps to update, but I don't think that's a problem. Would you prefer a map literal?

I am also not sure many of these make sense to expose and are just a waste of tokens and will make overall calls less reliable?

For instance color, help, file-reporter, pause-after-load, etc?

I don't think it makes sense to curate the whole list manually, but defining a blocklist is feasible. I'd worry about forgetting to add an argument sometime and leaving a gap where the client needs to use the shell even though we said not to. WDYT about a blocklist defined here? If a blocklist defined by the test runner is better we'd need to update the package:args API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think file-reporter has plausible utility for a client, I added a blocklist with the others.

For the overall token usage I'm partially running on a hunch that some sort of tool filtering is likely to become a standard feature for clients. There is a huge number of tools being created and a number of discussions started about managing the volume of tools. I started exploring tool filtering by making a call with less historical context to decide which tools should offered but I didn't figure out how to evaluate the behavior impacts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocklist sounds fine to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added instructions for updating this. It currently relies on unmerged PRs in package:args and package:test. I'd be fine seeing these land if someone wants to tackle it before I'm back, otherwise I think it's OK if these sit as drafts just in case they are needed and I'll revisit whether it's worth landing in a few weeks.
dart-lang/test#2508

Some CLI arguments don't make sense through this proxy
@jakemac53 jakemac53 merged commit 7ca3eba into main Jun 17, 2025
15 checks passed
@jakemac53 jakemac53 deleted the dart-test-cli-schema branch June 17, 2025 21:56
copybara-service bot pushed a commit to dart-lang/sdk that referenced this pull request Jun 23, 2025
Revisions updated by `dart tools/rev_sdk_deps.dart`.

ai (https://github.com/dart-lang/ai/compare/f2b48c6..12ac0a4):
  12ac0a4  2025-06-18  Jacob MacDonald  make the log file test less flaky by retrying deletes and reads (dart-lang/ai#180)
  dbedc6d  2025-06-18  Jacob MacDonald  fix cruft in description (dart-lang/ai#179)
  3ab9482  2025-06-18  Jacob MacDonald  improve tool description for the dtd connection tool and improve error messages (dart-lang/ai#178)
  4ca0ff1  2025-06-18  Jacob MacDonald  Use shared Implementation type, add clientInfo field to MCPServer (dart-lang/ai#175)
  885a4c5  2025-06-18  Jacob MacDonald   Add `--log-file` argument to log all protocol traffic to a file (dart-lang/ai#176)
  7ca3eba  2025-06-17  Nate Bosch  Add JSON schema for test runner arguments (dart-lang/ai#169)

core (https://github.com/dart-lang/core/compare/dc97530..b59ecf4):
  b59ecf4c  2025-06-18  Lasse R.H. Nielsen  Optimize surrogate decoding. (dart-lang/core#894)

dartdoc (https://github.com/dart-lang/dartdoc/compare/4ceea6b..f1fe177):
  f1fe1775  2025-06-16  Sarah Zakarias  Refactor 404 error page to use div instead of p for search form (dart-lang/dartdoc#4064)

ecosystem (https://github.com/dart-lang/ecosystem/compare/64aac3a..d5233c6):
  d5233c6  2025-06-13  dependabot[bot]  Bump the github-actions group with 5 updates (dart-lang/ecosystem#351)

web (https://github.com/dart-lang/web/compare/c8c1c28..4b2f02e):
  4b2f02e  2025-06-18  nikeokoronkwo  Add Variable Declaration Support (dart-lang/web#382)

webdev (https://github.com/dart-lang/webdev/compare/661dafd..6dc3dde):
  6dc3ddef  2025-06-20  Jessy Yameogo  Fix duplicate connection/logs in Webdev (dart-lang/webdev#2635)
  0c8a17b4  2025-06-20  Morgan :)  Remove dependency overrides. (dart-lang/webdev#2634)
  a3218638  2025-06-16  Jessy Yameogo  modifying DWDS Injector to always inject client and introduce useDwdsWebSocketConnection flag (dart-lang/webdev#2629)
  2eb27546  2025-06-16  Morgan :)  Prepare for `build_runner` changes. (dart-lang/webdev#2633)

Change-Id: Ib323bea37dd77ed94387e77d9c504f889bfa8050
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/436021
Auto-Submit: Devon Carew <devoncarew@google.com>
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants