-
Notifications
You must be signed in to change notification settings - Fork 21
Add JSON schema for test runner arguments #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add strong language to always use the `run_tests` tool. Add a JSON schema for the CLI arguments to the test runner. The schema was dumped using dart-lang/core#897 Add a test for the conversion of boolean, string, and list of string arguments, as well as overriding the reporter argument.
PR HealthChangelog Entry ✔️
Changes to files need to be accounted for in their respective changelogs. |
I haven't gotten an eval framework set up, but my manual testing was pretty positive with this. The agent was able to use the test runner more effectively, and I was unsuccessful in attempts to coach usage in other ways like a |
} | ||
|
||
const _dartTestCliSchema = ''' | ||
{"type":"object","properties":{"help":{"type":"boolean","description":"Show this usage information.\\ndefaults to \\\"false\\\""},"version":{"type":"boolean","description":"Show the package:test version.\\ndefaults to \\\"false\\\""},"name":{"type":"array","description":"A substring of the name of the test to run.\\nRegular expression syntax is supported.\\nIf passed multiple times, tests must match all substrings.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"plain-name":{"type":"array","description":"A plain-text substring of the name of the test to run.\\nIf passed multiple times, tests must match all substrings.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"tags":{"type":"array","description":"Run only tests with all of the specified tags.\\nSupports boolean selector syntax.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"exclude-tags":{"type":"array","description":"Don't run tests with any of the specified tags.\\nSupports boolean selector syntax.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"run-skipped":{"type":"boolean","description":"Run skipped tests instead of skipping them.\\ndefaults to \\\"false\\\""},"platform":{"type":"array","description":"The platform(s) on which to run the tests.\\n[vm (default), chrome, firefox, edge, node].\\nEach platform supports the following compilers:\\n[vm]: kernel (default), source, exe\\n[chrome]: dart2js (default), dart2wasm\\n[firefox]: dart2js (default), dart2wasm\\n[edge]: dart2js (default)\\n[node]: dart2js (default), dart2wasm\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"compiler":{"type":"array","description":"The compiler(s) to use to run tests, supported compilers are [dart2js, dart2wasm, exe, kernel, source].\\nEach platform has a default compiler but may support other compilers.\\nYou can target a compiler to a specific platform using arguments of the following form [<platform-selector>:]<compiler>.\\nIf a platform is specified but no given compiler is supported for that platform, then it will use its default compiler.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"preset":{"type":"array","description":"The configuration preset(s) to use.\\ndefaults to \\\"[]\\\"","items":{"type":"string"}},"concurrency":{"type":"string","description":"The number of concurrent test suites run.\\ndefaults to \\\"8\\\""},"total-shards":{"type":"string","description":"The total number of invocations of the test runner being run."},"shard-index":{"type":"string","description":"The index of this test runner invocation (of --total-shards)."},"timeout":{"type":"string","description":"The default test timeout. For example: 15s, 2x, none\\ndefaults to \\\"30s\\\""},"ignore-timeouts":{"type":"boolean","description":"Ignore all timeouts (useful if debugging)\\ndefaults to \\\"false\\\""},"pause-after-load":{"type":"boolean","description":"Pause for debugging before any tests execute.\\nImplies --concurrency=1, --debug, and --ignore-timeouts.\\nCurrently only supported for browser tests.\\ndefaults to \\\"false\\\""},"debug":{"type":"boolean","description":"Run the VM and Chrome tests in debug mode.\\ndefaults to \\\"false\\\""},"coverage":{"type":"string","description":"Gather coverage and output it to the specified directory.\\nImplies --debug."},"chain-stack-traces":{"type":"boolean","description":"Use chained stack traces to provide greater exception details\\nespecially for asynchronous code. It may be useful to disable\\nto provide improved test performance but at the cost of\\ndebuggability.\\ndefaults to \\\"false\\\""},"no-retry":{"type":"boolean","description":"Don't rerun tests that have retry set.\\ndefaults to \\\"false\\\""},"test-randomize-ordering-seed":{"type":"string","description":"Use the specified seed to randomize the execution order of test cases.\\nMust be a 32bit unsigned integer or \\\"random\\\".\\nIf \\\"random\\\", pick a random seed to use.\\nIf not passed, do not randomize test case execution order."},"fail-fast":{"type":"boolean","description":"Stop running tests after the first failure.\\n\\ndefaults to \\\"false\\\""},"reporter":{"type":"string","description":"Set how to print test results.\\ndefaults to \\\"compact\\\"\\nallowed values: compact, expanded, failures-only, github, json, silent"},"file-reporter":{"type":"string","description":"Enable an additional reporter writing test results to a file.\\nShould be in the form <reporter>:<filepath>, Example: \\\"json:reports/tests.json\\\""},"verbose-trace":{"type":"boolean","description":"Emit stack traces with core library frames.\\ndefaults to \\\"false\\\""},"js-trace":{"type":"boolean","description":"Emit raw JavaScript stack traces for browser tests.\\ndefaults to \\\"false\\\""},"color":{"type":"boolean","description":"Use terminal colors.\\n(auto-detected by default)\\ndefaults to \\\"false\\\""}},"required":[]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a String, how are we intended to update it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also not sure many of these make sense to expose and are just a waste of tokens and will make overall calls less reliable?
For instance color
, help
, file-reporter
, pause-after-load
, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a String, how are we intended to update it?
We'd dump it from the test runner binary again and replace the entire String. We can change this away from a String, it will add steps to update, but I don't think that's a problem. Would you prefer a map literal?
I am also not sure many of these make sense to expose and are just a waste of tokens and will make overall calls less reliable?
For instance
color
,help
,file-reporter
,pause-after-load
, etc?
I don't think it makes sense to curate the whole list manually, but defining a blocklist is feasible. I'd worry about forgetting to add an argument sometime and leaving a gap where the client needs to use the shell even though we said not to. WDYT about a blocklist defined here? If a blocklist defined by the test runner is better we'd need to update the package:args
API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think file-reporter
has plausible utility for a client, I added a blocklist with the others.
For the overall token usage I'm partially running on a hunch that some sort of tool filtering is likely to become a standard feature for clients. There is a huge number of tools being created and a number of discussions started about managing the volume of tools. I started exploring tool filtering by making a call with less historical context to decide which tools should offered but I didn't figure out how to evaluate the behavior impacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocklist sounds fine to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added instructions for updating this. It currently relies on unmerged PRs in package:args
and package:test
. I'd be fine seeing these land if someone wants to tackle it before I'm back, otherwise I think it's OK if these sit as drafts just in case they are needed and I'll revisit whether it's worth landing in a few weeks.
dart-lang/test#2508
Some CLI arguments don't make sense through this proxy
Revisions updated by `dart tools/rev_sdk_deps.dart`. ai (https://github.com/dart-lang/ai/compare/f2b48c6..12ac0a4): 12ac0a4 2025-06-18 Jacob MacDonald make the log file test less flaky by retrying deletes and reads (dart-lang/ai#180) dbedc6d 2025-06-18 Jacob MacDonald fix cruft in description (dart-lang/ai#179) 3ab9482 2025-06-18 Jacob MacDonald improve tool description for the dtd connection tool and improve error messages (dart-lang/ai#178) 4ca0ff1 2025-06-18 Jacob MacDonald Use shared Implementation type, add clientInfo field to MCPServer (dart-lang/ai#175) 885a4c5 2025-06-18 Jacob MacDonald Add `--log-file` argument to log all protocol traffic to a file (dart-lang/ai#176) 7ca3eba 2025-06-17 Nate Bosch Add JSON schema for test runner arguments (dart-lang/ai#169) core (https://github.com/dart-lang/core/compare/dc97530..b59ecf4): b59ecf4c 2025-06-18 Lasse R.H. Nielsen Optimize surrogate decoding. (dart-lang/core#894) dartdoc (https://github.com/dart-lang/dartdoc/compare/4ceea6b..f1fe177): f1fe1775 2025-06-16 Sarah Zakarias Refactor 404 error page to use div instead of p for search form (dart-lang/dartdoc#4064) ecosystem (https://github.com/dart-lang/ecosystem/compare/64aac3a..d5233c6): d5233c6 2025-06-13 dependabot[bot] Bump the github-actions group with 5 updates (dart-lang/ecosystem#351) web (https://github.com/dart-lang/web/compare/c8c1c28..4b2f02e): 4b2f02e 2025-06-18 nikeokoronkwo Add Variable Declaration Support (dart-lang/web#382) webdev (https://github.com/dart-lang/webdev/compare/661dafd..6dc3dde): 6dc3ddef 2025-06-20 Jessy Yameogo Fix duplicate connection/logs in Webdev (dart-lang/webdev#2635) 0c8a17b4 2025-06-20 Morgan :) Remove dependency overrides. (dart-lang/webdev#2634) a3218638 2025-06-16 Jessy Yameogo modifying DWDS Injector to always inject client and introduce useDwdsWebSocketConnection flag (dart-lang/webdev#2629) 2eb27546 2025-06-16 Morgan :) Prepare for `build_runner` changes. (dart-lang/webdev#2633) Change-Id: Ib323bea37dd77ed94387e77d9c504f889bfa8050 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/436021 Auto-Submit: Devon Carew <devoncarew@google.com> Reviewed-by: Konstantin Shcheglov <scheglov@google.com> Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
Add strong language to always use the
run_tests
tool.Add a JSON schema for the CLI arguments to the test runner. The schema
was dumped using dart-lang/core#897
Add a test for the conversion of boolean, string, and list of string
arguments, as well as overriding the reporter argument.