Expand smoke suite to full command coverage #287
Merged
10 commits
c4f524c Add smoke helper functions for full coverage (jeremy)
3ea75ee Deepen existing smoke tests and fix coverage accounting (jeremy)
fccbf25 Add full CRUD depth to Level 1 mutation tests (jeremy)
39a78aa Add Level 0 smoke tests for communication, checkins, schedule, and more (jeremy)
cf38c4a Add Level 1 smoke tests for campfire, assign, lineup, and more (jeremy)
4847443 Add Level 2 smoke test for project CRUD lifecycle (jeremy)
374e97f Update smoke orchestrator levels and fix qa-critic skill (jeremy)
2d8d1e9 Address PR review findings from automated reviewers (jeremy)
b2fa765 Harden smoke suite for live dev testing (jeremy)
d34e684 Address PR review feedback on smoke suite (jeremy)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,60 @@ | ||
| --- | ||
| description: Grade smoke test traces against the QA rubric and produce prioritized findings | ||
| user_invocable: true | ||
| --- | ||
|
|
||
| # QA Critic | ||
|
|
||
| Analyze smoke test traces and grade CLI output quality against the rubric. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| Smoke traces must exist. Run `make smoke` first to generate them: | ||
|
|
||
| ```bash | ||
| BASECAMP_PROFILE=dev make smoke | ||
| # or: BASECAMP_TOKEN=<token> make smoke | ||
| ``` | ||
|
|
||
| Traces land in `tmp/qa-traces/traces.jsonl` (or `$QA_TRACE_DIR`). | ||
|
|
||
| ## Steps | ||
|
|
||
| 1. **Read the rubric**: Read `e2e/smoke/RUBRIC.md` for the grading dimensions. | ||
|
|
||
| 2. **Read results**: Two sources, each authoritative for different things: | ||
| - **BATS TAP output** (stdout from `make smoke`): Parse TAP lines to count pass (`ok ...`), fail (`not ok ...`), and skip (`ok ... # skip ...`). These are the ground truth for pass/fail. | ||
| - **Trace file** (`tmp/qa-traces/traces.jsonl`): Each line is a JSON object with fields: `test`, `command`, `exit_code`, `status`, `reason`. Traces record only gap/exclusion metadata — `unverifiable` (test could not verify due to missing data) and `out-of-scope` (intentionally excluded). Traces say nothing about pass/fail; use them only for coverage-gap analysis. | ||
|
|
||
| 3. **Identify coverage gaps**: List all commands from the `.surface` file (lines starting with `CMD`). Cross-reference against the BATS test inventory (grep `@test` lines across `e2e/smoke/*.bats` and match the command name in each `run_smoke basecamp <command>` call). A command is covered if at least one `@test` exercises it. Traces are not useful here — passing tests leave no trace entry, so a pure-pass command group would be misclassified as uncovered. | ||
|
|
||
| 4. **Run sample commands**: For each covered command group, run 2-3 representative commands with `--json` and without `--json` to capture both machine and human output. Evaluate against both v0 and v1 rubric dimensions. | ||
|
|
||
| 5. **Grade v0 (automatable)**: For each command tested: | ||
| - **Functional**: Did it exit 0 with `ok: true`? | ||
| - **Non-empty**: Is `.data` present and non-null? | ||
| - **Correct types**: Are IDs numbers, names strings? | ||
| - **Summary present**: Is `.summary` a non-empty string? | ||
| - **Scriptable**: Does `--json` parse cleanly? Does `--ids-only` work where applicable? | ||
|
|
||
| 6. **Grade v1 (critic-evaluated)**: For each command tested: | ||
| - **Readable**: Is the human output scannable, not a wall of text? | ||
| - **Discoverable**: Do breadcrumbs suggest logical next actions? | ||
| - **Consistent**: Do similar commands (e.g., all `list` commands) produce similar output shapes? | ||
| - **Helpful errors**: Run with bad input — does the error explain what's wrong and how to fix it? | ||
| - **Complete**: Are all relevant API fields surfaced? | ||
|
|
||
| 7. **Produce findings**: Output a prioritized list of issues, grouped by severity: | ||
| - **Critical**: Command exits non-zero, crashes, or returns malformed JSON | ||
| - **High**: Missing `.summary`, empty `.data` when data exists, no breadcrumbs | ||
| - **Medium**: Inconsistent output shapes, missing fields vs API, poor error messages | ||
| - **Low**: Style/readability nits, missing `--ids-only` support | ||
|
|
||
| Format each finding as: | ||
| ``` | ||
| [SEVERITY] command: description | ||
| Evidence: <what you observed> | ||
| Expected: <what the rubric requires> | ||
| ``` | ||
|
|
||
| 8. **Summary table**: End with a coverage matrix showing each command group, its test count, and a letter grade (A-F) based on v0+v1 scores. |
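Step 2 of the QA Critic skill treats the BATS TAP stream as the ground truth for pass/fail. The tally could be sketched roughly as below. `count_tap` is a hypothetical name, not a helper from this PR, and the sketch assumes the TAP output arrives on stdin.

```bash
# Hypothetical helper (not part of the suite): tally BATS TAP results from stdin.
# Prints one line of the form "pass=<n> fail=<n> skip=<n>".
count_tap() {
  awk '
    # Failures start with "not ok"; count them and move to the next line.
    /^not ok / { fail++; next }
    # Skips are reported as "ok ... # skip ..."; the directive is case-insensitive.
    /^ok / {
      if (tolower($0) ~ /# skip/) skip++; else pass++
    }
    # The plan line ("1..N") and "#" diagnostics match neither rule and are ignored.
    END { printf "pass=%d fail=%d skip=%d\n", pass, fail, skip }
  '
}
```

Piping the stdout of `make smoke` into `count_tap` would then yield the three counts on a single line. A test whose description itself contains `# skip` would be miscounted, so treat this as a rough tally rather than a full TAP parser.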
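The coverage-gap check in step 3 of the QA Critic skill (surface commands cross-referenced against `run_smoke basecamp <command>` calls) might look something like the sketch below. The `.surface` line format is not shown in this diff, so the sketch assumes lines of the form `CMD basecamp <command> ...`, which is a guess. It also matches only the first word after `basecamp`. `uncovered_commands` is a hypothetical name.

```bash
# Hypothetical helper (not part of the suite): list command groups that appear
# in the surface file but in no run_smoke call.
# ASSUMPTION: surface lines look like "CMD basecamp <command> ..."; the real
# format is not shown in this PR.
# usage: uncovered_commands <surface-file> <dir-containing-.bats-files>
uncovered_commands() {
  local surface=$1 bats_dir=$2
  {
    # Emit "COVERED <command>" for every run_smoke call found in the tests.
    grep -ho 'run_smoke basecamp [a-z][a-z0-9_-]*' "$bats_dir"/*.bats |
      sed 's/^run_smoke basecamp /COVERED /'
    # Then stream the surface file through the same pipe.
    cat "$surface"
  } | awk '
    $1 == "COVERED" { covered[$2] = 1; next }
    # Print each surface command once if no test exercised it.
    $1 == "CMD" && $2 == "basecamp" && NF >= 3 && !($3 in covered) && !seen[$3]++ { print $3 }
  '
}
```

Because the covered names are written to the pipe before the surface file, a single `awk` pass is enough and no temporary files are needed. Empty output means every command group has at least one test.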
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| #!/usr/bin/env bats | ||
| # smoke_assign.bats - Level 1: Assign and unassign operations | ||
|
|
||
| load smoke_helper | ||
|
|
||
| setup_file() { | ||
| ensure_token || return 1 | ||
| ensure_project || return 1 | ||
| ensure_todolist || return 1 | ||
| } | ||
|
|
||
| @test "assign assigns a person to a todo" { | ||
| # Create a fresh todo for assignment | ||
| local todo_out | ||
| todo_out=$(basecamp todo "Assign target $(date +%s)" --list "$QA_TODOLIST" -p "$QA_PROJECT" --json 2>/dev/null) || { | ||
| mark_unverifiable "Cannot create todo for assign test" | ||
| return | ||
| } | ||
| local todo_id | ||
| todo_id=$(echo "$todo_out" | jq -r '.data.id // empty') | ||
| [[ -n "$todo_id" ]] || { mark_unverifiable "No todo ID returned"; return; } | |
|
|
||
| echo "$todo_id" > "$BATS_FILE_TMPDIR/assign_todo_id" | ||
|
|
||
| run_smoke basecamp assign "$todo_id" --to me -p "$QA_PROJECT" --json | ||
| assert_success | ||
| assert_json_value '.ok' 'true' | ||
| } | ||
|
|
||
| @test "unassign removes a person from a todo" { | ||
| local id_file="$BATS_FILE_TMPDIR/assign_todo_id" | ||
| [[ -f "$id_file" ]] || { mark_unverifiable "No todo created in prior test"; return; } | |
| local todo_id | ||
| todo_id=$(<"$id_file") | ||
|
|
||
| run_smoke basecamp unassign "$todo_id" --from me -p "$QA_PROJECT" --json | ||
| assert_success | ||
| assert_json_value '.ok' 'true' | ||
| } |
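The two tests above hand the todo ID from `assign` to `unassign` through a file under `$BATS_FILE_TMPDIR`, which BATS shares across all tests in one file. That handoff could be wrapped in a pair of small helpers, sketched below. `save_id` and `load_id` are hypothetical names and are not part of `smoke_helper`.

```bash
# Hypothetical helpers (not part of smoke_helper): pass an ID between tests
# in the same .bats file via $BATS_FILE_TMPDIR.

# usage: save_id <key> <id>
save_id() {
  printf '%s\n' "$2" > "$BATS_FILE_TMPDIR/$1"
}

# usage: load_id <key>
# Prints the saved ID, or returns 1 if the file is missing or empty.
load_id() {
  local file="$BATS_FILE_TMPDIR/$1"
  [ -s "$file" ] || return 1
  cat "$file"
}
```

A dependent test could then start with `todo_id=$(load_id assign_todo_id) || { mark_unverifiable "No todo created in prior test"; return; }`. Checking for a non-empty file (`-s`) rather than mere existence (`-f`) also catches the case where the earlier test wrote an empty ID.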
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| #!/usr/bin/env bats | ||
| # smoke_campfire.bats - Level 0/1: Campfire (chat) operations | ||
|
|
||
| load smoke_helper | ||
|
|
||
| setup_file() { | ||
| ensure_token || return 1 | ||
| ensure_project || return 1 | ||
| ensure_campfire || return 1 | ||
| } | ||
|
|
||
| @test "campfire list returns campfires" { | ||
| run_smoke basecamp campfire list -p "$QA_PROJECT" --json | ||
| assert_success | ||
| assert_json_value '.ok' 'true' | ||
| } | ||
|
|
||
| @test "campfire messages returns lines" { | ||
| run_smoke basecamp campfire messages --chat "$QA_CAMPFIRE" -p "$QA_PROJECT" --json | ||
| assert_success | ||
| assert_json_value '.ok' 'true' | ||
| } | ||
|
|
||
| @test "campfire post creates a message" { | ||
| run_smoke basecamp campfire post "Smoke test $(date +%s)" \ | ||
| --chat "$QA_CAMPFIRE" -p "$QA_PROJECT" --json | ||
| assert_success | ||
| assert_json_value '.ok' 'true' | ||
| assert_json_not_null '.data.id' | ||
|
|
||
| echo "$output" | jq -r '.data.id' > "$BATS_FILE_TMPDIR/campfire_line_id" | ||
| } | ||
|
|
||
| @test "campfire line shows a message" { | ||
| local id_file="$BATS_FILE_TMPDIR/campfire_line_id" | ||
| [[ -f "$id_file" ]] || { mark_unverifiable "No campfire line created in prior test"; return; } | |
| local line_id | ||
| line_id=$(<"$id_file") | ||
|
|
||
| run_smoke basecamp campfire line "$line_id" \ | ||
| --chat "$QA_CAMPFIRE" -p "$QA_PROJECT" --json | ||
| assert_success | ||
| assert_json_value '.ok' 'true' | ||
| assert_json_not_null '.data.id' | ||
| } |
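The assertions in the campfire tests (`.ok` equals `true`, `.data.id` is not null) correspond to the v0 "Functional" and "Non-empty" checks in the QA Critic rubric. The real `assert_json_value` and `assert_json_not_null` live in `smoke_helper`, which is not shown in this diff. The stand-ins below are only a guess at what such checks could do with `jq`, and they assume the JSON is in `$output`, where BATS `run` leaves captured output.

```bash
# Hypothetical stand-ins (NOT the suite's real helpers): minimal JSON checks
# against $output using jq.

# usage: sketch_json_value <jq-path> <expected-string>
sketch_json_value() {
  local actual
  # -r prints raw strings, so true/false/numbers compare as plain text.
  actual=$(printf '%s' "$output" | jq -r "$1") || return 1
  [ "$actual" = "$2" ]
}

# usage: sketch_json_not_null <jq-path>
sketch_json_not_null() {
  # jq -e exits non-zero when the final result is false or null,
  # so a missing or null path fails the check.
  printf '%s' "$output" | jq -e "$1 != null" > /dev/null
}
```

With `output='{"ok":true,"data":{"id":42}}'`, both `sketch_json_value '.ok' 'true'` and `sketch_json_not_null '.data.id'` succeed, while `sketch_json_not_null '.data.missing'` fails.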