Skip to content

Add language-agnostic emitter-diff tool + python adapter#11122

Draft
l0lawrence wants to merge 27 commits into
mainfrom
l0lawrence-emitter-diff
Draft

Add language-agnostic emitter-diff tool + python adapter#11122
l0lawrence wants to merge 27 commits into
mainfrom
l0lawrence-emitter-diff

Conversation

@l0lawrence

@l0lawrence l0lawrence commented Jun 30, 2026

Copy link
Copy Markdown
Member

Draft / RFC. Adds the language-agnostic emitter-diff tool (built in Azure/typespec-azure) to microsoft/typespec. CI runs end-to-end; the PR-comment + approved-baseline gate are verified on this PR.

What

A language-agnostic tool, eng/emitter-diff (@typespec/emitter-diff), that diffs the generated code produced by two versions of a TypeSpec emitter and optionally runs the emitter's generated-code test suites. It resolves a baseline emitter from an npm version, a local folder, or a GitHub url/sha, generates with both baseline and head, and renders the diff as a clickable HTML report (default), a VS Code diff, or a terminal patch.

Ships the python adapter for @typespec/http-client-python, which drives the package's existing regenerate.ts two-phase native pipeline (TypeSpec → YAML, then a batched Python subprocess), building and creating a venv per emitter version.

Changes

  • eng/emitter-diff/ — the tool: CLI/orchestrator, ref resolver (npm/local/github), generic EmitterAdapter interface + registry, diff engine (HTML/VS Code/terminal), python adapter. Baseline and head generate in parallel by default (prefixed logs; --sequential to opt out).
  • packages/http-client-python/eng/scripts/ci/regenerate.ts — adds --httpSpecsDir, --azureSpecsDir, --no-baseline so the tool can pin specs and skip the published-baseline clone; and makes the batched Python phase honor a custom --generatedFolder. Defaults preserve current behavior.
  • pnpm-workspace.yaml — adds eng/emitter-diff as a workspace member and diff2html to the catalog (lockfile updated).
  • .github/workflows/ci-emitter-diff-python.yml — PR workflow with an approved-baseline gate.

Approved-baseline gate

The approved generated-output baseline is a commit SHA in eng/emitter-diff/baselines/python.sha. CI builds the emitter for both the PR checkout and a worktree of that commit, diffs them, uploads the HTML artifact, and posts a sticky PR comment. If the output differs, the run exits 2 and the job fails. To approve an intended change, update the SHA to a commit on your branch containing your emitter changes and push — once the baseline matches head, the diff is empty and the check passes.

Notes / open questions

  • Python's native pipeline needs a venv per emitter version, so the adapter builds + npm run installs each side. --use-pyodide from the azure variant is dropped (no pyodide path in core).
  • The tool is designed to host future c#/java/go/rust adapters with no core changes.
  • Companion PR in Azure/typespec-azure (Support generating Python SDK with Pyodide #4784).

Supersedes the fork-based draft #11119 (closed); moved here so the CI PR-comment + gate run with a read-write token.

l0lawrence and others added 7 commits June 30, 2026 13:22
Ports the eng/emitter-diff tool: diffs generated code between two emitter
versions (npm version, local folder, or github ref) and optionally runs the
emitter's generated-code test suites. Ships the python adapter for
@typespec/http-client-python, driving its native two-phase regenerate pipeline
with a venv per emitter version.

- eng/emitter-diff: @typespec/emitter-diff package (CLI, ref resolver, diff
  engine with HTML/VS Code/terminal output, adapter registry).
- regenerate.ts: add --httpSpecsDir, --azureSpecsDir, --no-baseline flags.
- pnpm-workspace.yaml: add eng/emitter-diff member + diff2html catalog entry.
- ci-emitter-diff-python.yml: PR workflow that diffs head vs the PR base commit,
  uploads the rendered HTML artifact, and posts a sticky PR comment.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
CI now diffs the PR's emitter against the emitter at the commit recorded in
eng/emitter-diff/baselines/python.sha and fails when generated output differs.
To approve an intended change, update that SHA to a commit on the branch that
contains the emitter changes and push; once the baseline matches head the diff
is empty and the check passes.

- cli.ts: --fail-on-diff now exits 2 for "diff present" (vs 1 for hard errors)
  so CI can distinguish an unapproved diff from a failure.
- baselines/python.sha: the approved baseline commit.
- ci-emitter-diff-python.yml: read the SHA file as baseline, run with
  --fail-on-diff, enforce the gate, and explain approval in the summary/comment.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Use a non-literal import specifier + local type so a typecheck that doesn't
install this package's deps (e.g. the parent repo's check:eng, which includes
core/eng) doesn't fail with TS2307; availability is validated at runtime.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
run_batch.py was always passed PLUGIN_DIR as --generated-dir, so when the
emitter-diff tool redirects output via --generatedFolder the batch phase looked
in the wrong tree, found no .tsp-codegen config files, and wrote zero .py files
(leaving the path-bearing config files behind as spurious diffs). Pass the
parent of GENERATED_FOLDER instead; it equals PLUGIN_DIR in the default case so
normal regeneration is unchanged.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
The two sides write to isolated output dirs, use separate venvs, and emit
uniquely-named temp YAML, so they can regenerate concurrently. Run them with
Promise.all and tag each side's streamed output with a [baseline]/[head] prefix
so the interleaved logs stay attributable. Add --sequential to opt back into
one-at-a-time generation (quieter logs, lower peak CPU/memory).

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Temporarily set the approved baseline to #10947 (timedelta duration encoding)
so the diff PR comment and approval gate can be exercised end-to-end. Revert to
the blessed HEAD SHA after the demo.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@microsoft-github-policy-service microsoft-github-policy-service Bot added emitter:client:python Issue for the Python client emitter: @typespec/http-client-python eng labels Jun 30, 2026
@pkg-pr-new

pkg-pr-new Bot commented Jun 30, 2026

Copy link
Copy Markdown

Open in StackBlitz

npm i https://pkg.pr.new/@typespec/http-client-python@11122

commit: 9925d65

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

All changed packages have been documented.

  • @typespec/http-client-python
Show changes

@typespec/http-client-python - internal ✏️

Add a diff-spector-tests script wired to the new language-agnostic emitter-diff tool, and honor --generatedFolder in the batch Python phase of regenerate.ts

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Python emitter diff

Baseline fab48a2d2c8394b812ed538b520fcb6a4ba9a8f9 vs this PR.

Diff summary: 400 file(s), +400 / -0

Full rendered diff: emitter-diff-html artifact on the run.

Informational check (eng/emitter-diff); does not block the PR.

@l0lawrence l0lawrence changed the title Add language-agnostic emitter-diff tool + python adapter (same-repo test) Add language-agnostic emitter-diff tool + python adapter Jun 30, 2026
l0lawrence and others added 2 commits June 30, 2026 14:53
A non-zero exit that is not the dedicated 'diff present' code (2) means the tool
failed to run (build/venv/generate threw), not that output is unchanged. Drive
the comment off the diff step's exit status so a hard error shows a clear
failure notice instead of a misleading 'no changes'.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Revert the approved baseline to the blessed HEAD SHA and add a one-line comment
to the generated package __init__.py so the PR's generated output differs from
the baseline, exercising the emitter-diff PR comment + approval gate end to end.
Both changes are temporary and must be reverted before merge.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@azure-sdk-automation

azure-sdk-automation Bot commented Jun 30, 2026

Copy link
Copy Markdown

You can try these changes here

🛝 Playground 🌐 Website 🛝 VSCode Extension

l0lawrence and others added 12 commits June 30, 2026 15:10
pnpm's recursive 'pnpm --filter <pkg> exec' collapses any non-zero child exit
into its own generic exit 1 (ERR_PNPM_RECURSIVE_EXEC_FIRST_FAIL), which masked
the tool's dedicated 'diff present' code (2) as a hard error (1). Run the CLI
from the package directory via plain (non-recursive) 'pnpm exec' so the real
exit code propagates and the approval gate can distinguish a diff from a crash.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
…ocs)

- resolver: on the full-fetch fallback, check out a github ref by name rather
  than FETCH_HEAD (which could be a branch head), so a SHA baseline/head never
  silently diffs the wrong commit.
- resolver: gh:<ref> now resolves the origin repo from the local remote instead
  of hard-coding microsoft/typespec, so forks/renames work.
- python adapter: recognize packages/http-specs and packages/azure-http-specs
  layouts for external --specs, and warn (instead of silently falling back to
  the current checkout) when an explicit --specs dir can't be resolved.
- diff: --html always writes a file (a 'No differences' stub when unchanged) and
  inlines the diff2html stylesheet so the CI artifact renders offline.
- cli: reject an invalid --test-target instead of silently running no suites.
- docs: fix the broken github-sha example, note --specs doesn't accept npm refs,
  soften the 'no core changes' wording, and correct the stale workflow comment.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
- cli.ts: absolutize --work-dir (resolve, not bare join). The python adapter
  runs regenerate.ts with a different cwd, so a relative work-dir made
  --generatedFolder/--pluginDir resolve against the wrong directory: outputs
  landed elsewhere while the diff engine compared empty dirs -> silent false
  'no differences'. Now every path handed to an adapter is absolute.
- cli.ts: reject --specs npm: before building emitters (fast-fail).
- python.ts: ensureDeps() runs 'npm ci' when node_modules is missing so a
  fresh gh:/github: clone builds instead of failing with 'cannot find module'.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
The sticky comment now shows only the one-line summary (file count, +/-) and
the emitter-diff-html artifact link, not an enumerated list of changed files.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
…ent)

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Security:
- Pass the untrusted workflow_dispatch baseline input (and derived values) via
  env vars instead of interpolating GitHub Actions expressions into run: bash
  (Actions expression injection)
- Validate git ref/repo and use "git checkout --end-of-options" (arg injection)
- Add "npm install --ignore-scripts" when materializing a baseline package
- Drop the remote CDN CSS fallback so the HTML report never fetches off-box

Cleanup:
- De-duplicate the two HTML report templates into one htmlDoc() helper
- Remove emojis from CLI output, job summary, and PR comment

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
The baseline is now the git merge-base with the PR base branch instead of
a pinned SHA file, so it survives squash-merge / rebase / force-push. Diff
approval is granted by adding the `emitter-diff-approved` label to the PR;
labeled/unlabeled events re-run the check. Removes eng/emitter-diff/baselines/python.sha.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
The workflow no longer fails on a generated-output diff or gates on a label.
It runs the tool without --fail-on-diff, so a diff exits 0 and only a real
tool/build error fails the job. Diffs are reported via job summary, a sticky
PR comment, and the HTML artifact. Drops the label trigger, the approval gate,
and the emitter-diff-approved label requirement.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
- Replace diff2html with a self-contained, GitHub-style HTML renderer
  (inline CSS, light/dark, no external requests). Runtime deps now empty.
- Remove diff2html from package.json, pnpm catalog, and lockfile.
- Drop the --run-tests/--test-env/--test-target surface and the adapter
  runTests contract; parallel generation and --sequential are unchanged.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Now that the tool has zero runtime dependencies, it no longer needs to be
a workspace package. Deleting the package.json and its name removes any
npm dependency-confusion surface, and the tool runs with plain
ode
(Node 24 strips TypeScript natively) instead of tsx.

- Delete eng/emitter-diff/package.json; drop it from pnpm-workspace.yaml.
- Rewrite relative imports to .ts extensions and set
  allowImportingTsExtensions so
ode runs it and 	sc typechecks.
- Invoke via
ode src/cli.ts in CI, the shebang, and the README.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Running emitter-diff with plain node requires .ts import specifiers, but
the repo-wide check:eng gate (tsconfig.eng.json, --noEmit) rejected them.
Enable allowImportingTsExtensions there so the tool stays covered by the
existing typecheck. Also reword the ensureDeps comment: http-client-python
commits its own package-lock.json to the repo (lockfiles aren't in the
published tarball), which is why npm ci works on a source checkout.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
A local run now prints the exact generated trees the diff compares, so
developers can open the head (current checkout) output directly instead of
inferring it from the work dir. Language-agnostic: lives in cli.ts, labeled
per side by the resolved emitter.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@l0lawrence

Copy link
Copy Markdown
Member Author

should emitter-diff live under eng/common bc we need this in tsp-azure too (like for rust/js/go etc)

Cut the python workflow to ~half its size:
- trim explanatory comment headers
- drop the hand-rolled github-script patch parser; reuse the tool's own
  'Diff summary:' line (ANSI-stripped) as a step output for the PR comment
- drop the redundant job-summary step and the now-unused --patch output
- collapse 'Fail on tool error' to a one-liner

Behavior is unchanged: sticky PR comment, HTML artifact, informational-only
(fails only on a tool/build error, never on a diff).

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
- External `--specs` folders must mirror the `http-specs` / `azure-http-specs` layout.
- `--html` renders a self-contained, GitHub-style HTML report (inline CSS, no external requests).
The tool has **zero runtime dependencies** — the diff itself is `git diff --no-index`.
- **Python's native two-phase pipeline needs a venv per emitter version.** `regenerate.ts`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could also use worktree for local diff and there are 2 pros:

  1. we don't need care about venv since worktree is in independent worktree
  2. we could cache commit SHA for worktree then there is no need to regenerate for baseline after running once.

Comment thread eng/emitter-diff/README.md Outdated

```bash
# Run directly with Node 24+ (native TypeScript, no build step, no dependencies):
node eng/emitter-diff/src/cli.ts --emitter python --baseline 0.34.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cli.ts shall have an option like --generated-code-path so that language emitter could set specific folder.

${pc.cyan("-j, --jobs <n>")}
Number of parallel compilation tasks (default: 30 on Linux/Mac, 10 on Windows).

${pc.cyan("--httpSpecsDir <dir>")}

@msyyc msyyc Jul 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect there shall be no change for regenerate.ts.

The ideal usage is: all language emitter just need to add a new command like spector-code-diff: node eng/emitter-diff/src/cli.ts --command "npm run regenerate" --emitter python --emitter-path packages/http-client-python --generated-spector-code-path XXX --baseline github:microsoft/typespec@ ...

In cli.ts, it will step into emitter path and run command to regenerate then extract from generated code of --generated-spector-code-path.

In one word, main work shall be in emitter-diff script and language emitters just need to add a new command to call the emitter-diff script to provide necessary command.

Comment thread eng/emitter-diff/README.md Outdated
Comment on lines +58 to +67
| Option | Description |
| ------------------ | ------------------------------------------------------------------------------------- |
| `--name <pattern>` | Filter which specs/packages are generated |
| `--html <file>` | Write the rendered HTML report to this path (default: `<work-dir>/emitter-diff.html`) |
| `--terminal` | Print the full colored patch to the terminal instead |
| `--patch <file>` | Write the raw unified diff to a file |
| `--fail-on-diff` | Exit non-zero when output differs (CI gating) |
| `--opt key=value` | Repeatable adapter-specific option (e.g. `--opt flavor=azure`) |
| `--sequential` | Generate baseline then head one at a time (default: both in parallel) |
| `-- <args>` | Everything after `--` is forwarded to the adapter |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first version, I think there is no need to add too many customized options. Just call the regenerate command of language emitters then show the diff. And we could add more command option in the future if needed.

l0lawrence and others added 5 commits July 2, 2026 15:25
…t diff configs can run script directly in terminal) -- need to check on cache implementation locally
…r fix

Spec versions rarely change, so pinning specs across baseline/head is
unnecessary. Revert resolveSpecsDir + spec existence warnings back to
main (hardcoded PLUGIN_DIR node_modules specs). Keep the batch-Python
change so run_batch.py honors --generatedFolder, which the emitter-diff
tool relies on to isolate baseline/head output trees.

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
- resolver.ts: attach `cause` to re-thrown manifest parse error (oxlint preserve-caught-error)
- README: replace nonsense `libba` placeholder with a real example
- cspell: allow `Menlo` (font name in the HTML report CSS)
- add chronus changeset for @typespec/http-client-python

Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

emitter:client:python Issue for the Python client emitter: @typespec/http-client-python eng

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants