Skip to content

changelog: make generated entry YAMLs self-describing for the scrubber#3417

Merged
cotti merged 1 commit into
mainfrom
fix/changelog-self-describing-refs
May 28, 2026
Merged

changelog: make generated entry YAMLs self-describing for the scrubber#3417
cotti merged 1 commit into
mainfrom
fix/changelog-self-describing-refs

Conversation

@cotti
Copy link
Copy Markdown
Contributor

@cotti cotti commented May 28, 2026

Summary

The changelog-upload Lambda (docs-lambda-changelog-scrubber) reads each {product}/changelogs/*.yaml from the private S3 bucket and applies the allowlist sanitizer before mirroring to the public bucket. Because each per-entry YAML carries no embedded repo context, it calls LinkAllowlistSanitizer.TryApplyChangelogEntry(..., defaultOwner: \"elastic\", defaultRepo: null, ...).

For an entry produced by the upload action's fork-PR re-derivation step —

docs-builder changelog add --concise --use-pr-number \
  --owner elastic --repo cloud --prs 155500

BuildChangelogData stored the bare PR number verbatim in the entry's Prs list, so the written YAML contained prs: ['155500']. When the Lambda later tried to scrub it, ChangelogTextUtilities.TryGetGitHubRepo couldn't resolve a repo for the bare number (no defaultRepo), the sanitizer emitted an error and aborted, and SQS retried the message indefinitely with the same outcome:

System.InvalidOperationException: Failed to apply allowlist to changelog entry; errors: 1

(Observed today in CloudWatch — an entry never reached the public CDN bucket.)

Changes

Two fixes here, with tests:

  1. ChangelogFileWriter.NormalizeReferences — when --owner and --repo are supplied and a --prs / --issues value is just a number, expand it to https://github.com/{owner}/{repo}/{pull|issues}/N before writing the YAML. Bundle-style multi-repo strings (a+b) and pre-qualified org/repo values are left alone to avoid producing wrong URLs. Full URLs and owner/repo#N short-forms pass through unchanged. The validator already requires --owner and --repo when --prs is bare, so this is a strict superset of accepted input.

  2. LinkAllowlistSanitizer.FilterReferenceList (defense in depth) — if we encounter a bare numeric reference and defaultRepo is unknown, keep the reference as-is and emit a warning instead of failing the entry. A bare number carries no repo identity, so it cannot leak a private link on its own, and downstream rendering supplies owner/repo from runtime context. Genuinely unparseable references (e.g. not-a-pr-ref) still hit the original fail-closed error branch so schema regressions remain visible.

Together these unblock the already-uploaded entries that the Lambda is currently dead-lettering, and prevent the same shape of bad YAML from ever being produced again.

Test plan

  • New NormalizeReferencesTests (10 cases) cover bare-number expansion, missing-context fall-through, full URL pass-through, short-form pass-through, bundle-style/pre-qualified repo guards, whitespace trimming, and mixed lists.
  • New TryApplyChangelogEntry_BarePrNumberWithoutDefaultRepo_KeptWithWarning / _BareIssueNumberWithoutDefaultRepo_KeptWithWarning / _UnparseableNonNumericRef_StillErrors cases in LinkAllowlistSanitizerTests.
  • New PrIntegrationTests.CreateChangelog_WithBareNumberPrAndOwnerRepo_WritesFullUrlIntoYaml mirrors the exact upload-action invocation.
  • Full Elastic.Changelog.Tests suite: 689 passing.
  • dotnet format, ./build.sh build --skip-dirty-check, dotnet publish src/tooling/docs-builder -c Release (AOT) all clean.
  • Regenerated docs/cli-schema.json — no diff (CLI surface unchanged).

Roll-out

  • After merge, the next docs-builder release ships the updated Lambda.
  • All new entries produced by docs-builder changelog add --owner X --repo Y --prs N will be self-describing immediately, so even an old Lambda would scrub them correctly.

Made with Cursor

The changelog-upload Lambda (`docs-lambda-changelog-scrubber`) reads each
`{product}/changelogs/*.yaml` from the private S3 bucket and applies the
allowlist sanitizer before mirroring to the public bucket. Because each
per-entry YAML carries no embedded repo context, it calls
`LinkAllowlistSanitizer.TryApplyChangelogEntry(..., defaultOwner: "elastic",
defaultRepo: null, ...)`.

For an entry produced by the upload action's fork-PR re-derivation step —

    docs-builder changelog add --concise --use-pr-number \
      --owner elastic --repo cloud --prs 155500

— `BuildChangelogData` stored the bare PR number verbatim in the entry's
`Prs` list, so the written YAML contained `prs: ['155500']`. When the
Lambda later tried to scrub it, `ChangelogTextUtilities.TryGetGitHubRepo`
couldn't resolve a repo for the bare number (no `defaultRepo`), the
sanitizer emitted an error and aborted, and SQS retried the message
indefinitely with the same outcome:

    System.InvalidOperationException: Failed to apply allowlist to
    changelog entry; errors: 1

Two fixes here, with tests:

1. `ChangelogFileWriter.NormalizeReferences` — when `--owner` and `--repo`
   are supplied and a `--prs` / `--issues` value is just a number, expand
   it to `https://github.com/{owner}/{repo}/{pull|issues}/N` before
   writing the YAML. Bundle-style multi-repo strings (`a+b`) and
   pre-qualified `org/repo` values are left alone to avoid producing
   wrong URLs. Full URLs and `owner/repo#N` short-forms pass through
   unchanged. The validator already requires `--owner` and `--repo` when
   `--prs` is bare, so this is a strict superset of accepted input.

2. `LinkAllowlistSanitizer.FilterReferenceList` (defense in depth) — if
   we encounter a bare numeric reference and `defaultRepo` is unknown,
   keep the reference as-is and emit a warning instead of failing the
   entry. A bare number carries no repo identity, so it cannot leak a
   private link on its own, and downstream rendering supplies owner/repo
   from runtime context. Genuinely unparseable references (e.g.
   `not-a-pr-ref`) still hit the original fail-closed error branch so
   schema regressions remain visible.

Together these unblock the already-uploaded entries that the Lambda is
currently dead-lettering, and prevent the same shape of bad YAML from
ever being produced again.

Verified `dotnet format`, `./build.sh build --skip-dirty-check`, AOT
publish (`dotnet publish src/tooling/docs-builder -c Release`), the full
`Elastic.Changelog.Tests` suite (689 passing), and `docs/cli-schema.json`
regeneration shows no change (the CLI surface is unchanged).

Refs: elastic/cloud changelog-upload SQS DLQ for
`cloud-hosted/changelogs/155500.yaml`

Co-authored-by: Cursor <cursoragent@cursor.com>
@cotti cotti requested a review from a team as a code owner May 28, 2026 16:40
@cotti cotti requested a review from Mpdreamz May 28, 2026 16:40
@cotti cotti temporarily deployed to integration-tests May 28, 2026 16:40 — with GitHub Actions Inactive
@cotti cotti self-assigned this May 28, 2026
@cotti cotti added the fix label May 28, 2026
@cotti cotti enabled auto-merge (squash) May 28, 2026 16:43
@cotti cotti merged commit 947c578 into main May 28, 2026
23 of 25 checks passed
@cotti cotti deleted the fix/changelog-self-describing-refs branch May 28, 2026 16:50
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9733e8d-811e-4aaf-bcc0-8f5c0b4c124a

📥 Commits

Reviewing files that changed from the base of the PR and between 9892501 and 738b5ea.

📒 Files selected for processing (5)
  • src/services/Elastic.Changelog/Bundling/LinkAllowlistSanitizer.cs
  • src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs
  • tests/Elastic.Changelog.Tests/Changelogs/Create/PrIntegrationTests.cs
  • tests/Elastic.Changelog.Tests/Changelogs/LinkAllowlistSanitizerTests.cs
  • tests/Elastic.Changelog.Tests/Creation/NormalizeReferencesTests.cs

📝 Walkthrough

Walkthrough

This PR adds support for expanding bare numeric pull request and issue references into full GitHub URLs when owner and repo context is available. The core NormalizeReferences helper in ChangelogFileWriter detects whitespace-trimmed numeric inputs and expands them to https://github.com/{owner}/{repo}/{pull|issues}/{N} only when owner/repo are present and repo is a single-repo context. The LinkAllowlistSanitizer now treats bare numeric references as safely recoverable when it lacks per-entry repo context, emitting a warning instead of failing. New unit tests cover the normalization logic and sanitizer recovery path, and an integration test validates the full flow through changelog creation.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The linked issue #2 is a meta tracking issue for directive support completeness and is completely unrelated to the changelog YAML self-describing and PR/issue reference expansion changes in this PR. Remove the unrelated linked issue #2 from this PR, or link to the correct issue tracking the changelog Lambda failure that this PR addresses.
Docstring Coverage ⚠️ Warning Docstring coverage is 8.70% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: making generated changelog entry YAMLs self-describing by expanding bare numeric references into full GitHub URLs.
Description check ✅ Passed The description thoroughly explains the problem (bare PR numbers causing Lambda failures), the two fixes implemented, test coverage, and rollout plan—all directly related to the changeset.
Out of Scope Changes check ✅ Passed All changes (LinkAllowlistSanitizer recovery path, ChangelogFileWriter.NormalizeReferences expansion, and corresponding tests) are directly scoped to fixing the bare numeric PR/issue reference problem described in the PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/changelog-self-describing-refs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants