
**@BYK** (Member) commented Dec 29, 2025

## Summary

Adds AI-powered summarization for changelog sections. Uses GitHub Models API by default, with a local fallback when no token is available.

## Features

- **Neutral, factual summaries:** avoids promotional language ("enhanced", "improved") and states only what changed
- **Expandable details:** original items are preserved in a collapsible `<details>` block
- **Section summaries:** condenses verbose bullet lists into readable prose (58-64% compression)
- **Top-level summary:** optional executive summary paragraph for the entire changelog
- **GitHub Models API:** uses GPT-4o-mini by default
- **Local fallback:** uses Falconsai/text_summarization (~60MB) when no token is available
- **Zero config:** works with the existing `GITHUB_TOKEN` or the `gh` CLI
- **Threshold-based:** only summarizes sections and releases that exceed the configured threshold

## Examples

### Example 1: Craft 2.16.0 (Small Release)

Input (6 items):

```markdown
### New Features
- Strip commit patterns from changelog entries
- Add support for custom changelog entries from PR descriptions
- Support for multiple entries and nested items
- Add changelog preview action and CLI command
- Make release workflow reusable for external repos
- Add version templating for layer names
```

Output (with expandable details):

```markdown
### New Features
Changelog entries now support custom descriptions, multiple items, previews, reusable workflows, and version templating for layers.

<details>
<summary>Show 6 items</summary>

- Strip commit patterns from changelog entries
- Add support for custom changelog entries from PR descriptions
- Support for multiple entries and nested items
- Add changelog preview action and CLI command
- Make release workflow reusable for external repos
- Add version templating for layer names

</details>
```

Top-level summary (with `topLevel: "always"`):

"The software release includes several new features: the ability to strip commit patterns from changelog entries, support for custom changelog entries derived from pull request descriptions, and support for multiple entries and nested items. Additionally, a changelog preview action and CLI command have been added."

### Example 2: Sentry 25.12.0 (Large Release)

Real-world test with Sentry 25.12.0 (31 items across 3 sections):

#### Section Summaries

| Section | Items | Words In → Out | Compression |
| --- | --- | --- | --- |
| ACI | 11 | 98 → 41 | 58% |
| Agents | 8 | 58 → 24 | 59% |
| Seer & Triage | 12 | 80 → 31 | 61% |
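The compression figures in the table are word-count ratios. A small sketch of how such a ratio can be computed (helper names are illustrative, not Craft's actual API):

```typescript
// Count whitespace-separated words in a summary or section body.
function wordCount(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Percentage of words removed by summarization, e.g. 98 → 41 words ≈ 58%.
function compressionPercent(input: string, summary: string): number {
  return Math.round((1 - wordCount(summary) / wordCount(input)) * 100);
}
```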

ACI Section (neutral tone):

"The metric monitor form now defaults to errors, alerts have a disabled option, test notification errors are displayed, navigation improvements were made, and new features include issue details, detector configurations, and direct log sending to Sentry."

Agents Section:

"Added markdown rendering with raw value switching, error icon preservation, browser JS onboarding, relocated analytics events, Seer feature tracking, anomaly thresholds for metric monitors, parallelized stats queries, and restored SPA auth page."

#### Top-Level Summary (106 words)

"The latest software release includes several updates across three main areas: ACI, Agents, and Seer & Triage. In the ACI section, the metric monitor form now defaults to the number of errors, and alerts have been updated to include a disabled status and display test notification errors in the UI. The Agents section introduces markdown rendering, the ability to switch to raw values, and a new onboarding process for browser JavaScript. Additionally, the Seer & Triage updates involve changes to support repo type checks, column renaming for broader applicability, and the removal of unnecessary calls."

Sections with ≤5 items are left unchanged.
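That per-section rule can be sketched as a one-line gate (hypothetical helper name; the actual check lives in `src/utils/ai-summary.ts`):

```typescript
// Summarize a section only when it has more than kickInThreshold items;
// sections with <= 5 items (the default threshold) pass through unchanged.
function shouldSummarizeSection(itemCount: number, kickInThreshold = 5): boolean {
  return itemCount > kickInThreshold;
}
```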

## Configuration

```yaml
aiSummaries:
  enabled: true
  kickInThreshold: 5  # only summarize sections with >5 items
  model: "openai/gpt-4o-mini"  # default
  topLevel: "threshold"  # "always" | "never" | "threshold" | true | false
```
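The options above map naturally onto a small config type. A minimal TypeScript sketch (field names follow the YAML; the real schema lives in `src/schemas/projectConfig.schema.ts`, and `withDefaults` is an assumed helper for illustration):

```typescript
// Illustrative shape of the aiSummaries config block; not the actual schema.
interface AiSummariesConfig {
  enabled: boolean;
  /** Only summarize sections with more than this many items. */
  kickInThreshold?: number;
  /** GitHub Models id, or "local:<model>" for the local fallback. */
  model?: string;
  topLevel?: 'always' | 'never' | 'threshold' | boolean;
}

// Fill in the documented defaults (assumed helper, for illustration).
function withDefaults(cfg: AiSummariesConfig) {
  return {
    enabled: cfg.enabled,
    kickInThreshold: cfg.kickInThreshold ?? 5,
    model: cfg.model ?? 'openai/gpt-4o-mini',
    topLevel: cfg.topLevel ?? 'threshold',
  };
}
```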

## Top-Level Summary Options

| Value | Behavior |
| --- | --- |
| `"always"` or `true` | Always generate a top-level summary paragraph |
| `"never"` or `false` | Never generate a top-level summary |
| `"threshold"` (default) | Only generate if total items > `kickInThreshold` |
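The PR mentions a `shouldGenerateTopLevel()` helper; the table above translates into logic along these lines (an illustrative reconstruction, not the exact implementation):

```typescript
type TopLevelMode = 'always' | 'never' | 'threshold' | boolean;

// Decide whether to generate the executive summary paragraph for the
// whole changelog, per the documented topLevel modes.
function shouldGenerateTopLevel(
  mode: TopLevelMode,
  totalItems: number,
  kickInThreshold: number,
): boolean {
  if (mode === 'always' || mode === true) return true;
  if (mode === 'never' || mode === false) return false;
  // 'threshold' (default): only for releases larger than the threshold
  return totalItems > kickInThreshold;
}
```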

## Available Models

GitHub Models (requires `GITHUB_TOKEN`):

```yaml
model: "openai/gpt-4o-mini"        # default
model: "openai/gpt-4o"             # most capable
model: "mistral-ai/ministral-3b"   # more aggressive compression
```

Local (no token needed):

```yaml
model: "local:Falconsai/text_summarization"  # ~60MB, extractive
```
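A plausible sketch of how the `local:` prefix can be dispatched (names are assumptions; the actual parsing lives in `src/utils/ai-summary.ts`):

```typescript
interface ModelChoice {
  backend: 'local' | 'github-models';
  model: string;
}

// A "local:" prefix selects the on-device model; anything else is
// treated as a GitHub Models API model id.
function parseModelOption(model: string): ModelChoice {
  const LOCAL_PREFIX = 'local:';
  if (model.startsWith(LOCAL_PREFIX)) {
    return { backend: 'local', model: model.slice(LOCAL_PREFIX.length) };
  }
  return { backend: 'github-models', model };
}
```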

## Details Block

When AI summarization is applied, the original items are preserved in an expandable `<details>` block:

```markdown
Summary text here.

<details>
<summary>Show 6 items</summary>

- Original item 1
- Original item 2
- ...

</details>
```

This allows users to expand the block and see the full list when needed.
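The PR adds `formatSummaryWithDetails()` for this wrapping; a sketch of what such a function might look like (the exact signature is an assumption):

```typescript
// Wrap a summary and its original bullet items in the <details> layout
// shown above. Signature is illustrative, not Craft's actual API.
function formatSummaryWithDetails(summary: string, items: string[]): string {
  const bullets = items.map(item => `- ${item}`).join('\n');
  return [
    summary,
    '',
    '<details>',
    `<summary>Show ${items.length} items</summary>`,
    '',
    bullets,
    '',
    '</details>',
  ].join('\n');
}
```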

## Authentication

Uses your GitHub token automatically:

- from the `GITHUB_TOKEN` environment variable, or
- from `gh auth token` (GitHub CLI)

Falls back to the local model if no token is available.
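The lookup order can be sketched as follows (function name and error handling are assumptions, not Craft's actual code):

```typescript
import { execSync } from 'node:child_process';
import process from 'node:process';

// Resolve a GitHub token: env var first, then the gh CLI; undefined
// means "no token", which triggers the local-model fallback.
function resolveGitHubToken(): string | undefined {
  if (process.env.GITHUB_TOKEN) {
    return process.env.GITHUB_TOKEN;
  }
  try {
    const token = execSync('gh auth token', { encoding: 'utf8' }).trim();
    return token || undefined;
  } catch {
    // gh CLI not installed or not authenticated
    return undefined;
  }
}
```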

## Files Changed

| File | Description |
| --- | --- |
| `src/utils/ai-summary.ts` | Dual-mode summarization with neutral prompts and details formatting |
| `src/__tests__/ai-summary.test.ts` | Unit tests (40 tests) |
| `src/__tests__/ai-summary.integration.test.ts` | Integration tests (16 tests with Sentry 25.12.0 data) |
| `src/__tests__/ai-summary.eval.ts` | Quality evals with vitest-evals |
| `src/schemas/projectConfig.schema.ts` | Updated config schema with `topLevel` |
| `README.md` | Documentation with before/after examples |

## Commands

```shell
yarn test                         # Unit tests (706 tests)
GITHUB_TOKEN=... yarn test:evals  # AI quality evals
```

## Model Selection Journey

We tested several local models before settling on the current hybrid approach:

| Model | Size | Result |
| --- | --- | --- |
| SmolLM2-360M | 360MB | Hallucinations, off-topic filler |
| SmolLM2-1.7B | 1.7GB | Refused to summarize |
| Qwen2-0.5B | 500MB | ⚠️ Concatenation, not summarization |
| Qwen2.5-1.5B | 1GB | Requires cmake to compile |
| Flan-T5-Large | 1.5GB | ONNX parsing errors |
| Falconsai/text_summarization | 60MB | Works well for extractive summaries |
| GitHub Models API | n/a | Best abstractive quality |

**Conclusion:** small local LLMs (<2GB) struggle with true abstractive summarization. The GitHub Models API provides superior quality, and the local model serves as a reasonable fallback.

**@github-actions** (bot) commented Dec 29, 2025

### Semver Impact of This PR

🟡 Minor (new features)

### 📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


### New Features ✨

- (changelog) Add AI-powered summaries for verbose changelog sections by BYK in #688

### Bug Fixes 🐛

- (changelog) Disable author mentions in PR preview comments by BYK in #684
- (github) Clean up orphaned draft releases on publish failure by BYK in #681
- (publish) Fail early on dirty git repository by BYK in #683

🤖 This preview updates automatically when you update the PR.

**@BYK** force-pushed the byk/feat/changelog-ai-summary branch 3 times, most recently from 54efab4 to 2d363db on December 29, 2025 at 21:20:

Add optional AI-powered summarization for changelog sections using GitHub
Models API. Uses your existing GitHub token—no additional API keys required.

Features:
- Summarizes sections with >5 items into concise prose (40-60% compression)
- Uses GPT-4o-mini by default via GitHub Models API
- Configurable model selection (GPT-4o, Llama, etc.)
- Graceful degradation if token unavailable
- Eval tests using vitest-evals for quality validation

Configuration:
  aiSummaries:
    enabled: true
    kickInThreshold: 5
    model: openai/gpt-4o-mini
**@BYK** force-pushed the byk/feat/changelog-ai-summary branch from 2d363db to 0a7d479 on December 29, 2025 at 21:59.
BYK added 9 commits December 30, 2025 02:12

- Change default model to mistral-ai/ministral-3b (71-87% compression)
- Add local fallback using Falconsai/text_summarization (~60MB)
- Fallback activates when no GITHUB_TOKEN available
- Support local: prefix for explicit local model selection
- Update README with before/after example and model options
- Update tests to cover both API and local paths (20 tests)

- Move manual test script to proper Vitest integration test
- Tests real changelog sections from Sentry 25.12.0 release
- Validates compression ratio and threshold behavior
- Skips automatically if GITHUB_TOKEN not available

GPT-4o-mini produces higher quality summaries with better readability compared to Ministral-3b.

Adds topLevel config option to control executive summary generation:
- 'always' or true: Always generate top-level summary
- 'never' or false: Never generate top-level summary
- 'threshold' (default): Only generate if total items > kickInThreshold

The top-level summary creates a single paragraph (up to 5 sentences) summarizing the entire release, ideal for large releases.

Also adds summarizeChangelog() and shouldGenerateTopLevel() functions with full test coverage (34 tests).

- Add tests for summarizeChangelog with Craft 2.16.0 and Sentry 25.12.0
- Add tests for shouldGenerateTopLevel with all mode combinations
- Update README with both section and top-level summary examples
- Total: 16 integration tests, 34 unit tests (700 tests overall)

- Update prompts to avoid promotional language (no 'enhanced', 'improved', etc.)
- Add formatSummaryWithDetails() for wrapping original items in <details>
- Add 6 new unit tests for formatSummaryWithDetails
- Update README with neutral tone examples and details block demo
- Total: 40 unit tests, 16 integration tests (706 tests overall)