
t1118: Add AI self-reflection context — supervisor can now diagnose its own failures#1671

Merged
marcusquinn merged 2 commits into main from feature/t1118-ai-self-reflection
Feb 18, 2026

Conversation

@marcusquinn
Owner

@marcusquinn marcusquinn commented Feb 18, 2026

Summary

  • Adds Section 10 (build_self_reflection_context) to ai-context.sh — feeds the AI its own action execution history so it can identify and fix recurring failures
  • Adds analysis area #9 (Self-reflection) to the reasoning prompt in ai-reason.sh
  • Adds adjust_priority example to the output format (was the only action type without an example)

Problem

The supervisor AI couldn't self-diagnose because it never saw its own execution results.

Solution

The self-reflection section provides:

| Data | What the AI learns |
|------|--------------------|
| Execution summary (31 executed, 16 skipped = 65% rate) | Overall effectiveness |
| Recurring skip reasons with counts | Prompt/validator mismatches to fix |
| Action repetition detection | Stop acting on the same targets repeatedly |
| Pipeline errors from log | Infrastructure issues to create fix tasks for |
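The PR doesn't show the implementation, but the skip-reason tabulation could be sketched as follows. The log format here (one tab-separated record per action: status, action type, reason) is an invented schema for illustration; the real supervisor log layout is not shown in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: the action-log schema below is assumed, not taken from the PR.
# Assumed format: <status>\t<action_type>\t<reason>, one record per line.
set -euo pipefail

tabulate_skip_reasons() {
  local log_file=$1
  echo '### Recurring Skip Reasons'
  echo '| Count | Action Type | Reason |'
  echo '|-------|-------------|--------|'
  # Count (action_type, reason) pairs among skipped actions, highest first
  awk -F'\t' '$1 == "skipped" { n[$2 "\t" $3]++ }
              END { for (k in n) print n[k] "\t" k }' "$log_file" |
    sort -rn |
    while IFS=$'\t' read -r count type reason; do
      printf '| %s | %s | %s |\n' "$count" "$type" "$reason"
    done
}

# Demo with synthetic data
log=$(mktemp)
{
  printf 'skipped\tadjust_priority\tmissing required field: new_priority\n'
  printf 'skipped\tadjust_priority\tmissing required field: new_priority\n'
  printf 'skipped\tcreate_improvement\tinvalid type\n'
  printf 'executed\tcreate_improvement\t-\n'
} > "$log"
tabulate_skip_reasons "$log"
rm -f "$log"
```

The real Section 10 also reports failures, repeated targets, and pipeline errors; this sketch covers only the skip-reason table.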

Testing

Tested standalone — produces clean output from real action logs:

```
### Recurring Skip Reasons
| Count | Action Type | Reason |
|-------|-------------|--------|
| 10 | adjust_priority | missing required field: new_priority |
| 6 | create_improvement | invalid type |
```

Context size: 20KB (well within 50K budget)
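As a rough illustration of how that budget could be enforced: the helper below is invented, and reading "50K" as 50 KiB of bytes is an assumption (it may instead denote tokens in the supervisor's accounting).

```shell
#!/usr/bin/env bash
# Invented helper: warn when the assembled context exceeds the budget.
# "50K" is assumed to mean 50 KiB of bytes, which may not match the
# supervisor's real accounting (it could count tokens instead).
check_context_budget() {
  local file=$1
  local budget=$((50 * 1024))
  local size=$(( $(wc -c < "$file") ))   # arithmetic strips wc's padding
  if [ "$size" -gt "$budget" ]; then
    echo "warning: context ${size}B exceeds ${budget}B budget" >&2
    return 1
  fi
  echo "context ${size}B within ${budget}B budget"
}

# Demo: a 100-byte file passes
tmp=$(mktemp)
head -c 100 /dev/zero > "$tmp"
check_context_budget "$tmp"
rm -f "$tmp"
```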

Closes #1670

Summary by CodeRabbit

  • New Features

    • AI system now includes self-reflection capabilities to evaluate its own performance and action patterns and to identify redundancies.
    • AI can propose task priority adjustments based on self-assessment analysis.
    • Enhanced tracking of execution metrics including success/failure rates and skip reasons.
  • Documentation

    • Updated model performance leaderboard with expanded test data and new task categories (architecture, research, content).

…oning engine (t1118)

The supervisor AI couldn't diagnose its own failures because it never saw
its action execution results. This adds:

1. Section 10 in ai-context.sh (build_self_reflection_context):
   - Execution summary across last 5 cycles (executed/failed/skipped rates)
   - Recurring skip reasons with counts (reveals prompt/validator mismatches)
   - Recurring failures with counts
   - Action repetition detection (same targets across cycles)
   - Pipeline errors from ai-supervisor.log

2. Analysis area #9 in reasoning prompt (ai-reason.sh):
   - Instructs AI to review self-reflection data and create improvement tasks
     for its own recurring failures

3. Added adjust_priority example to output format:
   - The AI kept omitting new_priority because no example showed the field
   - This was the #1 skip reason (10 skips across 5 cycles)
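The execution-summary arithmetic in point 1 can be reproduced from the figures quoted earlier (31 executed, 16 skipped); the variable names below are placeholders, not the script's actual identifiers.

```shell
#!/usr/bin/env bash
# Reproduces the quoted figures; variable names are placeholders.
executed=31 skipped=16 failed=0
total=$((executed + skipped + failed))
rate=$((executed * 100 / total))   # integer division: 3100 / 47 = 65
printf '%d executed, %d failed, %d skipped across last 5 cycles (%d%% execution rate)\n' \
  "$executed" "$failed" "$skipped" "$rate"
```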

The AI can now self-diagnose issues like:
- 'adjust_priority skipped 10x: missing new_priority' -> fix own output
- 'create_improvement skipped 6x: invalid type' -> flag deployment gap
- 'issue #1601 acted on 3x across 5 cycles' -> stop repeating
- 'jq compile errors in pipeline' -> create fix task
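A pre-execution check mirroring that top skip reason might look like the following. This is a hypothetical sketch: the actual validator and action schema live in the supervisor scripts and are not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical validator sketch: the real action schema and validation
# logic are not shown in this PR.
validate_adjust_priority() {
  local action_json=$1
  case "$action_json" in
    *'"new_priority"'*) echo 'ok' ;;
    *) echo 'skip: missing required field: new_priority'; return 1 ;;
  esac
}

# A well-formed action passes; one omitting new_priority is skipped
validate_adjust_priority '{"action":"adjust_priority","task":"t1118","new_priority":2}'
validate_adjust_priority '{"action":"adjust_priority","task":"t1118"}' || true
```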
@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces AI self-reflection capabilities (t1118) that feed execution history back to the reasoning engine. A new build_self_reflection_context() function analyzes supervisor logs to aggregate action metrics, identify skip patterns and failures, surface redundancies, and format findings as Markdown for integration into the AI context.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| AI Self-Reflection Context<br>`.agents/scripts/supervisor/ai-context.sh`, `.agents/scripts/supervisor/ai-reason.sh` | Added the `build_self_reflection_context()` function to scan supervisor logs (last 5 cycles), compute execution/failure/skip rates, extract and tabulate recurring skip reasons and failure types, identify repeated actions by issue and task ID, include pipeline errors, and format the findings as Markdown Section 10. Updated the reasoning prompts to include Self-reflection as item 9 in the Analysis Framework, with an example `adjust_priority` action output. |
| Performance Metrics<br>`MODELS.md` | Updated the metadata timestamp, expanded pattern data points from 865 to 876, refreshed performance leaderboard statistics for the opus and sonnet models, and added new task type categories (architecture, research, content) with corresponding success/failure metrics. |
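The integration point described above presumably looks something like this. Only the names `build_ai_context` and `build_self_reflection_context` and the `'full'` scope condition come from the PR; the section bodies and the other nine sections are elided placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the integration point; section bodies are elided placeholders.
build_self_reflection_context() {
  echo '## Section 10: AI Self-Reflection'
  # ...aggregate the last 5 cycles of action logs here...
}

build_ai_context() {
  local scope=${1:-full}
  # ...sections 1-9 build here regardless of scope...
  if [ "$scope" = "full" ]; then
    build_self_reflection_context   # only the full scope pays the ~20KB cost
  fi
}

build_ai_context full
```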

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • #1607: Extends the AI supervisor context builder in the same file with the new build_self_reflection_context() function and integration into build_ai_context().

Poem

🔄 The AI pauses, reflects with care,
Logs of actions float through the air,
Patterns emerge from cycles past,
"Why did I stumble? How can I last?"
With wisdom gained from failure's embrace,
It charts a smarter, swifter pace.


@marcusquinn marcusquinn merged commit df62186 into main Feb 18, 2026
7 of 10 checks passed
@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the AI supervisor's autonomy and effectiveness by enabling it to self-diagnose and learn from its past actions. By integrating a new self-reflection mechanism, the AI now receives detailed feedback on its execution history, including skipped actions, recurring failures, repeated tasks, and pipeline errors. This newfound awareness allows the AI to proactively identify and address issues in its prompts, execution logic, and overall strategy, leading to a more robust and continuously improving system.

Highlights

  • AI Self-Reflection Context: Introduced a new 'AI Self-Reflection' context section (Section 10) in ai-context.sh to feed the AI its own action execution history, enabling it to identify and fix recurring failures.
  • Self-reflection Analysis Area: Added 'Self-reflection' as a new analysis area (#9) to the AI's reasoning prompt in ai-reason.sh, guiding the AI to review its own performance and identify areas for improvement.
  • Adjust Priority Action Example: Provided an example for the adjust_priority action type in the ai-reason.sh output format, addressing a previous gap where this action lacked an example.
Changelog
  • .agents/scripts/supervisor/ai-context.sh
    • Added a conditional call to build_self_reflection_context within build_ai_context when the scope is 'full'.
    • Implemented the build_self_reflection_context function, which aggregates and formats data from recent action logs, including execution summaries, recurring skip reasons, failed actions, repeated action targets, and recent pipeline errors.
  • .agents/scripts/supervisor/ai-reason.sh
    • Appended a new analysis area, 'Self-reflection' (Section 9), to the AI's reasoning prompt, instructing it to review its own execution history for improvement.
    • Inserted an example JSON object for the adjust_priority action type into the 'Output Format' section.
  • MODELS.md
    • Updated the Last updated timestamp.
    • Incremented the Pattern data points count from 865 to 876.
    • Adjusted opus model statistics (Tasks from 497 to 502, Successes from 492 to 496, Failures from 5 to 6).
    • Adjusted sonnet model statistics (Tasks from 154 to 160, Successes from 154 to 160).
    • Updated feature task type statistics (Tasks from 524 to 533, Successes from 508 to 516, Failures from 16 to 17).
    • Updated bugfix task type statistics (Tasks from 8 to 10, Successes from 5 to 7).

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 24 code smells

[INFO] Recent monitoring activity:
Wed Feb 18 15:44:47 UTC 2026: Code review monitoring started
Wed Feb 18 15:44:48 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 24

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 24
  • VULNERABILITIES: 0

Generated on: Wed Feb 18 15:44:50 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Development

Successfully merging this pull request may close these issues.

t1118: Add AI self-reflection context — feed execution history back to reasoning engine
