
t1118: Add AI self-reflection context — supervisor can now diagnose its own failures#1671

Merged
marcusquinn merged 2 commits into main from feature/t1118-ai-self-reflection
Feb 18, 2026

Conversation

@marcusquinn
Owner

@marcusquinn marcusquinn commented Feb 18, 2026

Summary

  • Adds Section 10 (build_self_reflection_context) to ai-context.sh — feeds the AI its own action execution history so it can identify and fix recurring failures
  • Adds analysis area #9 (Self-reflection) to the reasoning prompt in ai-reason.sh
  • Adds adjust_priority example to the output format (was the only action type without an example)

Problem

The supervisor AI couldn't self-diagnose because it never saw its own execution results.

Solution

The self-reflection section provides:

| Data | What the AI learns |
|------|--------------------|
| Execution summary (31 executed, 16 skipped = 65% rate) | Overall effectiveness |
| Recurring skip reasons with counts | Prompt/validator mismatches to fix |
| Action repetition detection | Stop acting on the same targets repeatedly |
| Pipeline errors from log | Infrastructure issues to create fix tasks for |
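The PR doesn't show the implementation, but the skip-reason tabulation could be sketched as follows. The log format here (one tab-separated record per action: status, action type, reason) is an invented schema for illustration; the real supervisor log layout is not shown in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: the action-log schema below is assumed, not taken from the PR.
# Assumed format: <status>\t<action_type>\t<reason>, one record per line.
set -euo pipefail

tabulate_skip_reasons() {
  local log_file=$1
  echo '### Recurring Skip Reasons'
  echo '| Count | Action Type | Reason |'
  echo '|-------|-------------|--------|'
  # Count (action_type, reason) pairs among skipped actions, highest first
  awk -F'\t' '$1 == "skipped" { n[$2 "\t" $3]++ }
              END { for (k in n) print n[k] "\t" k }' "$log_file" |
    sort -rn |
    while IFS=$'\t' read -r count type reason; do
      printf '| %s | %s | %s |\n' "$count" "$type" "$reason"
    done
}

# Demo with synthetic data
log=$(mktemp)
{
  printf 'skipped\tadjust_priority\tmissing required field: new_priority\n'
  printf 'skipped\tadjust_priority\tmissing required field: new_priority\n'
  printf 'skipped\tcreate_improvement\tinvalid type\n'
  printf 'executed\tcreate_improvement\t-\n'
} > "$log"
tabulate_skip_reasons "$log"
rm -f "$log"
```

The real Section 10 also reports failures, repeated targets, and pipeline errors; this sketch covers only the skip-reason table.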

Testing

Tested standalone — produces clean output from real action logs:

```
### Recurring Skip Reasons
| Count | Action Type | Reason |
|-------|-------------|--------|
| 10 | adjust_priority | missing required field: new_priority |
| 6 | create_improvement | invalid type |
```

Context size: 20KB (well within 50K budget)
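As a rough illustration of how that budget could be enforced: the helper below is invented, and reading "50K" as 50 KiB of bytes is an assumption (it may instead denote tokens in the supervisor's accounting).

```shell
#!/usr/bin/env bash
# Invented helper: warn when the assembled context exceeds the budget.
# "50K" is assumed to mean 50 KiB of bytes, which may not match the
# supervisor's real accounting (it could count tokens instead).
check_context_budget() {
  local file=$1
  local budget=$((50 * 1024))
  local size=$(( $(wc -c < "$file") ))   # arithmetic strips wc's padding
  if [ "$size" -gt "$budget" ]; then
    echo "warning: context ${size}B exceeds ${budget}B budget" >&2
    return 1
  fi
  echo "context ${size}B within ${budget}B budget"
}

# Demo: a 100-byte file passes
tmp=$(mktemp)
head -c 100 /dev/zero > "$tmp"
check_context_budget "$tmp"
rm -f "$tmp"
```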

Closes #1670

Summary by CodeRabbit

  • New Features

    • AI system now includes self-reflection capabilities to evaluate its own performance and action patterns and to identify redundancies.
    • AI can propose task priority adjustments based on self-assessment analysis.
    • Enhanced tracking of execution metrics including success/failure rates and skip reasons.
  • Documentation

    • Updated model performance leaderboard with expanded test data and new task categories (architecture, research, content).

…oning engine (t1118)

The supervisor AI couldn't diagnose its own failures because it never saw
its action execution results. This adds:

1. Section 10 in ai-context.sh (build_self_reflection_context):
   - Execution summary across last 5 cycles (executed/failed/skipped rates)
   - Recurring skip reasons with counts (reveals prompt/validator mismatches)
   - Recurring failures with counts
   - Action repetition detection (same targets across cycles)
   - Pipeline errors from ai-supervisor.log

2. Analysis area #9 in reasoning prompt (ai-reason.sh):
   - Instructs AI to review self-reflection data and create improvement tasks
     for its own recurring failures

3. Added adjust_priority example to output format:
   - The AI kept omitting new_priority because no example showed the field
   - This was the #1 skip reason (10 skips across 5 cycles)
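The execution-summary arithmetic in point 1 can be reproduced from the figures quoted earlier (31 executed, 16 skipped); the variable names below are placeholders, not the script's actual identifiers.

```shell
#!/usr/bin/env bash
# Reproduces the quoted figures; variable names are placeholders.
executed=31 skipped=16 failed=0
total=$((executed + skipped + failed))
rate=$((executed * 100 / total))   # integer division: 3100 / 47 = 65
printf '%d executed, %d failed, %d skipped across last 5 cycles (%d%% execution rate)\n' \
  "$executed" "$failed" "$skipped" "$rate"
```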

The AI can now self-diagnose issues like:
- 'adjust_priority skipped 10x: missing new_priority' -> fix own output
- 'create_improvement skipped 6x: invalid type' -> flag deployment gap
- 'issue #1601 acted on 3x across 5 cycles' -> stop repeating
- 'jq compile errors in pipeline' -> create fix task
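A pre-execution check mirroring that top skip reason might look like the following. This is a hypothetical sketch: the actual validator and action schema live in the supervisor scripts and are not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical validator sketch: the real action schema and validation
# logic are not shown in this PR.
validate_adjust_priority() {
  local action_json=$1
  case "$action_json" in
    *'"new_priority"'*) echo 'ok' ;;
    *) echo 'skip: missing required field: new_priority'; return 1 ;;
  esac
}

# A well-formed action passes; one omitting new_priority is skipped
validate_adjust_priority '{"action":"adjust_priority","task":"t1118","new_priority":2}'
validate_adjust_priority '{"action":"adjust_priority","task":"t1118"}' || true
```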
@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces AI self-reflection capabilities (t1118) that feed execution history back to the reasoning engine. A new build_self_reflection_context() function analyzes supervisor logs to aggregate action metrics, identify skip patterns and failures, surface redundancies, and format findings as Markdown for integration into the AI context.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| AI Self-Reflection Context<br>`.agents/scripts/supervisor/ai-context.sh`, `.agents/scripts/supervisor/ai-reason.sh` | Added the `build_self_reflection_context()` function to scan supervisor logs (last 5 cycles), compute execution/failure/skip rates, extract and tabulate recurring skip reasons and failure types, identify repeated actions by issue and task ID, include pipeline errors, and format the findings as Markdown Section 10. Updated the reasoning prompts to include Self-reflection as item 9 in the Analysis Framework, with an example `adjust_priority` action output. |
| Performance Metrics<br>`MODELS.md` | Updated the metadata timestamp, expanded pattern data points from 865 to 876, refreshed performance leaderboard statistics for the opus and sonnet models, and added new task type categories (architecture, research, content) with corresponding success/failure metrics. |
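The integration point described above presumably looks something like this. Only the names `build_ai_context` and `build_self_reflection_context` and the `'full'` scope condition come from the PR; the section bodies and the other nine sections are elided placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the integration point; section bodies are elided placeholders.
build_self_reflection_context() {
  echo '## Section 10: AI Self-Reflection'
  # ...aggregate the last 5 cycles of action logs here...
}

build_ai_context() {
  local scope=${1:-full}
  # ...sections 1-9 build here regardless of scope...
  if [ "$scope" = "full" ]; then
    build_self_reflection_context   # only the full scope pays the ~20KB cost
  fi
}

build_ai_context full
```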

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • #1607: Extends the AI supervisor context builder in the same file with the new build_self_reflection_context() function and integration into build_ai_context().

Poem

🔄 The AI pauses, reflects with care,
Logs of actions float through the air,
Patterns emerge from cycles past,
"Why did I stumble? How can I last?"
With wisdom gained from failure's embrace,
It charts a smarter, swifter pace.


@marcusquinn marcusquinn merged commit df62186 into main Feb 18, 2026
7 of 10 checks passed
@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the AI supervisor's autonomy and effectiveness by enabling it to self-diagnose and learn from its past actions. By integrating a new self-reflection mechanism, the AI now receives detailed feedback on its execution history, including skipped actions, recurring failures, repeated tasks, and pipeline errors. This newfound awareness allows the AI to proactively identify and address issues in its prompts, execution logic, and overall strategy, leading to a more robust and continuously improving system.

Highlights

  • AI Self-Reflection Context: Introduced a new 'AI Self-Reflection' context section (Section 10) in ai-context.sh to feed the AI its own action execution history, enabling it to identify and fix recurring failures.
  • Self-reflection Analysis Area: Added 'Self-reflection' as a new analysis area (#9) to the AI's reasoning prompt in ai-reason.sh, guiding the AI to review its own performance and identify areas for improvement.
  • Adjust Priority Action Example: Provided an example for the adjust_priority action type in the ai-reason.sh output format, addressing a previous gap where this action lacked an example.
Changelog
  • .agents/scripts/supervisor/ai-context.sh
    • Added a conditional call to build_self_reflection_context within build_ai_context when the scope is 'full'.
    • Implemented the build_self_reflection_context function, which aggregates and formats data from recent action logs, including execution summaries, recurring skip reasons, failed actions, repeated action targets, and recent pipeline errors.
  • .agents/scripts/supervisor/ai-reason.sh
    • Appended a new analysis area, 'Self-reflection' (Section 9), to the AI's reasoning prompt, instructing it to review its own execution history for improvement.
    • Inserted an example JSON object for the adjust_priority action type into the 'Output Format' section.
  • MODELS.md
    • Updated the Last updated timestamp.
    • Incremented the Pattern data points count from 865 to 876.
    • Adjusted opus model statistics (Tasks from 497 to 502, Successes from 492 to 496, Failures from 5 to 6).
    • Adjusted sonnet model statistics (Tasks from 154 to 160, Successes from 154 to 160).
    • Updated feature task type statistics (Tasks from 524 to 533, Successes from 508 to 516, Failures from 16 to 17).
    • Updated bugfix task type statistics (Tasks from 8 to 10, Successes from 5 to 7).

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 24 code smells

[INFO] Recent monitoring activity:
Wed Feb 18 15:44:47 UTC 2026: Code review monitoring started
Wed Feb 18 15:44:48 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 24

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 24
  • VULNERABILITIES: 0

Generated on: Wed Feb 18 15:44:50 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Development

Successfully merging this pull request may close these issues.

t1118: Add AI self-reflection context — feed execution history back to reasoning engine
