t1334: Diagnose t003.5 repeated worker deaths during cross-browser testing #2283

Merged
marcusquinn merged 1 commit into main from feature/t1334
Feb 25, 2026
marcusquinn (Owner) commented Feb 25, 2026

Summary

Root cause investigation for t003.5 (cross-browser and device testing) which failed 3+ consecutive times with worker_process_died_mid_task_pr_open_unmerged at both sonnet and opus tiers.

Root Cause

Single failing test: e2e/responsive.spec.ts:672 — "dashboard inset constrains content width on wide desktop"

The test measured the <main> element (which is SidebarInset — fills available space after sidebar) instead of the DashboardInset div inside it (which has max-w-7xl). On a 1920px viewport, <main> is wider than 1280px by design, so the assertion expect(mainBox.width).toBeLessThanOrEqual(1280) always failed.
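The arithmetic behind the failure can be sketched in a few lines of shell. This is only an illustration: the 256px sidebar width is a hypothetical value (the PR does not state it), and the 1280px cap assumes Tailwind's default max-w-7xl of 80rem at a 16px root font size, which matches the 1280 in the assertion.

```shell
# Sketch of the layout arithmetic; SIDEBAR_PX is a hypothetical value, not taken from the app.
VIEWPORT_PX=1920   # wide desktop viewport used by the test
SIDEBAR_PX=256     # assumption for illustration only
CAP_PX=1280        # max-w-7xl = 80rem = 1280px at a 16px root font size

# <main> (SidebarInset) fills whatever width the sidebar leaves over.
main_width() {
  echo $(( VIEWPORT_PX - SIDEBAR_PX ))
}

# The inner div (DashboardInset) is capped by max-w-7xl.
inset_width() {
  w=$(main_width)
  if [ "$w" -gt "$CAP_PX" ]; then
    echo "$CAP_PX"
  else
    echo "$w"
  fi
}

echo "main:  $(main_width)px"
echo "inset: $(inset_width)px"
```

Under these assumptions, `<main>` exceeds 1280px for any sidebar narrower than 640px, while the inner div never does, so only the inner div is a meaningful target for the width assertion.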

Why workers died: Workers consumed their entire context windows trying to debug responsive CSS and test infrastructure when the fix was a 4-line test selector change. They never read the CI logs first to identify the exact failing assertion.

Fixes Applied

1. Test fix (a managed private repo, feature/t003.5 branch)

Changed the test to measure main.locator("> div").first() (DashboardInset) instead of main (SidebarInset).

2. Self-improvement (this repo)

Added "CI failure debugging" guidance to .agents/scripts/commands/full-loop.md worker protocol. Workers must now read CI logs via gh run view before attempting code changes on CI failure tasks.

Investigation Findings

| Question | Answer |
| --- | --- |
| Playwright/browser infra configured? | Yes. All other E2E tests pass (168/169 on shards 1-2, 160/162 on shard 4) |
| Memory/time limits causing OOM? | No. Tests complete normally, just with assertion failures |
| Responsive fixes caused regressions? | No. The CSS changes are correct; the test was wrong |
| PR #253 overlap? | Partial. PR #253 fixed WebKit overflow and sidebar org context issues, but not this test |

CI Status

Ref #2282

Summary by CodeRabbit

  • Documentation
    • Added comprehensive CI failure debugging guidance with step-by-step instructions for identifying failing jobs, interpreting run logs, and pinpointing exact error locations in test files and code lines.
    • Introduced detailed operational guidelines for headless execution mode, including safety constraints, prohibited operations, and best practices for automated job processing and dispatcher-initiated execution workflows.

Root cause of t003.5 repeated worker deaths: the E2E test at
responsive.spec.ts:672 measured <main> (SidebarInset) instead of the
DashboardInset div for the max-w-7xl constraint check. On 1920px viewport,
<main> fills available space after sidebar (>1280px), so the assertion
always failed. Workers consumed their context windows trying to debug
responsive CSS when the fix was a 4-line test selector change.

Fix applied to awardsapp feature/t003.5 branch (pushed to GitHub).
Self-improvement: added CI failure debugging protocol to full-loop.md
to prevent future workers from skipping CI log analysis.
coderabbitai bot (Contributor) commented Feb 25, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80feb46 and 57fd051.

📒 Files selected for processing (1)
  • .agents/scripts/commands/full-loop.md

Walkthrough

Adds two documentation sections to the full-loop guidance: CI failure debugging steps (t1334) and mandatory headless dispatch rules (t158/t174) for supervisor-dispatched workers, with both sections duplicated across locations in the file.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI & Headless Guidance Documentation: .agents/scripts/commands/full-loop.md | Inserts CI failure debugging block with job identification, log reading, and test pinpointing procedures. Adds mandatory Headless dispatch rules section with seven-item uncertainty framework, prohibition on user prompts/TODO.md edits/auth handling changes, and operational guidelines for headless mode. Both blocks added in duplicate locations within the Task Development/Full Loop flow. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🔧 When CI breaks and headless workers roam,
Guidance lights the path back home,
Seven rules keep chaos at bay,
Debug with purpose, come what may! 🎯

gemini-code-assist bot commented Feb 25, 2026

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue of repeated worker deaths during cross-browser testing by implementing a two-pronged solution. It rectifies an erroneous E2E test assertion that was causing consistent failures and establishes a clearer protocol for debugging CI failures, aiming to improve efficiency and prevent future misdiagnoses.

Highlights

  • E2E Test Correction: Corrected an end-to-end test in e2e/responsive.spec.ts that was incorrectly measuring the SidebarInset element instead of the DashboardInset div, leading to consistent failures on wide desktop viewports.
  • Enhanced CI Debugging Protocol: Introduced new guidelines in .agents/scripts/commands/full-loop.md for debugging CI failures, emphasizing the importance of reviewing CI logs before attempting code changes to prevent context exhaustion.

Changelog

  • .agents/scripts/commands/full-loop.md
    • Added detailed instructions for debugging CI failures, including steps to identify failing jobs and read logs, along with common pitfalls to avoid.

Activity

  • Initiated a root cause investigation into persistent worker_process_died_mid_task_pr_open_unmerged errors in cross-browser testing.
  • Identified e2e/responsive.spec.ts:672 as the single failing test due to incorrect DOM element measurement.
  • Applied a test fix in a separate repository (a managed private repo, PR #250) to target the correct element.
  • Updated internal documentation to mandate reading CI logs before attempting code changes for CI failure tasks.
  • Confirmed that all non-E2E checks are passing, with CI for the managed private repo fix currently re-running.
@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 69 code smells

[INFO] Recent monitoring activity:
Wed Feb 25 09:07:10 UTC 2026: Code review monitoring started
Wed Feb 25 09:07:11 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 69

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 69
  • VULNERABILITIES: 0

Generated on: Wed Feb 25 09:07:14 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds valuable guidance to the worker protocol for debugging CI failures, instructing workers to analyze logs before attempting code changes. This is a great step towards preventing context exhaustion and blind debugging. I've suggested an improvement to the provided gh command to make the log analysis workflow more robust, automated, and simpler.

Comment on lines +201 to +205
# 1. Identify the failing job
gh pr checks <PR_NUMBER> --repo <owner/repo>

# 2. Get the run ID and read failure logs
gh run view <RUN_ID> --repo <owner/repo> --log | grep -iE 'FAIL|Error.*spec|expect.*received'

medium

This suggested change automates the process of finding the failing run ID and viewing its logs, making the debugging workflow faster and more reliable.

  • gh run list combined with jq directly retrieves the ID of a failing run, avoiding the need to manually inspect the output of gh pr checks and extract the ID from a URL.
  • gh run view --log-failed is a more robust way to view failure logs, as it shows the full context for failed jobs without relying on grep patterns which might miss important details.
Suggested change
- # 1. Identify the failing job
- gh pr checks <PR_NUMBER> --repo <owner/repo>
- # 2. Get the run ID and read failure logs
- gh run view <RUN_ID> --repo <owner/repo> --log | grep -iE 'FAIL|Error.*spec|expect.*received'
+ # 1. Get the ID of the first failing run
+ RUN_ID=$(gh run list --pr <PR_NUMBER> --repo <owner/repo> --json databaseId,conclusion --jq '.[] | select(.conclusion=="failure") | .databaseId' | head -n 1)
+ # 2. Read failure logs for that run
+ gh run view "$RUN_ID" --repo <owner/repo> --log-failed
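
A hedged variant of this lookup, offered as a sketch rather than a tested replacement. It assumes `gh run list` filters by branch rather than by pull request number (its documented filters include `--branch` and `--status`), so it first resolves the PR's head branch with `gh pr view`. The function name `failed_run_logs` is hypothetical, and the PR number and repo remain placeholders supplied by the caller.

```shell
# Sketch only: failed_run_logs is a hypothetical helper, not part of full-loop.md.
# Usage: failed_run_logs <PR_NUMBER> <owner/repo>
failed_run_logs() {
  pr="$1"
  repo="$2"

  # Resolve the PR's head branch, since workflow runs are listed per branch.
  branch=$(gh pr view "$pr" --repo "$repo" --json headRefName --jq '.headRefName') || return 1

  # Most recent failed run on that branch (empty output if there is none).
  run_id=$(gh run list --repo "$repo" --branch "$branch" --status failure \
    --limit 1 --json databaseId --jq '.[0].databaseId // empty') || return 1

  if [ -z "$run_id" ]; then
    echo "no failed runs found for branch $branch" >&2
    return 1
  fi

  # Print only the logs of the failed steps.
  gh run view "$run_id" --repo "$repo" --log-failed
}
```

Returning a non-zero status when no failed run exists keeps a worker from treating empty output as a clean CI result.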

@marcusquinn (Owner, Author)

Flagged for Human Review

Reason: PR #2283 (t1334: diagnose t003.5 worker deaths) is open with pending CI. t1334 completed and identified root cause of repeated t003.5 failures. Human review needed to merge findings and unblock t003.5.

This issue has been flagged by the AI supervisor for human review. Please assess and take appropriate action.


Flagged by AI Supervisor (automated reasoning cycle)

marcusquinn (Owner, Author) commented Feb 25, 2026

Flagged for Human Review

Reason: PR #2283 (t1334: Diagnose t003.5 worker deaths) is OPEN with PENDING CI and 0 approvals. This PR contains the root cause analysis and fix for the recurring t003.5 failures that have wasted 4+ worker sessions. It should be reviewed and merged promptly to unblock t003.5 and prevent further token waste. The investigation found CI infrastructure issues that affect all cross-browser testing in a managed private repo.

This issue has been flagged by the AI supervisor for human review. Please assess and take appropriate action.


Flagged by AI Supervisor (automated reasoning cycle)

@marcusquinn marcusquinn marked this pull request as ready for review February 25, 2026 13:54
@marcusquinn marcusquinn merged commit 99a150a into main Feb 25, 2026
14 of 15 checks passed
augmentcode bot commented Feb 25, 2026

🤖 Augment PR Summary

Summary: This PR documents a CI-first debugging workflow to prevent agents from burning context on blind investigation when PR checks are failing.

Changes:

  • Updated .agents/scripts/commands/full-loop.md with a mandatory “read CI logs first” protocol using gh commands
  • Added concrete guidance and common pitfalls (e.g., asserting against the wrong DOM element) to speed up root-cause identification

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

gh pr checks <PR_NUMBER> --repo <owner/repo>

# 2. Get the run ID and read failure logs
gh run view <RUN_ID> --repo <owner/repo> --log | grep -iE 'FAIL|Error.*spec|expect.*received'
gh run view … --log | grep … can produce no output for many CI failures (setup/env errors, timeouts, non-test failures), which may mislead workers into thinking the logs are "empty." Consider noting that if the filter returns nothing, they should inspect the full unfiltered log to find the real failure signal.

Severity: low
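
One way to act on this suggestion, sketched under assumptions: the helper name `show_failures` and the 200-line fallback are hypothetical choices, and the grep pattern is the one quoted above. It reads a saved log file so the fallback path can be exercised without calling `gh`.

```shell
# Sketch only: show_failures is a hypothetical helper.
# Usage: gh run view <RUN_ID> --repo <owner/repo> --log > run.log; show_failures run.log
show_failures() {
  log_file="$1"
  pattern='FAIL|Error.*spec|expect.*received'

  # grep exits non-zero when nothing matches; treat that as "the filter missed",
  # not as "the log is empty".
  if grep -iE "$pattern" "$log_file"; then
    return 0
  fi

  echo "filter matched nothing; showing the last 200 lines of the unfiltered log" >&2
  tail -n 200 "$log_file"
}
```

The notice goes to stderr so that stdout contains only log lines, whichever branch was taken.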


Labels

needs-review Flagged for human review by AI supervisor
