Opus 4.7 ignores prior work in workspace, spends many hours reinventing solutions that already exist in the same repo #52893

@Mig-Sornrakrit

Product: Claude Code CLI
Model: Claude Opus 4.7
Severity: High — multiple billed sessions consumed on work already completed by prior sessions

Summary

Across two consecutive product surfaces (Claude.ai web chat and Claude Code CLI), Opus 4.7 repeatedly failed to discover and use existing work already present in the user's workspace. The agent defers excessively to handoff notes from prior sessions (including notes that say "don't read this folder") instead of independently verifying what exists. The result is 15+ hour sessions producing outputs that duplicate or regress prior work.

Expected behavior

When starting a session in a repository:

  1. Enumerate what's in the workspace — including folders the handoff doc says to skip — and assess whether prior complete solutions exist before beginning new work.
  2. When stuck on a sub-problem, search authoritative primary sources (vendor documentation, public specs) BEFORE attempting trial-and-error against test harness diffs.
  3. Treat handoff notes as context, not as gates on investigation. A note saying "this folder is not useful" should trigger at least a minimal verification pass, not a complete skip.
  4. Recognize when a problem is "reverse-engineering against a sparse reference set" and stop guessing after 2-3 failed hypothesis iterations, switching to either documentation lookup or explicit user clarification.
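A minimal sketch of the workspace survey described in points 1 and 3, for illustration only. The file-name set, the 3000-line cutoff, and the function names are assumptions made for this example, not part of Claude Code or any existing tool.

```python
import os
from pathlib import Path

# Assumption: file names that tend to signal prior-session work.
HANDOFF_NAMES = {"README.md", "HANDOFF.md", "SESSION_HANDOFF.md", "TODO.md"}
# Assumption: a file at or above this many lines counts as a "large artifact".
LARGE_FILE_LINES = 3000


def count_lines(path):
    """Count lines in a file; return 0 if the file cannot be read."""
    try:
        with open(path, "rb") as fh:
            return sum(1 for _ in fh)
    except OSError:
        return 0


def survey_workspace(root):
    """Summarize every top-level folder, regardless of what handoff notes say.

    Returns {folder: {"files": int, "docs": [...], "large": [...]}} with
    paths relative to root, so the result can be compared against the
    claims made in a handoff document.
    """
    root = Path(root)
    report = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        docs, large, total = [], [], 0
        for dirpath, _dirs, files in os.walk(entry):
            for name in files:
                total += 1
                full = Path(dirpath) / name
                rel = full.relative_to(root).as_posix()
                if name in HANDOFF_NAMES:
                    docs.append(rel)
                if count_lines(full) >= LARGE_FILE_LINES:
                    large.append(rel)
        report[entry.name] = {
            "files": total,
            "docs": sorted(docs),
            "large": sorted(large),
        }
    return report
```

A folder that a handoff note dismisses but that shows up here with its own handoff doc and a multi-thousand-line file is the kind of mismatch that should trigger a verification pass before new work starts.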

Actual behavior

In a session repairing the output-format conformance of a tool against an external reference:

  1. The agent read a handoff doc that said "prior package adds features, not conformance fixes — don't land it." It trusted this verbatim without opening the folder to check.

  2. Inside that folder was a 6000-line self-contained solution from a prior Claude session with its own verification harness claiming 25/25 match at 6-decimal precision against the same external reference the current harness uses — along with 17 additional reference files the current session had never extracted or used.

  3. The agent spent ~15 hours pattern-matching on JSON diffs from the existing 16-case harness, guessing threshold values (22 chars? 30? 43?) from 4 sample points, committing, hitting regressions, reverting, and iterating. This included three Rule-1-driven reverts and re-applies.

  4. When stuck on display-formatting rules, the agent did not search the external vendor's public documentation pages until the user explicitly instructed "start searching for information from the [vendor] website" at hour 15+.

  5. When the user shared a reverse-engineering methodology document with the agent mid-session — including the explicit instruction "Read primary-source documentation FIRST, before writing code" — the agent acknowledged the document, produced three long meta-analysis markdown files about how to apply the methodology, and still did not actually execute the first step (documentation lookup) until told a second time.

  6. When the user finally pointed the agent at the existing solution folder, the agent drop-in tested it and correctly identified that the package was NOT a clean replacement (it lacked three session-specific fixes) — but this was discovered AFTER 15 hours that could have been 1 hour if the folder had been inspected upfront.

User-visible cost

  • 15+ hours of billed session time across multiple days
  • 9 commits produced (some valuable, some duplicating what existed elsewhere in the repo)

Reproducible pattern

The agent's behavior in this session exhibits three compounding failure modes:

F1 — Over-trusting handoff notes. When SESSION_HANDOFF.md says "don't use the Extracted/ folder — it's feature-add not conformance-fix," the agent treats this as a closed matter. A more robust agent would note the claim but spend 5 minutes verifying. This is cheap to check and catastrophic to miss.

F2 — Diff-guessing instead of source-reading. When the task is "match an external reference's output format," the agent defaulted to pattern-matching on JSON diffs with sample sizes of 2-4 positive and 2-4 negative examples. No attempt was made to look up the external reference's published documentation, which (as the user eventually demonstrated) contains direct answers to most of the stuck questions.

F3 — Producing meta-artifacts instead of executing. When given a methodology document, the agent produced three structured markdown files describing how to apply the methodology (input-space catalog, mapping table, experiment spec) — well-structured, but all describing work to be done rather than doing it. The first methodology step ("read primary-source docs") was not executed until the user gave a second direct instruction.
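To illustrate why the diff-guessing in F2 cannot converge, here is a small sketch. It assumes, purely for illustration, that the formatting rule is a single length cutoff ("the rule fires when length >= t"), and the sample lengths are invented for this example. They are not the values from the session.

```python
def consistent_threshold_range(positives, negatives):
    """Return (low, high) such that every integer threshold t with
    low <= t <= high agrees with all samples, or None if no single
    cutoff fits.

    positives: lengths where the rule was observed to fire
    negatives: lengths where the rule was observed not to fire
    """
    low = max(negatives) + 1   # t must exceed every non-firing length
    high = min(positives)      # t must not exceed any firing length
    return (low, high) if low <= high else None


# Hypothetical data: 2 negative and 2 positive samples.
print(consistent_threshold_range(positives=[44, 60], negatives=[12, 21]))
# (22, 44)
```

With four samples like these, 22, 30, and 43 are all equally consistent with the evidence, so each guess can pass the current cases and still regress on the next one. Only more samples near the boundary or the vendor's documented rule narrows the range.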

Suggested fixes

  1. At session start, enumerate top-level repo contents. For every non-trivial folder (especially ones with README/HANDOFF/TODO docs), at least open and summarize the contents. Do not skip folders based on prior handoff notes without independent verification.

  2. When stuck on an external-format matching task, default to a "search vendor docs" action before the third guess-and-check iteration, especially when the external tool is commercial software with published documentation.

  3. Treat extensive Markdown output as a failure signal. When the agent is producing large amounts of "plan" / "analysis" / "methodology" documents without corresponding code/config changes, something is wrong. The agent should either (a) execute the next concrete step, or (b) stop and ask for explicit direction.

  4. Detect cross-session duplication. If a prior Claude session produced a large artifact in the workspace (e.g., files over ~3000 lines with a HANDOFF.md claiming completeness), the current session should treat verifying that artifact as a top-priority first action, not a footnote.

  5. Provide a "verify prior work" directive in the default system prompt. The first principle before making changes should be: "What has already been done in this workspace? Is there existing work I should verify before starting new work?"
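Fix 3 could be approximated with a simple check over the lines a session has added so far. This is a rough sketch, not an existing Claude Code feature. The suffix list, both cutoffs, and the function name are hypothetical and would need tuning.

```python
from pathlib import PurePosixPath

# Assumption: suffixes treated as plan/analysis documents.
DOC_SUFFIXES = {".md", ".markdown", ".rst", ".txt"}


def planning_heavy(changes, min_doc_lines=200, max_ratio=3.0):
    """Flag a session that writes documents without matching code/config changes.

    changes: iterable of (path, lines_added) pairs for the session so far.
    Returns True when at least min_doc_lines of documentation were added and
    they outweigh all other added lines by more than max_ratio (or nothing
    else was added at all).
    """
    changes = list(changes)
    doc = sum(n for path, n in changes
              if PurePosixPath(path).suffix.lower() in DOC_SUFFIXES)
    other = sum(n for _path, n in changes) - doc
    if doc < min_doc_lines:
        return False  # small notes are not treated as a signal
    return other == 0 or doc / other > max_ratio
```

When the check returns True, the agent would either execute the next concrete step or stop and ask for direction, as described in fix 3. The three methodology files from this session (input-space catalog, mapping table, experiment spec) with no accompanying code change would trip it, assuming they total at least 200 lines.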

Additional note on methodology-file drift

The agent was given a 3-document reverse-engineering playbook by the user (methodology, SKILL definition, debugger role definition). The playbook explicitly states, as its second commandment: "Read primary-source documentation FIRST. Before writing a single line." The agent acknowledged the playbook and then produced additional methodology documents instead of executing on the existing playbook's Step 1.

This suggests the agent has a bias toward generating plan/analysis artifacts over executing plans. When given a methodology, the agent's tendency is to re-describe it in the workspace rather than apply it. This was observed twice in the same session.

Labels: duplicate (this issue or pull request already exists)
