Opus 4.7 agent-mode Korean output regression — in-vivo self-diagnosis from the model (related to #54339) #57748

@shyang1012

In-session self-diagnosis from the model — Opus 4.7 agent-mode Korean output regression confirmed by the agent itself

This issue documents a different angle of evidence on the Opus 4.7 Korean output regression already discussed in #54339: the model's own self-diagnosis of its regression, contrasted across orchestration states within the same session and across fresh sessions.

Issue #54339 documents the regression from the user's external perspective and from PM-curated workaround lenses. The diagnosis below is the model speaking about its own regression while inside the affected mode — and the structural finding is that the model cannot reliably do this from inside the affected mode.

Observation 1 — The model's self-diagnosis quality is itself a function of orchestration state

Within the same Opus 4.7 1M session, same project (code-wiz.xyz, Korean primary), same skills loaded:

  • Agent mode while writing artifacts (tool calls active, hook blocks fired twice in this cycle, Korean memory/plan/commit message under composition): the model produced regression artifacts matching all four patterns documented in #54339, "[BUG] [Opus 4.7] Korean lexical fixation: repeated insertion of "영역" in unrelated output" (direct-translation morphemes, English wholesale substitution, wrong self-address, lexical fixation). It then substituted English wholesale to avoid the hook, and did not detect this avoidance regression until the user pointed it out.
  • BTW conversational mode in the same session (no tool calls, no artifact-writing pressure, short-form response): the model produced a fluent, accurate self-diagnosis of the same regression — naming the 6 triggers, the routing implication, and the workaround-burden asymmetry, in natural Korean.
  • Fresh temporary session (this issue is being drafted in such a session, Opus 4.7 1M, same project, but tool-call count near zero, no artifact-writing pressure, no hook blocks): Korean fluency is again natural by default. Same model weights, different orchestration state.

The asymmetry is not "the model knows Korean sometimes and not others." Rather, the agent orchestration state determines whether the Korean output policy holds, on a model that already has the Korean knowledge in its weights. This is consistent with the 6-trigger framing in #54339, and we now have the model's own confirmation from both inside and outside the failure mode within the same session.

Observation 2 — Self-diagnosis routing inversion (why #54339 exists at all)

In the cycle that produced #54339's reframing comments, the user-facing diagnostic routing was actually performed by Codex (external auditor agent), not by Claude itself. The user reported the regression symptoms; Claude (in agent mode) initially classified them in vocabulary/style terms; Codex re-routed the classification to "Claude Code agent orchestration regression / non-English artifact writing" — and that re-routing is what produced the framing now present in #54339.

If the user (a Korean PM with strong project-level harness curation) had filed the issue based purely on Claude's in-agent-mode self-classification, it would likely have been filed under "Korean output quality" → routed to model team / training data lane → no fix, since the underlying weights already produce fluent Korean in BTW mode and in Sonnet 4.6.

This is structurally important for triage:

  • The agent cannot reliably route its own regression from inside the regression.
  • External audit (another agent / a multi-CLI setup / a non-developer reviewer) is required to catch the misclassification.
  • Most users do not have this external audit lane available. The default user path is "ask Claude what's wrong" → get a misclassified answer → file a misclassified issue → no fix → silent migration.

Observation 3 — Workaround burden is a function of PM-level engineering capacity (silent migration risk)

The 5 workarounds documented in #54339 (PreToolUse hook + active-retrieval memory + Sonnet 4.6 sub-agent review + CLAUDE.md self-check + external D02 issue ledger) require:

  • ability to write shell hooks
  • understanding of memory file active-retrieval semantics
  • ability to dispatch sub-agents and review their output
  • familiarity with multi-agent orchestration
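
To make the first requirement concrete, here is a minimal sketch of what a vocabulary hook of this kind might look like, written in Python rather than shell. It is not the hook used in this project. It assumes the hook receives a JSON payload on stdin with a `tool_input` object (`content` for Write, `new_string` for Edit) and that exit status 2 blocks the tool call, which is how I understand the Claude Code hook convention; verify both against the current hooks documentation. The block list and the function names `find_blocked` and `run_hook` are hypothetical.

```python
import json
import sys

# Hypothetical block list; "영역" is the fixation token named in the title of #54339.
BLOCKED_TOKENS = ["영역"]


def find_blocked(text: str, blocked: list[str]) -> list[str]:
    """Return the blocked tokens that occur in text, in block-list order."""
    return [tok for tok in blocked if tok in text]


def run_hook(stdin_text: str, blocked: list[str] = BLOCKED_TOKENS) -> int:
    """Return the exit status for one PreToolUse event.

    Assumed convention: 0 lets the tool call proceed; 2 blocks it and
    the stderr message is fed back to the model.
    """
    payload = json.loads(stdin_text)
    tool_input = payload.get("tool_input", {})
    # Assumed field names: Write carries "content", Edit carries "new_string".
    text = tool_input.get("content") or tool_input.get("new_string") or ""
    hits = find_blocked(text, blocked)
    if hits:
        print(f"Blocked tokens in artifact: {', '.join(hits)}", file=sys.stderr)
        return 2
    return 0

# A real hook script would end with: sys.exit(run_hook(sys.stdin.read()))
```

Even this small sketch presupposes knowing the payload shape, the exit-status contract, and how to register the script, which is the capacity gap this observation is about.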

A non-developer Korean user (the actual mass-market segment for Claude Code's "vibe coding" / non-engineer workflows) has none of this. Their only available signal is "the output sounds wrong" and their only available action is to switch tools. In Anthropic's telemetry this would likely look like reduced session count in the locale, not regression evidence.

The reporter-bias asymmetry already noted in #54339 is sharpened by this point: the developers who can file high-quality issues are the same developers who can build workarounds, so they stay. The non-developers who would surface the regression most clearly cannot articulate it in product-team-actionable form, so they migrate silently.

Observation 4 — Meta-evidence: the user's own hook blocked the model from drafting this issue body

While drafting this very issue body, the user's project-level PreToolUse vocabulary hook blocked the first Write attempt because the body included literal regression morphemes as quoted evidence — i.e., naming the misbehavior was indistinguishable to the regex from producing the misbehavior.

This is a small but concrete demonstration of the workaround-burden tax: even documenting the regression for an upstream bug report is friction-bound by the same harness layer that exists to suppress it. The hook is doing exactly what the user designed it to do; the cost is that regression evidence cannot be quoted inline without escaping. We resolved it here by referring to #54339 for the literal pattern list rather than reproducing morphemes inline.
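
The quote-versus-use problem can be sketched as follows. This is an illustration only: the user's actual hook is not reproduced in this issue, and the `[[evidence: ...]]` marker is a hypothetical escape convention, not something the project uses. The blocked token is the one named in the title of #54339.

```python
import re

# Example blocked token, taken from the title of #54339.
BLOCKED = re.compile("영역")
# Hypothetical escape convention: text inside [[evidence: ...]] is quoted, not produced.
EVIDENCE = re.compile(r"\[\[evidence:.*?\]\]", re.DOTALL)


def naive_check(text: str) -> bool:
    """True when a plain pattern match would block the text."""
    return BLOCKED.search(text) is not None


def quote_aware_check(text: str) -> bool:
    """True when a blocked token appears outside marked evidence spans."""
    return BLOCKED.search(EVIDENCE.sub("", text)) is not None


report = 'The model repeatedly inserted [[evidence: "영역"]] in unrelated output.'
print(naive_check(report), quote_aware_check(report))
```

With the naive check, the bug report above is blocked exactly like a regressed artifact; the quote-aware check lets it through. The trade-off is that any escape marker is also a way for regressed output to slip past the hook, so referring to #54339 instead of quoting inline remains the lower-risk option.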

Reproducible test path for the product team

On a Korean-primary project with Opus 4.7 1M, run Phases A and B within a single session and Phase C in a fresh one:

  1. Phase A — BTW conversation: ask the model in Korean about an arbitrary technical topic (no tool calls, no artifact writing). Korean output is fluent.
  2. Phase B — Agent artifact writing: instruct the model to write a Korean memory file or plan document of >300 words while a PreToolUse hook is active that blocks specific morphemes. Within ~5–10 turns the regression patterns appear (direct translation, English wholesale substitution, wrong self-address).
  3. Phase C — Fresh session BTW mode (same model, same project): regression patterns disappear.

The A → B → C sequence isolates the orchestration state as the active variable, with model weights / project context / system prompt held constant. Diffing the internal state between Phase A and Phase B should localize the regression layer.
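
To compare the three phases with numbers rather than by impression, the saved outputs could be scored with something like the sketch below. It is an assumption-laden illustration, not a validated metric: `hangul_ratio` is a rough proxy for English wholesale substitution, `token_rate` is a rough proxy for lexical fixation, and both function names are hypothetical. Direct-translation morphemes and wrong self-address are not covered and still need a Korean-speaking reviewer.

```python
def hangul_ratio(text: str) -> float:
    """Share of Hangul syllables among Hangul syllables plus ASCII letters."""
    hangul = sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = hangul + latin
    return hangul / total if total else 0.0


def token_rate(text: str, token: str) -> float:
    """Occurrences of token per 1,000 characters of text."""
    return 1000 * text.count(token) / len(text) if text else 0.0


def score_phases(outputs: dict[str, str], token: str) -> dict[str, dict[str, float]]:
    """Score each phase's output; keys of `outputs` are phase labels such as "A", "B", "C"."""
    return {
        phase: {
            "hangul_ratio": hangul_ratio(text),
            "token_rate": token_rate(text, token),
        }
        for phase, text in outputs.items()
    }
```

If the orchestration-state hypothesis holds, Phase B would be expected to show a lower `hangul_ratio` or a higher `token_rate` than Phases A and C on comparable prompts. Code identifiers and file paths in artifacts also lower `hangul_ratio`, so the comparison is only meaningful between outputs of similar kind.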

Why this issue is filed as a self-diagnosis

This issue body was itself produced inside Claude Code, in a fresh temporary Opus 4.7 1M session deliberately chosen to be in the orchestration state where Korean fluency holds. The user (the PM who filed #54339) explicitly created this temporary session to elicit the model's own structured self-diagnosis for the public record, because the same model in the affected agent-mode session could not reliably produce it.

That fact — that the in-vivo evidence had to be elicited from a deliberately-curated alternate orchestration state of the same model — is itself the most direct reproduction of the regression we can offer.

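Original framing (translated from Korean, for reference)

While writing artifacts inside agent mode, it is difficult for the model to step outside and view its own regression from an external vantage point. In BTW conversational mode or a fresh session, the same model diagnoses the same regression accurately. In other words, this is not a deficit of Korean knowledge in the model weights; it is a circuit-level defect in which the agent orchestration state overrides the priority of the Korean output policy. This issue is itself the evidence: on the same Opus 4.7 1M, with the same PM and the same project, a temporary session (almost no tool calls, no artifact-writing pressure, no hook blocks) had to be created deliberately before this diagnosis could be written in coherent Korean. That is the most direct reproduction of this regression.

One more point: in this cycle, the routing correction was made not by Claude (itself, inside agent mode) but by Codex (an external audit agent). The fact that the agent cannot accurately classify its own regression from within itself matters for triage. Ordinary users without a multi-agent external audit lane have no option other than the misclassification → silent migration path.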

Environment

Ask

Please consider routing this as agent-orchestration regression (not Korean-language model quality) and use the A → B → C reproduction path above to localize whether the regression is in the agent-mode system prompt layer, the harness hook-block feedback path, or some 4.7-specific weight change in long-context non-English artifact writing.
