
compose-scenes-with-skills: pass word-level transcript into per-beat compose prompt (word-sync animations) #204

@kiyeonjeon21

Description


Why

ROADMAP Phase 4 lists "compose-scenes-with-skills narration awareness — pass word-level transcript so Claude can word-sync animations to audio" as an open item.

The interesting part: the data already exists, but it is dropped on the floor when the LLM prompt is built.

State of the code (2026-04-29)

A word-level transcript is already generated upstream in vibe scene build:

  • packages/cli/src/commands/scene.ts:854-878 — runs Whisper with granularity: "word", writes assets/transcript-<id>.json, and assembles transcriptWords: { text, start, end }[]
  • scene.ts:988 — these words flow into the beat result for runtime use (Hyperframes __hf.media consumers)
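For reference, the transcriptWords shape described above can be written as a TypeScript type. This is a minimal sketch. It assumes the raw Whisper word entries arrive as { word, start, end } with times in seconds. The RawWhisperWord type and the toTranscriptWords helper are hypothetical names and are not the actual code at scene.ts:854-878.

```typescript
// Shape described in the issue: one entry per spoken word.
// Times are assumed to be seconds from the start of the narration audio.
interface TranscriptWord {
  text: string;
  start: number;
  end: number;
}

// Hypothetical raw entry from a word-granularity Whisper call (assumed shape).
interface RawWhisperWord {
  word: string;
  start: number;
  end: number;
}

// Hypothetical helper: normalize raw entries into TranscriptWord[].
// It trims whitespace and drops empty words or entries with invalid timings.
function toTranscriptWords(raw: RawWhisperWord[]): TranscriptWord[] {
  return raw
    .map((w) => ({ text: w.word.trim(), start: w.start, end: w.end }))
    .filter(
      (w) =>
        w.text.length > 0 &&
        Number.isFinite(w.start) &&
        Number.isFinite(w.end) &&
        w.end >= w.start,
    );
}
```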

But the per-beat compose prompt doesn't see them:

  • packages/cli/src/commands/_shared/compose-prompts.ts:218 — instructions reference cues.narration (raw text) only
  • packages/cli/src/commands/_shared/compose-scenes-skills.ts:178-179 — cue-rendering only emits the narration string, not timings

So the LLM composing scene HTML can't author word-synced animations because it doesn't know when each word is spoken.
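The following sketch makes the gap concrete. With timings, the composer could wrap each narration word in a span that carries its start time relative to the beat. A GSAP timeline or a CSS animation-delay could then read that value. This is one possible output format and not something the Hyperframes skill prescribes. The data-word-start and data-word-end attribute names, the "word" class, and the beatStart offset are all assumptions.

```typescript
interface TranscriptWord {
  text: string;
  start: number; // seconds, absolute position in the narration audio (assumed)
  end: number;
}

// Escape the characters that matter inside HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical: render words as spans whose data attributes hold
// beat-relative start/end times, rounded to milliseconds.
function wordSpans(words: TranscriptWord[], beatStart: number): string {
  return words
    .map((w) => {
      const start = Math.max(0, w.start - beatStart).toFixed(3);
      const end = Math.max(0, w.end - beatStart).toFixed(3);
      return `<span class="word" data-word-start="${start}" data-word-end="${end}">${escapeHtml(w.text)}</span>`;
    })
    .join(" ");
}
```

For example, wordSpans([{ text: "Hi", start: 10.2, end: 10.5 }], 10) yields a span with data-word-start="0.200" and data-word-end="0.500". Script code could then read el.dataset.wordStart as a delay.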

Scope

  • Plumb transcriptWords through to composeScenesWithSkills() per beat
  • In compose-scenes-skills.ts:178-193 cue rendering, when transcriptWords exists, emit a structured block (probably YAML or JSON inline) listing { text, start, end } per word
  • Update compose-prompts.ts instructions (around line 218) to mention that word-level timings are available, and what the LLM is expected to do with them (data attributes on spans for GSAP timing? CSS keyframes? Or leave that to the Hyperframes skill?)
  • Token-budget guard: long narrations can run to hundreds of words, so gate on transcriptWords.length and either truncate or skip the block above a threshold
  • Tests in compose-prompts.test.ts / compose-scenes-skills.test.ts covering: no transcript, short transcript, oversized transcript
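Below is a rough sketch of the cue-rendering and token-budget pieces. It assumes the guard is a simple word-count threshold. Several pieces are placeholders for discussion and not existing code: the function name renderWordTimings, the MAX_PROMPT_WORDS value, the mode option, and the one-JSON-object-per-line format. The issue itself leaves YAML vs JSON and truncate vs skip open.

```typescript
interface TranscriptWord {
  text: string;
  start: number;
  end: number;
}

// Hypothetical threshold; a real value should come from measuring prompt tokens.
const MAX_PROMPT_WORDS = 200;

type OverflowMode = "truncate" | "skip";

// Returns a prompt block listing per-word timings, or null when there is
// nothing to emit (no transcript, or an oversized one with mode "skip").
function renderWordTimings(
  words: TranscriptWord[] | undefined,
  mode: OverflowMode = "truncate",
  maxWords = MAX_PROMPT_WORDS,
): string | null {
  if (!words || words.length === 0) return null;
  const oversized = words.length > maxWords;
  if (oversized && mode === "skip") return null;
  const kept = oversized ? words.slice(0, maxWords) : words;
  // One compact JSON object per word; times rounded to milliseconds.
  const lines = kept.map((w) =>
    JSON.stringify({
      text: w.text,
      start: Number(w.start.toFixed(3)),
      end: Number(w.end.toFixed(3)),
    }),
  );
  const header = oversized
    ? `Word timings (seconds, first ${maxWords} of ${words.length} words):`
    : "Word timings (seconds):";
  return [header, ...lines].join("\n");
}
```

Returning null lets the caller fall back to today's narration-only cue, so the no-transcript path stays unchanged. The three test cases in the list map directly onto the undefined, short, and oversized inputs.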

Reference

  • ROADMAP.md Phase 4 "Open items in Phase 4 (v0.61+ candidates)"
  • Word-sync comment already in code: scene.ts:850 "GSAP word-sync from it. Failure is non-fatal — narration still plays..."

Metadata


Assignees

No one assigned

    Labels

    feature (New feature or enhancement)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
