compose-scenes-with-skills: pass word-level transcript into per-beat compose prompt (word-sync animations) #204

## Why
ROADMAP Phase 4 lists *"`compose-scenes-with-skills` narration awareness — pass word-level transcript so Claude can word-sync animations to audio"* as an open item.

The interesting part: **the data already exists** but is dropped on the floor when building the LLM prompt.

## State of code (2026-04-29)

Word-level transcript is generated upstream in `vibe scene build`:
- `packages/cli/src/commands/scene.ts:854-878` — runs Whisper with `granularity: "word"`, writes `assets/transcript-<id>.json`, and assembles `transcriptWords: { text, start, end }[]`
- `scene.ts:988` — these words flow into the beat result for runtime use (Hyperframes `__hf.media` consumers)
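
For reference, the word shape described above can be sketched as a TypeScript type with a defensive normalizer. This is only an illustration: the field names come from the `{ text, start, end }` shape quoted above, but the `TranscriptWord` name, the `toTranscriptWords` helper, and the assumption that `start`/`end` are numbers in seconds are not taken from the codebase.

```typescript
// Shape as described in this issue; the type name is hypothetical.
interface TranscriptWord {
  text: string;
  start: number; // assumed: seconds
  end: number;   // assumed: seconds
}

// Hypothetical type guard: accepts only well-formed word entries.
function isTranscriptWord(value: unknown): value is TranscriptWord {
  if (typeof value !== "object" || value === null) return false;
  const { text, start, end } = value as Record<string, unknown>;
  return (
    typeof text === "string" &&
    typeof start === "number" && Number.isFinite(start) &&
    typeof end === "number" && Number.isFinite(end) &&
    end >= start
  );
}

// Hypothetical normalizer: drops malformed entries instead of throwing,
// mirroring the "failure is non-fatal" stance of the existing code comment.
function toTranscriptWords(candidates: unknown): TranscriptWord[] {
  if (!Array.isArray(candidates)) return [];
  return candidates.filter(isTranscriptWord);
}
```

Filtering rather than throwing keeps a partially bad transcript usable; whether that is the right trade-off for the real pipeline is an open design choice.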

But the per-beat compose prompt **doesn't see them**:
- `packages/cli/src/commands/_shared/compose-prompts.ts:218` — instructions reference `cues.narration` (raw text) only
- `packages/cli/src/commands/_shared/compose-scenes-skills.ts:178-179` — cue-rendering only emits the narration string, not timings

So the LLM composing scene HTML can't author word-synced animations because it doesn't know when each word is spoken.

## Scope

- [ ] Plumb `transcriptWords` through to `composeScenesWithSkills()` per beat
- [ ] In `compose-scenes-skills.ts:178-193` cue rendering, when `transcriptWords` exists, emit a structured block (probably YAML or JSON inline) listing `{ text, start, end }` per word
- [ ] Update the `compose-prompts.ts` instructions (around line 218) to state that word-level timings are available and what the LLM is expected to do with them (data attributes on spans for GSAP timing? CSS keyframes? leave that to the Hyperframes skill)
- [ ] Token-budget guard: long narrations can run to hundreds of words, so gate on `transcriptWords.length` and either truncate or skip the block when it exceeds a threshold
- [ ] Tests in `compose-prompts.test.ts` / `compose-scenes-skills.test.ts` covering: no transcript, short transcript, oversized transcript
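
A minimal sketch of the cue rendering and token-budget guard from the list above, assuming the "skip" variant of the guard and a YAML-like block with inline JSON per word. The function name `renderNarrationCue`, the `word_timings` key, and the `MAX_PROMPT_WORDS` threshold of 200 are all hypothetical; none of them come from `compose-scenes-skills.ts`.

```typescript
interface TranscriptWord {
  text: string;
  start: number;
  end: number;
}

// Hypothetical threshold; the issue does not specify a value.
const MAX_PROMPT_WORDS = 200;

// Round to two decimals to keep the prompt compact.
function roundTime(t: number): number {
  return Math.round(t * 100) / 100;
}

// Hypothetical renderer: always emits the narration string, and appends
// per-word timings only when they exist and fit within the budget.
function renderNarrationCue(
  narration: string,
  transcriptWords?: TranscriptWord[],
  maxWords: number = MAX_PROMPT_WORDS,
): string {
  const lines = [`narration: ${JSON.stringify(narration)}`];

  // No transcript: output is the narration line only.
  if (!transcriptWords || transcriptWords.length === 0) {
    return lines.join("\n");
  }

  // Oversized transcript: skip the timings but tell the model why.
  if (transcriptWords.length > maxWords) {
    lines.push(
      `word_timings: omitted (${transcriptWords.length} words, limit ${maxWords})`,
    );
    return lines.join("\n");
  }

  lines.push("word_timings:");
  for (const w of transcriptWords) {
    const entry = { text: w.text, start: roundTime(w.start), end: roundTime(w.end) };
    lines.push(`  - ${JSON.stringify(entry)}`);
  }
  return lines.join("\n");
}
```

The three branches line up with the three test cases named above (no transcript, short transcript, oversized transcript). Truncating to the first `maxWords` entries instead of skipping would be an equally plausible guard; skipping is used here only because it avoids handing the model a partial timeline.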

## Reference
- ROADMAP.md Phase 4 *"Open items in Phase 4 (v0.61+ candidates)"*
- Word-sync comment already in code: `scene.ts:850` *"GSAP word-sync from it. Failure is non-fatal — narration still plays..."*
