
feat: songLyrics extension version 2 — word/syllable-level timing#218

Merged
Tolriq merged 12 commits into opensubsonic:main from ranokay:songlyrics-v2
Mar 26, 2026

Conversation

@ranokay
Contributor

@ranokay ranokay commented Mar 5, 2026

Summary

This PR adds version 2 of the songLyrics extension to the OpenSubsonic spec and OpenAPI docs.

Version 2 introduces optional word/syllable-level karaoke timing behind enhanced=true, while keeping the default response fully backward compatible with version 1.

It also documents richer lyric modeling for sources such as TTML:

  • independent kind tracks (main, translation, pronunciation)
  • word/syllable timing via cueLine + cue
  • shared per-track vocal attribution via structuredLyrics.agents
  • cueLine.agentId references instead of repeating singer/layer metadata on every cueLine

This PR follows up on discussion #213 and incorporates the review feedback on cue ordering, overlap rules, and multi-agent attribution.

What's new

New response types

  • cue: a single timed word or syllable with start, optional end, and value
  • cueLine: a timed line-level grouping of cue items
  • agent: reusable per-track attribution metadata for cueLines

New/extended fields

  • structuredLyrics.kind: classifies each lyric track as main, translation, or pronunciation
  • structuredLyrics.cueLine: word/syllable-level timing data parallel to line
  • structuredLyrics.agents: optional per-track attribution metadata for cue-attributed lyrics
  • cueLine.agentId: references an agent in the same structuredLyrics entry
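To make the new shape concrete, here is a minimal sketch of what an enhanced structuredLyrics entry might look like, expressed as a Python dict. Only the field names come from this PR; every value (language, agent ids, timings, text) is invented for illustration.

```python
# Hypothetical enhanced structuredLyrics entry; field names per this PR,
# all values invented for illustration.
enhanced_entry = {
    "lang": "eng",
    "synced": True,
    "kind": "main",            # one of: main, translation, pronunciation
    "agents": [                # shared per-track vocal attribution
        {"id": "v1", "role": "main"},
        {"id": "bg1", "role": "bg"},
    ],
    "cueLine": [
        {
            "index": 0,
            "start": 0,
            "end": 1000,
            "agentId": "v1",   # must resolve to a local agents[].id
            "value": "Hello yay",
            "cue": [
                {"start": 0, "end": 500, "value": "Hello"},
                {"start": 500, "end": 1000, "value": "yay"},
            ],
        }
    ],
}
```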

Endpoint changes

  • getLyricsBySongId gets an enhanced boolean parameter on both GET and POST
  • when enhanced=true, the response may include:
    • kind
    • cueLine
    • agents
    • non-main lyric tracks such as translation and pronunciation
  • when enhanced=false or omitted, the response remains version 1-compatible
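A hedged request sketch: the server URL, song id, and response-format parameter are placeholders, and auth parameters are omitted; only the endpoint name and the enhanced parameter come from this PR.

```python
from urllib.parse import urlencode

# Hypothetical server URL and song id; only getLyricsBySongId and
# the `enhanced` parameter are taken from this PR.
base = "https://music.example.com/rest/getLyricsBySongId"
params = {"id": "song-123", "enhanced": "true", "f": "json"}
url = f"{base}?{urlencode(params)}"

# Omitting `enhanced` (or sending enhanced=false) must yield a
# version 1-compatible response.
```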

Contract clarifications added during review

  • cueLine is only meaningful for synced=true; unsynced lyrics must not include it
  • within a single cueLine, cue.end is all-or-none
  • when the source has partial cue end-times, servers must fill the missing ones
  • cues inside a single cueLine must not overlap
  • overlaps across different cueLines are still valid, since those represent parallel vocal layers
  • agents are scoped to a single structuredLyrics entry
  • agents[].id must be unique within that entry
  • if agents are used for cue-attributed lyrics, there must be exactly one role: "main" agent
  • when multiple cueLines share the same index, the one whose agent has role: "main" must come first
  • if agents is present, every cueLine in that entry must include agentId
  • each agentId must resolve to one local agents[].id in the same structuredLyrics entry
  • structuredLyrics entries are independent across all kind values, including main
  • clients must not assume 1:1 alignment of line arrays or cueLine arrays between main, translation, and pronunciation
  • cue counts may differ across tracks for the same lyric passage
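Several of the rules above lend themselves to a mechanical check. Below is a sketch of a validator covering a subset of them (all-or-none cue.end, no intra-cueLine overlap, unique agent ids, exactly one main agent, local agentId resolution). This is illustrative code, not normative spec text.

```python
def validate_entry(entry: dict) -> bool:
    """Check a subset of the v2 cue/agent contract rules (illustrative)."""
    agents = entry.get("agents")
    if agents is not None:
        ids = [a["id"] for a in agents]
        assert len(ids) == len(set(ids)), "agents[].id must be unique"
        assert sum(a["role"] == "main" for a in agents) == 1, \
            "exactly one agent with role: main"
    for line in entry.get("cueLine", []):
        if agents is not None:
            assert line.get("agentId") in {a["id"] for a in agents}, \
                "every cueLine must reference a local agents[].id"
        cues = line.get("cue", [])
        ends = [c.get("end") for c in cues]
        # cue.end is all-or-none within a single cueLine
        assert all(e is None for e in ends) or all(e is not None for e in ends), \
            "cue.end is all-or-none within a cueLine"
        # cues inside one cueLine must not overlap
        for prev, nxt in zip(cues, cues[1:]):
            if prev.get("end") is not None:
                assert prev["end"] <= nxt["start"], \
                    "cues in one cueLine must not overlap"
    return True
```

Note that overlaps across different cueLines are deliberately not checked, since those represent parallel vocal layers and remain valid.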

Examples added/updated

  • enhanced Korean example with full second-line cueLine coverage
  • pronunciation example showing different cue counts from the main track
  • background-vocals example using agents + agentId
  • multiple-singers example using shared agents instead of repeated per-line role/name fields

Backward compatibility

  • enhanced defaults to false
  • without enhanced=true:
    • the response shape is identical to version 1
    • no cueLine arrays are returned
    • no agents arrays are returned
    • non-main kind tracks are not returned
    • the existing line array remains unchanged

Servers supporting this version should advertise:

  • songLyrics versions [1, 2]
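For example, the extension advertisement returned by getOpenSubsonicExtensions might look like this (wrapper fields elided; the shape is assumed from the existing extension-listing format).

```python
# Assumed sketch of the extension advertisement; the surrounding
# subsonic-response wrapper is elided.
advertisement = {
    "openSubsonicExtensions": [
        {"name": "songLyrics", "versions": [1, 2]},
    ]
}
```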

Files changed

New

  • content/en/docs/Responses/agent.md
  • content/en/docs/Responses/cue.md
  • content/en/docs/Responses/cueLine.md
  • openapi/schemas/Agent.json
  • openapi/schemas/Cue.json
  • openapi/schemas/CueLine.json

Updated

  • content/en/docs/Endpoints/getLyricsBySongId.md
  • content/en/docs/Extensions/songLyrics.md
  • content/en/docs/Responses/structuredLyrics.md
  • openapi/endpoints/getLyricsBySongId.json
  • openapi/openapi.json
  • openapi/schemas/StructuredLyrics.json

Closes #213

@netlify

netlify bot commented Mar 5, 2026

Deploy Preview for opensubsonic ready!

🔨 Latest commit: 9c42bb9
🔍 Latest deploy log: https://app.netlify.com/projects/opensubsonic/deploys/69beda89e63608000891574b
😎 Deploy Preview: https://deploy-preview-218--opensubsonic.netlify.app

@Tolriq Tolriq requested review from a team March 12, 2026 07:42
Tolriq previously approved these changes Mar 12, 2026
Copilot AI review requested due to automatic review settings March 13, 2026 19:12
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces songLyrics extension v2 by adding word/syllable-level karaoke timing (cue/cueLine) and lyric-layer classification (kind) while keeping v1 behavior as the default unless enhanced=true is provided to getLyricsBySongId.

Changes:

  • Add new OpenAPI response schemas Cue and CueLine, and extend StructuredLyrics with optional kind and cueLine.
  • Extend getLyricsBySongId (GET + POST) with an enhanced parameter to opt into v2 responses.
  • Add/expand documentation pages and examples covering the new response types and v2 behavior.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Summary per file:

  • openapi/schemas/StructuredLyrics.json: adds optional kind and cueLine fields to the structured lyrics schema.
  • openapi/schemas/Cue.json: new schema for word/syllable-level timing cues.
  • openapi/schemas/CueLine.json: new schema for line-level cue groupings with optional role metadata.
  • openapi/openapi.json: registers Cue and CueLine under components/schemas.
  • openapi/endpoints/getLyricsBySongId.json: adds enhanced query param (GET) and form field (POST).
  • content/en/docs/Responses/structuredLyrics.md: documents v2 structuredLyrics fields and examples.
  • content/en/docs/Responses/cue.md: new docs page for cue.
  • content/en/docs/Responses/cueLine.md: new docs page for cueLine.
  • content/en/docs/Extensions/songLyrics.md: adds a Version 2 section describing the new capabilities and gating.
  • content/en/docs/Endpoints/getLyricsBySongId.md: documents the new enhanced parameter and v2 response examples/notes.


kgarner7 previously approved these changes Mar 14, 2026
ranokay and others added 10 commits March 15, 2026 00:02
Add enhanced lyrics support with word/syllable-level karaoke timing
(cueLine/cue), lyric layer classification (kind: main/translation/
pronunciation), and role-based vocal attribution (bg, voiceN, group).

All new fields are gated behind the `enhanced=true` query parameter
for full backward compatibility with version 1 clients.

New response types:
- cue: a single word or syllable with start/end timing
- cueLine: a line of cues with optional role, value, and timing

Modified response types:
- structuredLyrics: added kind and cueLine fields

Modified endpoints:
- getLyricsBySongId: added enhanced parameter (GET + POST)

Closes opensubsonic#213
Formatting fix so these schema files match the repo's prevailing JSON indentation style.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…playRole

- Split `voiceN` string pattern into `role` enum (`bg`, `voice`, `group`) plus separate `voiceIndex` integer for individual voice parts
- Add optional `displayRole` string for human-readable vocal layer labels
- Promote derived end-time overlap avoidance from SHOULD to MUST
- Update cueLine, getLyricsBySongId, and songLyrics extension docs

…rvers must normalize source overlaps so cue[n].end <= cue[n+1].start within each cueLine. Cross-cueLine overlaps (different role/voiceIndex) remain expected for parallel vocal layers.

…ine.md: add example with role/voiceIndex/displayRole fields
- structuredLyrics.md: add pronunciation kind example with cueLine data
@kgarner7 kgarner7 requested a review from Tolriq March 21, 2026 18:33
@Tolriq
Member

Tolriq commented Mar 22, 2026

I've mostly already validated this, since I helped shape it. Waiting for at least one @opensubsonic/servers member to confirm they don't see it as problematic to implement, as this is the first evolution of an extension version.

@Tolriq
Member

Tolriq commented Mar 25, 2026

@deluan @epoupon

@Tolriq Tolriq enabled auto-merge (squash) March 26, 2026 07:18
@Tolriq Tolriq merged commit 7072ab1 into opensubsonic:main Mar 26, 2026
4 checks passed
@ayla6

ayla6 commented Apr 2, 2026

I found a small problem with the spec. It's very specific and wouldn't appear in any decent-quality lyrics, but:
If you have something like this:

<span begin="00:00:00.000" end="00:00:00.500">Hello</span> <span begin="00:00:00.500" end="00:00:01.000">yay</span> hello <span begin="00:00:02.500" end="00:00:03.000">hello</span> <span begin="00:00:03.000" end="00:00:03.500">hi</span> <span begin="00:00:03.500" end="00:00:04.500">awesome</span>

where a word without synced data is followed by an identical word with synced data, it becomes impossible to tell which of them is synced once it gets turned into the JSON the spec uses:

"cueLine": [
  {
    "index": 0,
    "start": 0,
    "end": 5000,
    "value": "Hello yay hello hello hi awesome",
    "cue": [
      { "start": 0, "end": 500, "value": "Hello" },
      { "start": 500, "end": 1000, "value": "yay" },
      { "start": 2500, "end": 3000, "value": "hello" },
      { "start": 3000, "end": 3500, "value": "hi" },
      { "start": 3500, "end": 4500, "value": "awesome" }
    ]
  }
]

It's minor, but it's something I noticed.

@ayla6

ayla6 commented Apr 2, 2026

If I can give my two cents: maybe it should be changed to use a start character and end character or something; it's simpler to apply and a lot more accurate. I probably should've said this before it was merged, though.

@Tolriq
Member

Tolriq commented Apr 2, 2026

@ranokay it seems your PR is still not merged in Navidrome, so we can still amend this to start/end char positions. WDYT?

@ranokay
Contributor Author

ranokay commented Apr 2, 2026

Yeah, fair point, that edge case is genuinely ambiguous with the current model. I hadn’t considered the repeated untimed/timed identical-token case.
I’m okay with adjusting it before Navidrome merges. I do wonder if explicit ordered tokens/segments would age better than char offsets, but if we want the smallest possible amendment, positions on the cues are probably the simpler path.
I see two possible fixes:

  1. Minimal amendment: keep cue as-is and add char positions that reference cueLine.value:
{ "value": "hello", "start": 2500, "end": 3000, "charStart": 16, "charEnd": 21 }
  2. Bigger but cleaner change: preserve explicit ordered tokens, timed and untimed:
"tokens": [
  { "value": "Hello", "start": 0, "end": 500 },
  { "value": "yay", "start": 500, "end": 1000 },
  { "value": "hello" },
  { "value": "hello", "start": 2500, "end": 3000 }
]

My feeling is char positions are the smaller amendment, while ordered tokens are the cleaner long-term model. What do you think?

@Tolriq
Member

Tolriq commented Apr 2, 2026

We said earlier in the spec that when there's an end, all cues (not tokens :p) must have one, so option 2 would contradict that and make things more complicated client-side. Internally I already use solution 1, and I guess most implementations will need to do the char mapping anyway, so doing it directly is better for clients IMO.

@ranokay
Contributor Author

ranokay commented Apr 2, 2026

Alright, I think the main thing to pin down is the exact semantics: relative to cueLine.value, 0-based or not, and whether charEnd is exclusive. We should probably also define what “character position” means for Unicode so different implementations don’t count differently.

If you’re aligned on that direction, I can put together a small follow-up PR for the schema/docs.

@Tolriq
Member

Tolriq commented Apr 2, 2026

0-based and inclusive; the first "Hello" is 0-4. For Unicode, that's a great question :) Considering the data used to generate those endpoints can't have a cue split in the middle of a Unicode character, using byte offsets is probably the simplest way to avoid issues for some clients?

@ranokay
Contributor Author

ranokay commented Apr 2, 2026

My only concern is naming: if we go with UTF-8 byte offsets, I think they should be called byteStart / byteEnd rather than charStart / charEnd, otherwise people will assume actual character positions.

Then we can define them precisely as 0-based inclusive offsets into the UTF-8 encoding of the final cueLine.value string, with no normalization step.
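Under these semantics (0-based, inclusive, offsets into the UTF-8 encoding of the final cueLine.value, no normalization), a server-side computation can be sketched as follows. The function name and the character-span inputs are illustrative, not part of the spec.

```python
def byte_span(value: str, char_start: int, char_end_exclusive: int) -> tuple[int, int]:
    """Compute 0-based inclusive UTF-8 byte offsets (the proposed
    byteStart/byteEnd) for value[char_start:char_end_exclusive].
    Illustrative sketch, not normative spec text."""
    # Bytes occupied by everything before the token.
    byte_start = len(value[:char_start].encode("utf-8"))
    # Bytes occupied by the token itself; inclusive end is last byte.
    token_bytes = value[char_start:char_end_exclusive].encode("utf-8")
    return byte_start, byte_start + len(token_bytes) - 1
```

With ASCII text this matches the agreed example: the first "Hello" in "Hello yay" maps to 0-4. For multi-byte text such as "안녕 Hello", the leading "안녕" (two 3-byte Hangul syllables) maps to 0-5, which is where byte offsets and character offsets diverge.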

@Tolriq
Member

Tolriq commented Apr 2, 2026

Works for me.

@ranokay
Contributor Author

ranokay commented Apr 2, 2026

Opened #228

