Captioned Speech gives duplicate text #139

Closed
@manascb1344

Description

Describe the bug
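The captioned speech example returns every word-level timestamp twice. Each entry in the x-word-timestamps response header is immediately followed by an identical copy: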

python kokoro-docker.py
Generating captioned speech for example texts...


Example 1:
Input text: Hello world! Welcome to the captioned speech system.
Response status: 200
Response headers: {'date': 'Sat, 08 Feb 2025 16:40:28 GMT', 'server': 'uvicorn', 'content-disposition': 'attachment; filename=speech.wav', 'x-accel-buffering': 'no', 'cache-control': 'no-cache', 'x-word-timestamps': '[{"word": "Hello", "start_time": 0.175, "end_time": 0.525}, {"word": "Hello", "start_time": 0.175, "end_time": 0.525}, {"word": "world", "start_time": 0.525, "end_time": 0.9}, {"word": "world", "start_time": 0.525, "end_time": 0.9}, {"word": "!", "start_time": 0.9, "end_time": 0.9875}, {"word": "!", "start_time": 0.9, "end_time": 0.9875}, {"word": "Welcome", "start_time": 0.9875, "end_time": 1.45}, {"word": "Welcome", "start_time": 0.9875, "end_time": 1.45}, {"word": "to", "start_time": 1.45, "end_time": 1.5375}, {"word": "to", "start_time": 1.45, "end_time": 1.5375}, {"word": "the", "start_time": 1.5375, "end_time": 1.625}, {"word": "the", "start_time": 1.5375, "end_time": 1.625}, {"word": "captioned", "start_time": 1.625, "end_time": 2.075}, {"word": "captioned", "start_time": 1.625, "end_time": 2.075}, {"word": "speech", "start_time": 2.075, "end_time": 2.4}, {"word": "speech", "start_time": 2.075, "end_time": 2.4}, {"word": "system", "start_time": 2.4, "end_time": 3.1}, {"word": "system", "start_time": 2.4, "end_time": 3.1}, {"word": ".", "start_time": 3.1, "end_time": 3.25}, {"word": ".", "start_time": 3.1, "end_time": 3.25}]', 'content-type': 'audio/wav', 'Transfer-Encoding': 'chunked'}
Audio saved to: [REMOVED]
Timestamps saved to:  [REMOVED]

Word-level timestamps:
Hello: 0.175s - 0.525s
Hello: 0.175s - 0.525s
world: 0.525s - 0.900s
world: 0.525s - 0.900s
!: 0.900s - 0.988s
!: 0.900s - 0.988s
Welcome: 0.988s - 1.450s
Welcome: 0.988s - 1.450s
to: 1.450s - 1.538s
to: 1.450s - 1.538s
the: 1.538s - 1.625s
the: 1.538s - 1.625s
captioned: 1.625s - 2.075s
captioned: 1.625s - 2.075s
speech: 2.075s - 2.400s
speech: 2.075s - 2.400s
system: 2.400s - 3.100s
system: 2.400s - 3.100s
.: 3.100s - 3.250s
.: 3.100s - 3.250s

Example 2:
Input text: The quick brown fox jumps over the lazy dog.
Response status: 200
Response headers: {'date': 'Sat, 08 Feb 2025 16:40:28 GMT', 'server': 'uvicorn', 'content-disposition': 'attachment; filename=speech.wav', 'x-accel-buffering': 'no', 'cache-control': 'no-cache', 'x-word-timestamps': '[{"word": "The", "start_time": 0.175, "end_time": 0.25}, {"word": "The", "start_time": 0.175, "end_time": 0.25}, {"word": "quick", "start_time": 0.25, "end_time": 0.5}, {"word": "quick", "start_time": 0.25, "end_time": 0.5}, {"word": "brown", "start_time": 0.5, "end_time": 0.8375}, {"word": "brown", "start_time": 0.5, "end_time": 0.8375}, {"word": "fox", "start_time": 0.8375, "end_time": 1.2375}, {"word": "fox", "start_time": 0.8375, "end_time": 1.2375}, {"word": "jumps", "start_time": 1.2375, "end_time": 1.5375}, {"word": "jumps", "start_time": 1.2375, "end_time": 1.5375}, {"word": "over", "start_time": 1.5375, "end_time": 1.7375}, {"word": "over", "start_time": 1.5375, "end_time": 1.7375}, {"word": "the", "start_time": 1.7375, "end_time": 1.825}, {"word": "the", "start_time": 1.7375, "end_time": 1.825}, {"word": "lazy", "start_time": 1.825, "end_time": 2.2}, {"word": "lazy", "start_time": 1.825, "end_time": 2.2}, {"word": "dog", "start_time": 2.2, "end_time": 2.85}, {"word": "dog", "start_time": 2.2, "end_time": 2.85}, {"word": ".", "start_time": 2.85, "end_time": 3.025}, {"word": ".", "start_time": 2.85, "end_time": 3.025}]', 'content-type': 'audio/wav', 'Transfer-Encoding': 'chunked'}
Audio saved to:  [REMOVED]
Timestamps saved to: [REMOVED]

Word-level timestamps:
The: 0.175s - 0.250s
The: 0.175s - 0.250s
quick: 0.250s - 0.500s
quick: 0.250s - 0.500s
brown: 0.500s - 0.838s
brown: 0.500s - 0.838s
fox: 0.838s - 1.238s
fox: 0.838s - 1.238s
jumps: 1.238s - 1.538s
jumps: 1.238s - 1.538s
over: 1.538s - 1.738s
over: 1.538s - 1.738s
the: 1.738s - 1.825s
the: 1.738s - 1.825s
lazy: 1.825s - 2.200s
lazy: 1.825s - 2.200s
dog: 2.200s - 2.850s
dog: 2.200s - 2.850s
.: 2.850s - 3.025s
.: 2.850s - 3.025s
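
Until the server-side duplication is fixed, a client can filter the header value itself. The following is a minimal sketch of such a workaround, not the project's fix: the helper name `dedupe_word_timestamps` is hypothetical, and the sample string is a shortened copy of the `x-word-timestamps` value from Example 1. It assumes the duplicates are always adjacent and identical, as they are in the output above.

```python
import json


def dedupe_word_timestamps(entries):
    """Drop entries that exactly repeat the entry immediately before them.

    Hypothetical helper: only adjacent, fully identical entries (same word,
    start_time and end_time) are treated as duplicates.
    """
    deduped = []
    for entry in entries:
        if deduped and entry == deduped[-1]:
            continue  # same word and same times as the previous entry
        deduped.append(entry)
    return deduped


# Shortened copy of the 'x-word-timestamps' header value from Example 1.
header_value = (
    '[{"word": "Hello", "start_time": 0.175, "end_time": 0.525}, '
    '{"word": "Hello", "start_time": 0.175, "end_time": 0.525}, '
    '{"word": "world", "start_time": 0.525, "end_time": 0.9}, '
    '{"word": "world", "start_time": 0.525, "end_time": 0.9}]'
)

words = dedupe_word_timestamps(json.loads(header_value))
for w in words:
    print(f'{w["word"]}: {w["start_time"]:.3f}s - {w["end_time"]:.3f}s')
```

Because whole entries are compared rather than only the word, a word that is genuinely spoken twice in a row keeps both entries, since its two occurrences have different start and end times.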

Screenshots or console output

Image

Branch / Deployment used
Using the docker run command for the CPU image:

docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:v0.2.0post3

Operating System
Linux (Pop!_OS, based on Ubuntu 22.04)
