We crop the transcript badly (by decimating) #800

Open
@wassname

Describe the bug

On YouTube, this extension appears to use the subtitles rather than the transcript. The subtitles are terrible, and they lead the LLM to give poor output.

To Reproduce

  1. Go to https://www.youtube.com/watch?v=IYaNscnE7rc&t=556s
  2. Run ChatGPTBox
  3. Look at the summary
  4. Open the summary in a separate window
  5. Look at the inputs to the summary
  6. Go back to the video, open the transcript, and compare

It seems that this extension is using the subtitles, not the transcript. But the subtitles often come from a much poorer transcription model, and uncommon words are missed entirely.

For example, for this video, compare the two excerpts below.

Expected behavior

This is part of the transcript available in the UI:

it is now a matter of public record that under pompeo's explicit Direction the CIA Drew up plans to kidnap and to assassinate me within the Ecuadorian Embassy in London and authorized going after my European colleagues subjecting us to theft hacking attacks and the planting of false information my wife and my infant son were also targeted a CIA asset was permanently assigned to track my wife and instructions were given to obtain DNA from my six month-old son's nappy

And this is the subtitle information received in ChatGPTBox:

it is now a matter of public,kidnap and to assassinate me within the,hacking attacks and the planting of,assigned to track my wife and,nappy

As you can see, it's a poor source of information.
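
The received text looks like what you would get by keeping only every third caption line. Here is a minimal sketch in Python of that effect. It is not ChatGPTBox's actual code: the segment boundaries in `SEGMENTS` are my guess at how the transcript excerpt was split, and `decimate` and `crop_contiguous` are hypothetical helper names used only for illustration. `crop_contiguous` shows one possible alternative, assuming the goal is just to stay under a size budget.

```python
# Hypothetical caption segments: the transcript excerpt quoted above,
# split at guessed line boundaries (the real segmentation is unknown).
SEGMENTS = [
    "it is now a matter of public",
    "record that under pompeo's explicit",
    "Direction the CIA Drew up plans to",
    "kidnap and to assassinate me within the",
    "Ecuadorian Embassy in London and authorized",
    "going after my European colleagues subjecting us to theft",
    "hacking attacks and the planting of",
    "false information my wife and my infant son",
    "were also targeted a CIA asset was permanently",
    "assigned to track my wife and",
    "instructions were given to obtain DNA",
    "from my six month-old son's",
    "nappy",
]


def decimate(segments, step):
    """Keep every `step`-th segment and drop the rest (the suspected behavior)."""
    return segments[::step]


def crop_contiguous(segments, max_chars):
    """Keep leading segments, in order, while the space-joined text fits max_chars."""
    kept = []
    used = 0
    for seg in segments:
        cost = len(seg) + (1 if kept else 0)  # +1 for the joining space
        if used + cost > max_chars:
            break
        kept.append(seg)
        used += cost
    return kept


# Decimating by 3 yields the comma-joined text that ChatGPTBox received.
decimated = ",".join(decimate(SEGMENTS, 3))
print(decimated)

# A contiguous crop keeps whole sentences readable, at the cost of the tail.
print(" ".join(crop_contiguous(SEGMENTS, 100)))
```

With this guessed split, decimating by 3 gives exactly the comma-separated string shown above, which is why I suspect the input is being decimated rather than cropped.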

Please complete the following information:

  • OS: Linux
  • Browser: Firefox
