Add the ability to grab YouTube video transcript with timestamps #1289

Open
wants to merge 1 commit into base: main

@thevops commented Feb 7, 2025

What this Pull Request (PR) does

This PR adds the ability to grab the transcript of a YouTube video with timestamps. Each transcript line is prefixed with a bracketed timestamp range, with both timestamps formatted as HH:MM:SS. The feature is enabled by the new --transcript-with-timestamps flag, which works like the existing --transcript flag.
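
To make the format concrete, here is a minimal Python sketch of the formatting step. It is not the code from this PR (fabric itself is written in Go). The helper names `format_timestamp` and `format_line` are hypothetical, and so is the assumption that each caption segment carries start and end offsets in seconds. The bracketed `[start - end]` layout follows the example output shown in this PR.

```python
def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as zero-padded HH:MM:SS."""
    total = int(seconds)  # assumption: fractional seconds are simply dropped
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_line(start: float, end: float, text: str) -> str:
    """Prefix one caption line with its [start - end] range."""
    return f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}"


if __name__ == "__main__":
    print(format_line(7, 10, "hi uh hi everybody good afternoon thank"))
```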

Example future use-case

Providing a summary of a video that includes timestamps for quick navigation to specific parts of the video.

Related issues

Not applicable.

Screenshots/examples

Example run and the beginning of its output:

❯ ./fabric -y "https://www.youtube.com/watch?v=5DgHDANmP9M" --transcript-with-timestamps
[00:00:07 - 00:00:10] hi uh hi everybody good afternoon thank
[00:00:09 - 00:00:13] you so much for coming to my talk I'm
[00:00:10 - 00:00:15] tram founder of Igan lay uh today I'm
[00:00:13 - 00:00:18] going to talk about the state of the aan
[00:00:15 - 00:00:19] universe so you may have seen some of my
[00:00:18 - 00:00:21] earlier talks where I mostly talk about
[00:00:19 - 00:00:23] the future today I'm going to talk about
[00:00:21 - 00:00:27] the present and what we've done till now
[00:00:23 - 00:00:29] okay so firstly what are we all about
...
..

The output can be easily used as input for patterns.

Caveats

Including timestamps makes the prompt sent to a model significantly longer, which can lead to rate limiting. One possible mitigation is a more compact timestamp format. I'm considering dropping the seconds and concatenating all lines that share the same HH:MM timestamp.
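
As a rough illustration of that idea, the Python sketch below post-processes output in the format shown above. It is not part of this PR. The function name `compact_by_minute` is hypothetical, and so is the choice to key each group on the HH:MM of the line's start timestamp and to drop the end timestamp.

```python
import re
from itertools import groupby

# Captures HH and MM of the start timestamp, then the caption text.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):\d{2} - [^\]]*\] (.*)$")


def compact_by_minute(lines):
    """Merge consecutive transcript lines that start in the same HH:MM."""
    parsed = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the format are skipped
            parsed.append((f"{match.group(1)}:{match.group(2)}", match.group(3)))
    # groupby only merges adjacent items; transcript lines arrive in time order.
    return [
        f"[{minute}] " + " ".join(text for _, text in group)
        for minute, group in groupby(parsed, key=lambda item: item[0])
    ]


if __name__ == "__main__":
    sample = [
        "[00:00:07 - 00:00:10] hi uh hi everybody good afternoon thank",
        "[00:00:09 - 00:00:13] you so much for coming to my talk I'm",
    ]
    for row in compact_by_minute(sample):
        print(row)
```

With this grouping, each minute of video costs one timestamp prefix instead of one per caption line, at the price of coarser navigation.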

This commit adds the ability to grab the transcript
of a YouTube video with timestamps. The timestamps
are formatted as HH:MM:SS and are prepended to
each line of the transcript. The feature is enabled
by the new `--transcript-with-timestamps` flag,
so it's similar to the existing `--transcript` flag.

Example future use-case:

Providing a summary of a video that includes timestamps
for quick navigation to specific parts of the video.