
johnshearing/scrape_yt_mk_transcripts


Scrape a YouTube channel for audio.
Create a transcript with punctuation, diarization (speaker labels), timestamps, and metadata.
The transcripts are ingested by the LightRAG server, which is found at the following repository:
https://github.com/johnshearing/deep_avatar
The repository linked above is used to create question-and-answer pairs, which are used to train LLMs to emulate a human model.
See _Notes.txt for usage.
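To show what combining diarization, timestamps, and metadata into a single transcript can look like, here is a minimal sketch. It is an illustration only and assumes a hypothetical `Segment` structure and a hypothetical line format; the actual output format of this repository is described in _Notes.txt and may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized, timestamped stretch of speech (hypothetical structure)."""
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio
    speaker: str   # label assigned by a diarization step, e.g. "SPEAKER_00"
    text: str      # punctuated transcript text for this stretch


def hms(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS, truncating fractions of a second."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def format_transcript(segments, metadata):
    """Render a metadata header, a blank line, then one line per segment in time order.

    The "[start - end] SPEAKER: text" layout is an assumption for illustration.
    """
    header = [f"{key}: {value}" for key, value in metadata.items()]
    body = [
        f"[{hms(seg.start)} - {hms(seg.end)}] {seg.speaker}: {seg.text}"
        for seg in sorted(segments, key=lambda seg: seg.start)
    ]
    return "\n".join(header + [""] + body)


# Example usage with made-up data (hypothetical video and speakers).
segments = [
    Segment(65.2, 72.9, "SPEAKER_01", "Thanks for having me."),
    Segment(0.0, 4.5, "SPEAKER_00", "Welcome to the show."),
]
metadata = {"title": "Example episode", "video_id": "abc123"}
print(format_transcript(segments, metadata))
```

Sorting by start time keeps the output readable even if diarization returns segments out of order, and putting the metadata in a plain header keeps each transcript self-describing when it is later ingested as a single text document.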


About

Scrape YouTube. Make transcripts. Collect metadata. Prepare LLM Training Data
