Scrape a YouTube channel for audio.
Create a transcript with punctuation, diarization, timestamps, and metadata.
The transcripts are ingested by the LightRAG server, which is found at the following repository:
https://github.com/johnshearing/deep_avatar
The repository linked above is used to create question-and-answer pairs, which are used to train LLMs to emulate a human model.
See _Notes.txt for usage.
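
To make the idea of a transcript with diarization, timestamps, and metadata more concrete, here is a minimal sketch of how one such record might be modeled and rendered as text. Every name in it (Segment, Transcript, format_timestamp, render, and the metadata keys) is hypothetical and chosen for illustration only. It is not the format this project actually writes; see _Notes.txt for the real usage.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One diarized stretch of speech (hypothetical structure)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str      # punctuated text for this stretch


@dataclass
class Transcript:
    """A transcript: free-form metadata plus ordered segments (hypothetical)."""
    metadata: dict
    segments: list = field(default_factory=list)


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS, dropping any fractional part."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render(transcript: Transcript) -> str:
    """Render metadata lines first, then one line per diarized segment."""
    lines = [f"{key}: {value}" for key, value in transcript.metadata.items()]
    for seg in transcript.segments:
        span = f"[{format_timestamp(seg.start)} - {format_timestamp(seg.end)}]"
        lines.append(f"{span} {seg.speaker}: {seg.text}")
    return "\n".join(lines)


# Example data is made up for illustration.
example = Transcript(
    metadata={"title": "Example video", "channel": "Example channel"},
    segments=[
        Segment(0.0, 4.2, "SPEAKER_00", "Welcome back, everyone."),
        Segment(4.2, 9.8, "SPEAKER_01", "Thanks for having me."),
    ],
)
print(render(example))
```

Keeping the speaker label and time span on every line means each line still carries its own context if the text is later split into chunks for ingestion.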