Extract audio from a video, transcribe it with Whisper, index the transcript with LlamaIndex, and summarize or query it with GPT-4.
python3 ./transcriptin.py -f ./sample.mp4
The transcribed documents will then be in ./data/documents
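The internals of this step are not shown here; the following is a minimal sketch of the general approach, assuming pydub extracts the audio track and the OpenAI Whisper API (openai 0.27.x interface) performs the transcription. The function name, file paths, and intermediate audio file are illustrative, not the script's actual code.

```python
# Hypothetical sketch of the "-f" step: extract the audio track with pydub
# and transcribe it with the Whisper API (openai==0.27.x interface).
import os
import openai
from pydub import AudioSegment

openai.api_key = os.environ["OPENAI_API_KEY"]

def transcribe_video(video_path: str, out_dir: str = "./data/documents") -> None:
    os.makedirs(out_dir, exist_ok=True)

    # pydub uses ffmpeg under the hood to read the mp4 container.
    name = os.path.splitext(os.path.basename(video_path))[0]
    audio_path = f"{name}.mp3"
    AudioSegment.from_file(video_path, format="mp4").export(audio_path, format="mp3")

    # Whisper API call as exposed by the 0.27.x openai package.
    with open(audio_path, "rb") as f:
        result = openai.Audio.transcribe("whisper-1", f)

    with open(os.path.join(out_dir, f"{name}.txt"), "w") as out:
        out.write(result["text"])

transcribe_video("./sample.mp4")
```

In practice pydub would also be used to split long recordings into chunks below the Whisper API's 25 MB upload limit; that detail is omitted from the sketch.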
python3 ./transcriptin.py -i
The index data will then be in ./data/indexes/index.json
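The '-i' step presumably builds a vector index over the transcribed documents. Below is a minimal sketch with the llama-index 0.8.x API; the persist directory matches the path above, but the exact file layout written to disk (e.g. a single index.json) may differ from what the real script produces.

```python
# Hypothetical sketch of the "-i" step using the llama-index 0.8.x API.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load every transcribed document produced by the previous step.
documents = SimpleDirectoryReader("./data/documents").load_data()

# Build a vector index (embeddings are created via the OpenAI API by default).
index = VectorStoreIndex.from_documents(documents)

# Persist the index to disk so the query step can reload it.
index.storage_context.persist(persist_dir="./data/indexes")
```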
python3 ./transcriptin.py
Input query: <INPUT_YOUR_QUERY_ABOUT_THE_TRANSCRIBED_TEXT>
We get a streaming answer, similar to ChatGPT.
==========
Query:
<THE_QUERY_YOU_INPUT>
Answer:
<ANSWER_FROM_AI>
==========
node.node.id_='876f8bdb-xxxx-xxxx-xxxx-xxxxxxxxxxxx', node.score=0.8484xxxxxxxxxxxxxx
----------
Cosine Similarity:
0.84xxxxxxxxxxxxxx
Reference text:
<THE_PART_AI_REFERRED_TO>
Enter exit to quit:
Input query: exit
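A loop like the one above can be reproduced with a streaming query engine. The sketch below is an assumption of how the interactive step might be implemented with llama-index 0.8.x, not the script's actual code; the printed fields mirror the sample output shown earlier.

```python
# Hypothetical sketch of the interactive query loop (llama-index 0.8.x API).
from llama_index import StorageContext, load_index_from_storage

# Reload the persisted index built by the "-i" step.
storage_context = StorageContext.from_defaults(persist_dir="./data/indexes")
index = load_index_from_storage(storage_context)

# streaming=True yields tokens as they arrive, like the ChatGPT UI.
query_engine = index.as_query_engine(streaming=True)

while True:
    query = input("Input query: ")
    if query == "exit":
        break
    response = query_engine.query(query)
    print("=" * 10)
    print(f"Query:\n{query}")
    print("Answer:")
    response.print_response_stream()
    print("\n" + "=" * 10)
    # Each source node carries the chunk the answer was grounded on
    # and its cosine similarity to the query embedding.
    for node in response.source_nodes:
        print(f"{node.node.id_=}, {node.score=}")
        print("-" * 10)
        print(f"Cosine Similarity:\n{node.score}")
        print(f"Reference text:\n{node.node.text}")
```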
- Python 3.10 or higher.
To create and activate a venv environment:
python3 -m venv .venv
source .venv/bin/activate
To deactivate:
deactivate
pip3 install --upgrade pip
pip3 install -r requirements.txt
The main libraries installed are as follows:
pip freeze | grep -e "openai" -e "pydub" -e "llama-index" -e "sentence_transformers" -e "tiktoken"
llama-index==0.8.12
openai==0.27.9
pydub==0.25.1
tiktoken==0.4.0
Set your API key as an environment variable, or add it to a shell dotfile such as '.zshenv':
export OPENAI_API_KEY='YOUR_OPENAI_API_KEY'
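The openai 0.27.x package picks up OPENAI_API_KEY from the environment automatically, but it can also be set explicitly in Python:

```python
import os
import openai

# openai 0.27.x reads the key from this module-level attribute.
openai.api_key = os.environ["OPENAI_API_KEY"]
```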