Extract audio from a video, transcribe it with Whisper, index the transcript with LlamaIndex, and summarize or query it with GPT-4.
python3 ./transcriptin.py -f ./sample.mp4
The transcribed documents will then be in ./data/documents
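The internals of this step are not shown here; the following is a minimal sketch of the general approach, assuming pydub extracts the audio track and the OpenAI Whisper API (openai 0.27.x interface) performs the transcription. The function name, file paths, and intermediate audio file are illustrative, not the script's actual code.

```python
# Hypothetical sketch of the "-f" step: extract the audio track with pydub
# and transcribe it with the Whisper API (openai==0.27.x interface).
import os
import openai
from pydub import AudioSegment

openai.api_key = os.environ["OPENAI_API_KEY"]

def transcribe_video(video_path: str, out_dir: str = "./data/documents") -> None:
    os.makedirs(out_dir, exist_ok=True)

    # pydub uses ffmpeg under the hood to read the mp4 container.
    name = os.path.splitext(os.path.basename(video_path))[0]
    audio_path = f"{name}.mp3"
    AudioSegment.from_file(video_path, format="mp4").export(audio_path, format="mp3")

    # Whisper API call as exposed by the 0.27.x openai package.
    with open(audio_path, "rb") as f:
        result = openai.Audio.transcribe("whisper-1", f)

    with open(os.path.join(out_dir, f"{name}.txt"), "w") as out:
        out.write(result["text"])

transcribe_video("./sample.mp4")
```

In practice pydub would also be used to split long recordings into chunks below the Whisper API's 25 MB upload limit; that detail is omitted from the sketch.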
python3 ./transcriptin.py -i
The index data will then be in ./data/indexes/index.json
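The '-i' step presumably builds a vector index over the transcribed documents. Below is a minimal sketch with the llama-index 0.8.x API; the persist directory matches the path above, but the exact file layout written to disk (e.g. a single index.json) may differ from what the real script produces.

```python
# Hypothetical sketch of the "-i" step using the llama-index 0.8.x API.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load every transcribed document produced by the previous step.
documents = SimpleDirectoryReader("./data/documents").load_data()

# Build a vector index (embeddings are created via the OpenAI API by default).
index = VectorStoreIndex.from_documents(documents)

# Persist the index to disk so the query step can reload it.
index.storage_context.persist(persist_dir="./data/indexes")
```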
python3 ./transcriptin.py
Input query: <INPUT_YOUR_QUERY_ABOUT_THE_TRANSCRIBED_TEXT>
We get a streaming answer, similar to ChatGPT.
==========
Query:
<THE_QUERY_YOU_INPUT>
Answer:
<ANSWER_FROM_AI>
==========
node.node.id_='876f8bdb-xxxx-xxxx-xxxx-xxxxxxxxxxxx', node.score=0.8484xxxxxxxxxxxxxx
----------
Cosine Similarity:
0.84xxxxxxxxxxxxxx
Reference text:
<THE_PART_AI_REFERRED_TO>
Enter exit to quit:
Input query: exit
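A loop like the one above can be reproduced with a streaming query engine. The sketch below is an assumption of how the interactive step might be implemented with llama-index 0.8.x, not the script's actual code; the printed fields mirror the sample output shown earlier.

```python
# Hypothetical sketch of the interactive query loop (llama-index 0.8.x API).
from llama_index import StorageContext, load_index_from_storage

# Reload the persisted index built by the "-i" step.
storage_context = StorageContext.from_defaults(persist_dir="./data/indexes")
index = load_index_from_storage(storage_context)

# streaming=True yields tokens as they arrive, like the ChatGPT UI.
query_engine = index.as_query_engine(streaming=True)

while True:
    query = input("Input query: ")
    if query == "exit":
        break
    response = query_engine.query(query)
    print("=" * 10)
    print(f"Query:\n{query}")
    print("Answer:")
    response.print_response_stream()
    print("\n" + "=" * 10)
    # Each source node carries the chunk the answer was grounded on
    # and its cosine similarity to the query embedding.
    for node in response.source_nodes:
        print(f"{node.node.id_=}, {node.score=}")
        print("-" * 10)
        print(f"Cosine Similarity:\n{node.score}")
        print(f"Reference text:\n{node.node.text}")
```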
- Python 3.10 or higher.
To create and activate a venv environment:
python3 -m venv .venv
source .venv/bin/activate
To deactivate:
deactivate
pip3 install --upgrade pip
pip3 install -r requirements.txt
The main libraries installed are as follows:
pip freeze | grep -e "openai" -e "pydub" -e "llama-index" -e "sentence_transformers" -e "tiktoken"
llama-index==0.8.12
openai==0.27.9
pydub==0.25.1
tiktoken==0.4.0
Set your API key as an environment variable, or add it to a shell dotfile such as '.zshenv':
export OPENAI_API_KEY='YOUR_OPENAI_API_KEY'
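The openai 0.27.x package picks up OPENAI_API_KEY from the environment automatically, but it can also be set explicitly in Python:

```python
import os
import openai

# openai 0.27.x reads the key from this module-level attribute.
openai.api_key = os.environ["OPENAI_API_KEY"]
```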