Retrieval Augmented Generation (RAG): LLM-powered Q&A Chatbot to chat with your documents

Extend an LLM-powered chatbot that retrieves and reranks the results of a semantic search to chat with a PDF file, using LangChain and the Pinecone vector database

Description

We want to extract information from private data stored in a PDF file. A chatbot interface is a user-friendly way to do this, and a memory mechanism keeps track of the conversation so that follow-up questions can build on earlier ones. To handle the amount of information a PDF file can contain, we load its content into a vector database together with the embedding vectors that encode it.
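The ingestion step can be sketched in plain Python. This is a minimal, self-contained illustration and not the repository's code: `split_into_chunks`, `toy_embed`, `InMemoryVectorStore` and `ingest` are hypothetical names, splitting the text into fixed-size overlapping chunks is an assumption, and `toy_embed` is a hashed bag-of-words stand-in for the real embedding model so the example runs without API keys.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Keeps each chunk next to its embedding vector, as a vector database index would."""
    texts: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.texts.append(text)
            self.vectors.append(toy_embed(text))


def ingest(document_text: str) -> InMemoryVectorStore:
    """Chunk the extracted PDF text and store every chunk with its embedding."""
    store = InMemoryVectorStore()
    store.add_texts(split_into_chunks(document_text))
    return store
```

In the actual app the text extraction, chunking and embedding would go through LangChain components, and the vectors would be written to Pinecone or Deeplake instead of a Python list.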

Every time a new question is entered, we call an embedding model to transform it into a vector. We then send that query vector to the vector database engine, which runs a semantic search and returns the top k most similar pieces of text. This context is sent, together with the question, to the LLM (ChatGPT) to get an accurate response.
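The retrieval flow above (embed the question, rank stored chunks by similarity, pass the top k to the LLM) can be sketched as follows. This is a hedged illustration under assumptions: `toy_embed` is the same hypothetical hashed bag-of-words stand-in for a real embedding model, `top_k` and `build_prompt` are illustrative names, and the prompt wording is invented for the example rather than taken from the app.

```python
import hashlib
import math
import re


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Hypothetical stand-in for the embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length (or zero), so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def top_k(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Embed the question and return the k chunks with the highest cosine similarity."""
    query_vec = toy_embed(question)
    scored = [(cosine(query_vec, toy_embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved context and the user question into a single LLM prompt."""
    context_block = "\n---\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In the app, the similarity search runs inside the vector database rather than in a Python loop, and the assembled prompt is sent to ChatGPT through LangChain.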

Chatbot App

Streamlit is a helpful tool for building simple demo apps for many machine learning tasks. This app is a simple demonstration of how the model works; robustness and efficiency are not its goals.

We have uploaded the app to Streamlit Community Cloud to share it with the community.

Content

The source code provides methods and functions to work with the Pinecone and Deeplake vector databases, but the final app is built on top of Deeplake.

In future releases we will try to make the database and other configuration options selectable.
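Supporting both Pinecone and Deeplake, and eventually choosing one through configuration, suggests putting a small common interface in front of the backends. The sketch below is hypothetical: `VectorStoreBackend`, `InMemoryBackend`, `BACKENDS` and `get_vector_store` are illustrative names that do not come from the repository, and the in-memory class only stands in for real Pinecone or Deeplake wrappers so the example runs without credentials.

```python
from typing import Protocol


class VectorStoreBackend(Protocol):
    """Operations the app needs from any vector database (hypothetical interface)."""

    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str]) -> None: ...

    def query(self, vector: list[float], k: int) -> list[str]: ...


class InMemoryBackend:
    """Stand-in backend; a Pinecone or Deeplake wrapper would expose the same two methods."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str]) -> None:
        for id_, vec, text in zip(ids, vectors, texts):
            self._rows[id_] = (vec, text)

    def query(self, vector: list[float], k: int) -> list[str]:
        # Rank stored rows by dot product with the query vector, highest first.
        ranked = sorted(
            self._rows.values(),
            key=lambda row: sum(a * b for a, b in zip(row[0], vector)),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]


# Registry mapping a configuration value to a backend class (hypothetical).
BACKENDS = {
    "memory": InMemoryBackend,
    # "pinecone": PineconeBackend,  # would wrap the Pinecone client
    # "deeplake": DeeplakeBackend,  # would wrap the Deeplake client
}


def get_vector_store(name: str) -> VectorStoreBackend:
    """Instantiate the backend selected in the configuration."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown vector store {name!r}; choose from {sorted(BACKENDS)}") from None
```

With this shape, the rest of the app only talks to `VectorStoreBackend`, so switching databases becomes a one-line configuration change.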

main.py: the Streamlit app
utils: source code to handle the vector databases
core: Python scripts to run questions or conversational actions
constants.py: some configuration parameters for the database connection
.py files: there is a .py file for every stage of the process: ingestion, retrieval, conversational and chat.
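The conversational and chat stages rely on the memory mechanism mentioned in the description. A minimal sketch of a windowed conversation buffer follows; `ConversationMemory` and `Turn` are hypothetical names, and keeping only the last `max_turns` exchanges is an assumed policy (LangChain ships its own memory classes, which the repository may use instead).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    question: str
    answer: str


class ConversationMemory:
    """Keeps the most recent question/answer pairs so follow-up questions have context."""

    def __init__(self, max_turns: int = 3) -> None:
        # A bounded deque drops the oldest turn automatically once max_turns is exceeded.
        self._turns: deque[Turn] = deque(maxlen=max_turns)

    def save(self, question: str, answer: str) -> None:
        self._turns.append(Turn(question, answer))

    def as_text(self) -> str:
        """Render the stored history so it can be prepended to the next prompt."""
        return "\n".join(f"Human: {t.question}\nAI: {t.answer}" for t in self._turns)

    def build_prompt(self, new_question: str) -> str:
        history = self.as_text()
        prefix = f"Chat history:\n{history}\n\n" if history else ""
        return f"{prefix}Follow-up question: {new_question}"
```

A fixed window keeps the prompt size bounded. Another common approach, used by LangChain's `ConversationalRetrievalChain`, is to have the LLM rewrite the follow-up into a standalone question before running the retrieval step.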

Contributing

If you find a bug or typo, please let me know, or fix it and push the change so it can be reviewed.

License

These notebooks are released under the Apache 2.0 license.

edumunozsala/documentation-bot-helper-langchain

Folders and files

Latest commit

History

Repository files navigation

Retrieval Augmented Generation (RAG): LLM-powered Q&A Chatbot to chat with your documents

Extend an LLM-powered Chatbot, retrieving and reranking a semantic search, to chat with a PDF file using Langchain and Pinecone vector database

Description

Chatbot App

Content

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages