The ArXiv LLM Research Assistant project aims to create an intelligent system that scrapes recent research papers on Large Language Models (LLMs) from ArXiv, embeds them, and stores them in a vector database. This setup allows the system to retrieve and rank relevant papers and answer LLM-related questions using up-to-date information from the latest research.
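The embed, store, and rank flow described above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the word-count `embed` function stands in for a real embedding model, the plain list stands in for a vector database, and the sample abstracts are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(question: str, store: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    """Return the top_k stored texts most similar to the question."""
    q = embed(question)
    scored = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]


# Hypothetical abstracts standing in for scraped ArXiv papers.
papers = [
    "Retrieval augmented generation improves factual accuracy of language models",
    "A survey of quantization methods for efficient transformer inference",
    "Scaling laws for neural language models",
]
store = [(p, embed(p)) for p in papers]  # plays the role of the vector database
top = rank("How does retrieval augmented generation help language models?", store)
```

In a full pipeline the top-ranked passages would be passed to the language model as context for answering the question.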
Before you begin, ensure you have the following installed on your machine:
- Ollama: A tool for running large language models locally.
- Python and pip: You can download Python from here and install pip by following the instructions here.
- Clone the repository:
git clone https://github.com/yourusername/RAG-LangChain-supreme-bot.git
cd RAG-LangChain-supreme-bot
- Install the required Python packages:
pip install -r requirements.txt
- Download and install Ollama:
Follow the instructions here to download and install Ollama on your machine.
- To parse the data, run the following script:
sh parse_data/command/parse_data.sh
- To embed the parsed data and save the documents, run:
sh rag_core/command/save_docs.sh
- To interact with the bot and ask questions, use:
sh chat_core/command/chat_core.sh
You can change the parameters in chat_core.sh to switch between users and conversation threads. Open chat_core.sh in a text editor and modify the following lines:
# change user id to have different chat memory
user_id=2
# create new conversation
conversation_id=5
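As a rough illustration of why changing these two values gives a different chat memory, the sketch below keeps one message history per (user_id, conversation_id) pair. The `ChatMemory` class and its methods are hypothetical and kept in memory only; the project itself persists its memory in a database file, and its actual schema is not shown here.

```python
from collections import defaultdict


class ChatMemory:
    """Keeps one message history per (user_id, conversation_id) pair."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def add(self, user_id: int, conversation_id: int, role: str, text: str) -> None:
        """Append a message to the thread identified by the two ids."""
        self._threads[(user_id, conversation_id)].append((role, text))

    def history(self, user_id: int, conversation_id: int) -> list:
        """Return a copy of the thread so callers cannot mutate stored messages."""
        return list(self._threads.get((user_id, conversation_id), []))


memory = ChatMemory()
memory.add(2, 5, "user", "What is RAG?")
memory.add(2, 5, "assistant", "Retrieval-augmented generation.")
memory.add(2, 6, "user", "A fresh conversation.")
```

Here user 2 has two separate threads: conversation 5 holds two messages, while conversation 6 starts with an empty context and holds only its own message.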
To get different papers on various topics, update the search_query parameter in parse_data/command/parse_data.sh. Please refer to the ArXiv API documentation for further information:
# Update this line with your desired search query
search_query="your_custom_search_query"
Adjust the values according to your requirements.
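For orientation, ArXiv API queries use field prefixes such as `all:`, `ti:` (title), `abs:` (abstract), and `cat:` (category, for example `cat:cs.CL`). The sketch below shows how such a query string maps onto an ArXiv API request URL. The helper `build_arxiv_url` is hypothetical, and it is an assumption that the script forwards `search_query` to the API unchanged.

```python
from urllib.parse import urlencode

# Public ArXiv API query endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build an ArXiv API query URL from a search_query string."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    # urlencode percent-encodes the query so quotes, colons, and spaces are URL-safe.
    return f"{ARXIV_API}?{urlencode(params)}"


# Example: search all fields for an exact phrase, fetching five results.
url = build_arxiv_url('all:"large language model"', max_results=5)
```

The API responds with an Atom feed, so the result of fetching this URL would still need to be parsed before embedding.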
To view the contents of the database, you can use a SQL client such as MySQL Workbench. Open your preferred SQL tool and connect it to the database file located at chat_core/database/memory.db. This will allow you to explore and query the database content.
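If a graphical client is not at hand, the database can also be inspected from Python. The sketch below assumes that memory.db is a SQLite file, which the .db extension suggests but the text does not confirm. The `messages` table is hypothetical and exists only so that the example runs on its own.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Self-contained demo on an in-memory database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (user_id INTEGER, conversation_id INTEGER, text TEXT)"
)
tables = list_tables(conn)
conn.close()

# To inspect the real file, connect to it instead (run from the repository root):
#   conn = sqlite3.connect("chat_core/database/memory.db")
```

Once the table names are known, ordinary SELECT statements on the same connection can be used to browse the stored rows.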
We welcome contributions! If you have suggestions or improvements, please open an issue or submit a pull request.