IncarnaMind enables you to chat with your personal documents (PDF, TXT) using Large Language Models (LLMs) like GPT (architecture overview). OpenAI has recently launched a fine-tuning API for GPT models, but fine-tuning does not enable the base pretrained models to learn new data, and the responses can still be prone to factual hallucinations. Our Sliding Window Chunking mechanism and Ensemble Retriever enable efficient querying of both fine-grained and coarse-grained information within your ground-truth documents to augment the LLMs.
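This README does not spell out how the Ensemble Retriever merges results. As a hedged illustration only, the sketch below fuses the ranked output of two retrievers over the same chunk set, for example a keyword-based (precise) one and an embedding-based (semantic) one, using reciprocal rank fusion (RRF). RRF is a standard rank-merging technique swapped in here for illustration and is not necessarily what IncarnaMind does; `reciprocal_rank_fusion`, the chunk IDs, and `k=60` (a common default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk IDs into one ranking (hypothetical helper).

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    so chunks ranked highly by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for a deterministic order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical top-3 results from two retrievers over the same chunks.
keyword_hits = ["c3", "c1", "c7"]
semantic_hits = ["c1", "c5", "c3"]
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))  # → ['c1', 'c3', 'c5', 'c7']
```

Rank-based fusion like this avoids having to calibrate raw scores from retrievers that measure relevance on different scales.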
Feel free to use it; we welcome any feedback and new feature suggestions.
- Recommended Model: We've primarily tested with the Llama2 series models and recommend using llama2-70b-chat (either full or GGUF version) for optimal performance. Feel free to experiment with other LLMs.
- System Requirements: Running the GGUF quantized version requires more than 35 GB of GPU RAM.
- Insufficient RAM: If you're limited by GPU RAM, consider using the Together.ai API. It supports llama2-70b-chat and most other open-source LLMs. Plus, you get $25 in free usage.
- Upcoming: Smaller, more cost-effective fine-tuned models will be released in the future.
- For instructions on acquiring and using quantized GGUF LLMs (similar to GGML), please refer to this video (from 10:45 to 12:30).
Here is a comparison table of the different models I tested, for reference only:
Metrics | GPT-4 | GPT-3.5 | Claude 2.0 | Llama2-70b | Llama2-70b-gguf | Llama2-70b-api |
---|---|---|---|---|---|---|
Reasoning | High | Medium | High | Medium | Medium | Medium |
Speed | Medium | High | Medium | Very Low | Low | Medium |
GPU RAM | N/A | N/A | N/A | Very High | High | N/A |
Safety | Low | Low | Low | High | High | Low |
- Fixed Chunking: Traditional RAG tools rely on fixed chunk sizes, limiting their adaptability in handling varying data complexity and context.
- Precision vs. Semantics: Current retrieval methods usually focus either on semantic understanding or precise retrieval, but rarely both.
- Single-Document Limitation: Many solutions can only query one document at a time, restricting multi-document information retrieval.
- Stability: IncarnaMind is compatible with OpenAI GPT, Anthropic Claude, Llama2, and other open-source LLMs, ensuring stable parsing.
- Adaptive Chunking: Our Sliding Window Chunking technique dynamically adjusts window size and position for RAG, balancing fine-grained and coarse-grained data access based on data complexity and context.
- Multi-Document Conversational QA: Supports simple and multi-hop queries across multiple documents simultaneously, breaking the single-document limitation.
- File Compatibility: Supports both PDF and TXT file formats.
- LLM Model Compatibility: Supports OpenAI GPT, Anthropic Claude, Llama2 and other open-source LLMs.
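To make the idea of fine- versus coarse-grained chunks concrete, the sketch below cuts the same text into overlapping windows at several sizes. It is an assumption-laden illustration, not IncarnaMind's Sliding Window Chunking: it uses whitespace-split words instead of model tokens and fixed (size, stride) pairs, so the adaptive part (choosing window size and position from data complexity) is not modeled. `sliding_windows`, `multi_scale_chunks`, and the window settings are hypothetical.

```python
def sliding_windows(tokens, size, stride):
    """Return overlapping windows of `size` tokens, advancing `stride` tokens each step.

    The last window is aligned to the end of the sequence so that no
    trailing tokens are dropped.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not tokens:
        return []
    if len(tokens) <= size:
        return [list(tokens)]
    windows = [list(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, stride)]
    last_start = len(tokens) - size
    if last_start % stride != 0:
        # The stride stepped past the tail; add one window ending at the last token.
        windows.append(list(tokens[last_start:]))
    return windows


def multi_scale_chunks(text, configs):
    """Chunk `text` (split on whitespace) once per (size, stride) pair.

    Returns {size: [chunk_text, ...]}: small sizes give fine-grained chunks,
    large sizes give coarse-grained ones.
    """
    words = text.split()
    return {
        size: [" ".join(window) for window in sliding_windows(words, size, stride)]
        for size, stride in configs
    }


chunks = multi_scale_chunks("one two three four five six seven eight nine ten", [(3, 2), (6, 4)])
print(chunks[3])  # → ['one two three', 'three four five', 'five six seven', 'seven eight nine', 'eight nine ten']
print(chunks[6])  # → ['one two three four five six', 'five six seven eight nine ten']
```

The overlap (stride smaller than size) keeps a sentence that straddles a boundary intact in at least one chunk, at the cost of storing some text more than once.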
Installation is simple: you only need to run a few commands.
- 3.8 ≤ Python < 3.11 with Conda
- At least one of: an OpenAI API key, an Anthropic Claude API key, a Together.ai API key, or a Hugging Face token for Meta Llama models
- And of course, your own documents.
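If you are unsure whether an existing interpreter falls in the supported range, a check like the one below can help. It is a hypothetical helper, not part of IncarnaMind; `python_version_supported` is an illustrative name. The Conda environment created below pins Python 3.10, which is inside the range.

```python
import sys


def python_version_supported(version_info=None):
    """Return True if the given (or running) interpreter satisfies 3.8 <= Python < 3.11."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return (3, 8) <= tuple(version_info[:2]) < (3, 11)


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} supported: {python_version_supported()}")
```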
Clone the repository:
git clone https://github.com/junruxiong/IncarnaMind
cd IncarnaMind
Create a Conda virtual environment:
conda create -n IncarnaMind python=3.10
Activate it:
conda activate IncarnaMind
Install all requirements:
pip install -r requirements.txt
Install llama-cpp separately if you want to run quantized local LLMs:
- For NVIDIA GPU support, use cuBLAS:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
- For Apple Metal (M1/M2) support, use:
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
Set up one or more API keys in the configparser.ini file:
[tokens]
OPENAI_API_KEY = (replace_me)
ANTHROPIC_API_KEY = (replace_me)
TOGETHER_API_KEY = (replace_me)
# If you use the full Meta Llama models, you may need a Hugging Face token to access them.
HUGGINGFACE_TOKEN = (replace_me)
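IncarnaMind reads this file itself; the sketch below is only a hypothetical pre-flight check, written with Python's standard `configparser`, that reports which keys in the `[tokens]` section have been filled in. `load_config` and `configured_tokens` are illustrative names, and treating `(replace_me)` or an empty value as "not set" is an assumption.

```python
import configparser

PLACEHOLDER = "(replace_me)"


def load_config(path="configparser.ini"):
    # interpolation=None: the [logging] format string contains %(...)s
    # placeholders that the default BasicInterpolation would try to expand.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    return config


def configured_tokens(config):
    """Return the names of [tokens] entries that are filled in (not placeholders)."""
    if not config.has_section("tokens"):
        return []
    # configparser lower-cases option names, so restore upper case for display.
    return [
        key.upper()
        for key, value in config.items("tokens")
        if value.strip() and value.strip() != PLACEHOLDER
    ]


sample = """
[tokens]
OPENAI_API_KEY = sk-example
ANTHROPIC_API_KEY = (replace_me)
TOGETHER_API_KEY = (replace_me)
# Comment lines like this one are ignored.
HUGGINGFACE_TOKEN = (replace_me)
"""
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(sample)
print(configured_tokens(cfg))  # → ['OPENAI_API_KEY']
```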
(Optional) Set up your custom parameters in the configparser.ini file:
[parameters]
PARAMETERS 1 = (replace_me)
PARAMETERS 2 = (replace_me)
...
PARAMETERS n = (replace_me)
Put all your files into the /data directory (give each file a clear, descriptive name to maximize performance); you can delete the example files there first. Then run the following command to ingest all data:
python docs2db.py
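Since only PDF and TXT files are supported, it can help to see which files in the data directory match those formats before ingesting. The helper below is hypothetical and not part of IncarnaMind; it assumes files are recognized by their extension, which may not match exactly how docs2db.py selects files. `partition_data_files` is an illustrative name.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt"}  # formats listed under File Compatibility


def partition_data_files(data_dir="data"):
    """Split regular files in `data_dir` into (supported, unsupported) name lists."""
    supported, unsupported = [], []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        # Compare extensions case-insensitively, so "notes.TXT" counts as TXT.
        target = supported if path.suffix.lower() in SUPPORTED_SUFFIXES else unsupported
        target.append(path.name)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = partition_data_files()
    print("Supported:", ok)
    print("Unsupported:", skipped)
```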
To start a conversation, run:
python main.py
Wait until the script prompts for your input, as shown below:
Human:
When you start a chat, the system automatically generates an IncarnaMind.log file. To change the logging settings, edit the [logging] section of the configparser.ini file:
[logging]
enabled = True
level = INFO
filename = IncarnaMind.log
format = %(asctime)s [%(levelname)s] %(name)s: %(message)s
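IncarnaMind applies these settings itself. The sketch below only illustrates how a section like this maps onto Python's standard `logging.basicConfig`; `setup_logging` is a hypothetical name. The `format` value uses Python logging's `%(...)s` record placeholders, so the sketch reads it with `raw=True` to stop `configparser` from trying to interpolate them.

```python
import configparser
import logging


def setup_logging(config):
    """Configure the root logger from a [logging] section (hypothetical helper)."""
    section = config["logging"]
    if not section.getboolean("enabled", fallback=True):
        logging.disable(logging.CRITICAL)  # silence all log records
        return
    logging.basicConfig(
        filename=section.get("filename", fallback="IncarnaMind.log"),
        # basicConfig accepts level names such as "INFO" as well as numbers.
        level=section.get("level", fallback="INFO").upper(),
        # raw=True returns %(asctime)s etc. unchanged instead of interpolating them.
        format=section.get("format", fallback=logging.BASIC_FORMAT, raw=True),
        force=True,  # replace any handlers already on the root logger (Python 3.8+)
    )


if __name__ == "__main__":
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read("configparser.ini")
    setup_logging(cfg)
    logging.getLogger("IncarnaMind").info("Logging configured")
```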
- Citation is not supported in the current version but will be released soon.
- Limited asynchronous capabilities.
- Frontend UI
- Fine-tuned small size open-source LLMs
- OCR support
- Asynchronous optimization
- Support more document formats
Special thanks to LangChain, Chroma DB, LocalGPT, and Llama-cpp for their invaluable contributions to the open-source community. Their work has been instrumental in making the IncarnaMind project a reality.
If you want to cite our work, please use the following BibTeX entry:
@misc{IncarnaMind2023,
author = {Junru Xiong},
title = {IncarnaMind},
year = {2023},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/junruxiong/IncarnaMind}}
}