v0.6 release #933


Merged: 12 commits, Dec 11, 2024
53 changes: 36 additions & 17 deletions README.md
Upload your files from local machine, GCS or S3 bucket or from web sources, choo
- **Knowledge Graph Creation**: Transform unstructured data into structured knowledge graphs using LLMs.
- **Providing Schema**: Provide your own custom schema or use existing schema in settings to generate graph.
- **View Graph**: View graph for a particular source or multiple sources at a time in Bloom.
- **Chat with Data**: Interact with your data in a Neo4j database through conversational queries, and retrieve metadata about the sources of the responses to your queries. For a dedicated chat interface, use the standalone chat application at [Chat-Only](https://dev-frontend-dcavk67s4a-uc.a.run.app/chat-only), which provides a focused chat experience for querying your data.

## Getting started

EX:
```env
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
```

In your root folder, create a .env file with your OPENAI and DIFFBOT keys (if you want to use both). The VITE_LLM_MODELS_PROD variable selects which models are enabled for the environment; configure it based on your needs.
EX:
```env
OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
```
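The VITE_LLM_MODELS_PROD value is a plain comma-separated list. As a minimal sketch (illustrative only; this is not the application's actual frontend parsing), splitting such a value could look like:

```python
import os

def parse_model_list(value: str) -> list[str]:
    # Split on commas, trimming whitespace and dropping empty entries.
    return [model.strip() for model in value.split(",") if model.strip()]

os.environ["VITE_LLM_MODELS_PROD"] = (
    "openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
)
models = parse_model_list(os.environ["VITE_LLM_MODELS_PROD"])
print(models)
# → ['openai_gpt_4o', 'openai_gpt_4o_mini', 'diffbot', 'gemini_1.5_flash']
```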

If you only want OpenAI:
```env
VITE_LLM_MODELS_PROD="diffbot,openai-gpt-3.5,openai-gpt-4o"
OPENAI_API_KEY="your-openai-key"
```

If you only want Diffbot:
```env
VITE_LLM_MODELS_PROD="diffbot"
DIFFBOT_API_KEY="your-diffbot-key"
```

You can of course combine all (local, youtube, wikipedia, s3 and gcs) or remove

### Chat Modes

By default, all of the chat modes will be available: vector, graph_vector, graph, fulltext, graph_vector_fulltext, entity_vector and global_vector.
If no mode is specified in the chat modes variable, all modes will be available:
```env
VITE_CHAT_MODES=""
```
If, however, you want only vector mode or only graph mode, you can specify the mode(s) in the env:
```env
VITE_CHAT_MODES="vector,graph"
```
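The fallback behaviour described above can be sketched as follows (a hypothetical helper, not the application's actual code):

```python
# The seven modes listed in this README.
ALL_CHAT_MODES = [
    "vector", "graph_vector", "graph", "fulltext",
    "graph_vector_fulltext", "entity_vector", "global_vector",
]

def resolve_chat_modes(env_value: str) -> list[str]:
    # An empty VITE_CHAT_MODES means every mode stays available.
    requested = [m.strip() for m in env_value.split(",") if m.strip()]
    return requested or ALL_CHAT_MODES

print(resolve_chat_modes("vector,graph"))  # → ['vector', 'graph']
print(len(resolve_chat_modes("")))         # → 7
```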

#### Running Backend and Frontend separately (dev environment)
Alternatively, you can run the backend and frontend separately:

- For the backend:
1. Create the backend/.env file by copy/pasting the backend/example.env. To streamline the initial setup and testing of the application, you can preconfigure user credentials directly within the .env file. This bypasses the login dialog and allows you to immediately connect with a predefined user.
- **NEO4J_URI**:
- **NEO4J_USERNAME**:
- **NEO4J_PASSWORD**:
- **NEO4J_DATABASE**:
2. Change values as needed
3.
```bash
cd backend
python -m venv envName
```

Allow unauthenticated request : Yes
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| VITE_CHUNK_SIZE | Optional | 5242880 | Size of each chunk of file for upload |
| VITE_GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication |
| VITE_LLM_MODELS_PROD | Optional | openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash | To distinguish models based on the environment (PROD or DEV) |
| VITE_LLM_MODELS | Optional | 'diffbot,openai_gpt_3.5,openai_gpt_4o,openai_gpt_4o_mini,gemini_1.5_pro,gemini_1.5_flash,azure_ai_gpt_35,azure_ai_gpt_4o,ollama_llama3,groq_llama3_70b,anthropic_claude_3_5_sonnet' | Supported models for the application |
| GCS_FILE_CACHE | Optional | False | If set to True, will save the files to process into GCS. If set to False, will save the files locally |
| ENTITY_EMBEDDING | Optional | False | If set to True, it will add embeddings for each entity in the database |
| LLM_MODEL_CONFIG_ollama_<model_name> | Optional | | Set Ollama config as model_name,model_local_url for local deployments |
| RAGAS_EMBEDDING_MODEL | Optional | openai | Embedding model used by the ragas evaluation framework |
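Several of the variables above are booleans or integers stored as strings. A hedged sketch of reading them with the documented defaults (the helper names here are invented for illustration, not taken from the backend):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    # Treat "True"/"true"/"1"/"yes" as truthy, anything else as falsy.
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_int(name: str, default: int) -> int:
    raw = os.getenv(name, "").strip()
    return int(raw) if raw.isdigit() else default

gcs_cache = env_flag("GCS_FILE_CACHE", default=False)  # documented default: False
chunk_size = env_int("VITE_CHUNK_SIZE", 5242880)       # documented default: 5242880
print(gcs_cache, chunk_size)
```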


## LLMs Supported
1. OpenAI
2. Gemini
3. Azure OpenAI (dev)
4. Anthropic (dev)
5. Fireworks (dev)
6. Groq (dev)
7. Amazon Bedrock (dev)
8. Ollama (dev)
9. Diffbot
10. Other OpenAI-compatible base-URL models (dev)

## For local LLMs (Ollama)
1. Pull the Docker image of Ollama
```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
```bash
docker exec -it ollama ollama run llama3
```
4. Configure env variable in docker compose or backend environment.
```env
LLM_MODEL_CONFIG_ollama_<model_name>
#example (hypothetical values; the format is model_name,model_local_url)
LLM_MODEL_CONFIG_ollama_llama3="llama3,http://localhost:11434"
```
VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl}
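Per the environment table earlier in this README, the Ollama variable packs two fields into one string (model_name,model_local_url). A hypothetical parse of that format:

```python
import os

def parse_ollama_config(value: str) -> tuple[str, str]:
    # "model_name,model_local_url" -> two trimmed parts; split only once
    # so any later commas stay inside the URL part.
    model_name, base_url = (part.strip() for part in value.split(",", 1))
    return model_name, base_url

# Hypothetical example value; substitute your own model and URL.
os.environ["LLM_MODEL_CONFIG_ollama_llama3"] = "llama3,http://localhost:11434"
name, url = parse_ollama_config(os.environ["LLM_MODEL_CONFIG_ollama_llama3"])
print(name, url)  # → llama3 http://localhost:11434
```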


## Usage
1. Connect to a Neo4j Aura instance (either AuraDB or AuraDS) by passing the URI and password through the backend env, filling in the login dialog, or dragging and dropping the Neo4j credentials file.
2. To differentiate the two, different icons appear right under the Neo4j connection details label: a database icon for AuraDB and a scientific molecule icon for AuraDS.
3. Choose your source from a list of unstructured sources to create the graph.
4. Change the LLM (if required) from the dropdown; it will be used to generate the graph.
5. Optionally, define a schema (node and relationship labels) in the entity graph extraction settings.
6. Either select multiple files to 'Generate Graph', or all the files in 'New' status will be processed for graph creation.
7. Have a look at the graph for individual files using 'View' in the grid, or select one or more files and 'Preview Graph'.
8. Ask questions related to the processed/completed sources to the chatbot, and get detailed information about the answers generated by the LLM.

## Links

1 change: 1 addition & 0 deletions backend/Dockerfile
```dockerfile
EXPOSE 8000
# Install dependencies and clean up in one layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libmagic1 \
    libgl1-mesa-glx \
    libreoffice \
    cmake \
```
6 changes: 3 additions & 3 deletions backend/README.md
http://127.0.0.1:8000/redocs for ReDoc.

## Configuration

Update the environment variables in the `.env` file. Refer to `example.env` in the backend folder for more configuration options.

`OPENAI_API_KEY`: OpenAI key, used in case of OpenAI embeddings

`DIFFBOT_API_KEY` : Diffbot API key to use DiffbotGraphTransformer
`EMBEDDING_MODEL` : "all-MiniLM-L6-v2" or "openai" or "vertexai"

`NEO4J_URI` : Neo4j URL
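The EMBEDDING_MODEL setting selects between a local and a hosted embedding backend. A hedged sketch of that dispatch (the backend names follow common LangChain class names, but the application's actual wiring may differ):

```python
def resolve_embedding_backend(model: str) -> str:
    # Map the EMBEDDING_MODEL value onto an embedding backend name.
    if model == "openai":
        return "OpenAIEmbeddings"      # requires OPENAI_API_KEY
    if model == "vertexai":
        return "VertexAIEmbeddings"    # requires GCP credentials
    # Default: local sentence-transformers model, no API key needed.
    return "HuggingFaceEmbeddings(all-MiniLM-L6-v2)"

print(resolve_embedding_backend("all-MiniLM-L6-v2"))
# → HuggingFaceEmbeddings(all-MiniLM-L6-v2)
```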

4 changes: 2 additions & 2 deletions backend/example.env
```env
OPENAI_API_KEY = ""
DIFFBOT_API_KEY = ""
GROQ_API_KEY = ""
#EMBEDDING_MODEL can be openai or vertexai or by default all-MiniLM-L6-v2
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
RAGAS_EMBEDDING_MODEL = "openai"
IS_EMBEDDING = "true"
```
```env
DUPLICATE_TEXT_DISTANCE = ""
#examples
LLM_MODEL_CONFIG_openai_gpt_3.5="gpt-3.5-turbo-0125,openai_api_key"
LLM_MODEL_CONFIG_openai_gpt_4o_mini="gpt-4o-mini-2024-07-18,openai_api_key"
LLM_MODEL_CONFIG_openai_gpt_4o="gpt-4o-2024-11-20,openai_api_key"
LLM_MODEL_CONFIG_gemini_1.5_pro="gemini-1.5-pro-002"
LLM_MODEL_CONFIG_gemini_1.5_flash="gemini-1.5-flash-002"
LLM_MODEL_CONFIG_diffbot="diffbot,diffbot_api_key"
```
210 changes: 44 additions & 166 deletions backend/requirements.txt
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.3.0
async-timeout==4.0.3
asyncio==3.4.3
attrs==23.2.0
backoff==2.2.1
beautifulsoup4==4.12.3
boto3==1.34.140
botocore==1.34.140
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
coloredlogs==15.0.1
contourpy==1.2.0
cryptography==42.0.2
cycler==0.12.1
dataclasses-json==0.6.4
dataclasses-json-speakeasy==0.5.11
Deprecated==1.2.14
distro==1.9.0
docstring_parser==0.16
effdet==0.4.1
emoji==2.10.1
exceptiongroup==1.2.0
fastapi==0.111.0
boto3==1.35.69
botocore==1.35.69
certifi==2024.8.30
fastapi==0.115.6
fastapi-health==0.4.0
filelock==3.13.1
filetype==1.2.0
flatbuffers==23.5.26
fonttools==4.49.0
frozenlist==1.4.1
fsspec==2024.2.0
google-api-core==2.18.0
google-auth==2.29.0
google_auth_oauthlib==1.2.0
google-cloud-aiplatform==1.58.0
google-cloud-bigquery==3.19.0
google-api-core==2.23.0
google-auth==2.36.0
google_auth_oauthlib==1.2.1
google-cloud-core==2.4.1
google-cloud-resource-manager==1.12.3
google-cloud-storage==2.17.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
greenlet==3.0.3
grpc-google-iam-v1==0.13.0
grpcio==1.62.1
google-ai-generativelanguage==0.6.6
grpcio-status==1.62.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub
humanfriendly==10.0
idna==3.6
importlib-resources==6.1.1
json-repair==0.30.2
pip-install==1.3.5
iopath==0.1.10
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonpatch==1.33
jsonpath-python==1.0.6
jsonpointer==2.4
json-repair==0.25.2
kiwisolver==1.4.5
langchain==0.3.0
langchain-aws==0.2.1
langchain-anthropic==0.2.1
langchain-fireworks==0.2.0
langchain-google-genai==2.0.0
langchain-community==0.3.0
langchain-core==0.3.5
langchain-experimental==0.3.1
langchain-google-vertexai==2.0.1
langchain-groq==0.2.0
langchain-openai==0.2.0
langchain-text-splitters==0.3.0
langchain==0.3.8
langchain-aws==0.2.7
langchain-anthropic==0.3.0
langchain-fireworks==0.2.5
langchain-community==0.3.8
langchain-core==0.3.21
langchain-experimental==0.3.3
langchain-google-vertexai==2.0.7
langchain-groq==0.2.1
langchain-openai==0.2.9
langchain-text-splitters==0.3.2
langchain-huggingface==0.1.2
langdetect==1.0.9
langsmith==0.1.128
layoutparser==0.3.4
langsmith==0.1.146
langserve==0.3.0
#langchain-cli==0.0.25
lxml==5.1.0
MarkupSafe==2.1.5
marshmallow==3.20.2
matplotlib==3.7.2
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
neo4j-rust-ext
networkx==3.2.1
nltk==3.8.1
numpy==1.26.4
omegaconf==2.3.0
onnx==1.16.1
onnxruntime==1.18.1
openai==1.47.1
opencv-python==4.8.0.76
orjson==3.9.15
packaging==23.2
pandas==2.2.0
pdf2image==1.17.0
pdfminer.six==20221105
pdfplumber==0.10.4
pikepdf==8.11.0
pillow==10.2.0
pillow_heif==0.15.0
portalocker==2.8.2
proto-plus==1.23.0
protobuf==4.23.4
psutil==6.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycocotools==2.0.7
pycparser==2.21
pydantic==2.8.2
pydantic_core==2.20.1
pyparsing==3.0.9
pypdf==4.0.1
PyPDF2==3.0.1
pypdfium2==4.27.0
pytesseract==0.3.10
python-dateutil==2.8.2
nltk==3.9.1
openai==1.55.1
opencv-python==4.10.0.84
psutil==6.1.0
pydantic==2.9.0
python-dotenv==1.0.1
python-iso639==2024.2.7
python-magic==0.4.27
python-multipart==0.0.9
pytube==15.0.0
pytz==2024.1
PyYAML==6.0.1
rapidfuzz==3.6.1
regex==2023.12.25
requests==2.32.3
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.1
shapely==2.0.3
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
starlette==0.37.2
sse-starlette==2.1.2
PyPDF2==3.0.1
PyMuPDF==1.24.14
starlette==0.41.3
sse-starlette==2.1.3
starlette-session==0.4.3
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tiktoken==0.7.0
timm==0.9.12
tokenizers==0.19
tqdm==4.66.2
transformers==4.42.3
types-protobuf
types-requests
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.1
unstructured==0.14.9
unstructured-client==0.23.8
unstructured-inference==0.7.36
unstructured.pytesseract==0.3.12
unstructured[all-docs]==0.14.9
tqdm==4.67.1
unstructured[all-docs]==0.16.6
unstructured==0.16.6
unstructured-client==0.26.2
unstructured-inference==0.8.1
urllib3==2.2.2
uvicorn==0.30.1
gunicorn==22.0.0
uvicorn==0.32.1
gunicorn==23.0.0
wikipedia==1.4.0
wrapt==1.16.0
yarl==1.9.4
youtube-transcript-api==0.6.2
youtube-transcript-api==0.6.3
zipp==3.17.0
sentence-transformers==3.0.1
google-cloud-logging==3.10.0
PyMuPDF==1.24.5
sentence-transformers==3.3.1
google-cloud-logging==3.11.3
pypandoc==1.13
graphdatascience==1.10
graphdatascience==1.12
Secweb==1.11.0
ragas==0.1.14

ragas==0.2.6
rouge_score==0.1.2
langchain-neo4j==0.1.1