v0.6 release #933


Merged: 12 commits, Dec 11, 2024
53 changes: 36 additions & 17 deletions README.md
Upload your files from local machine, GCS or S3 bucket or from web sources, choo
- **Knowledge Graph Creation**: Transform unstructured data into structured knowledge graphs using LLMs.
- **Providing Schema**: Provide your own custom schema or use existing schema in settings to generate graph.
- **View Graph**: View graph for a particular source or multiple sources at a time in Bloom.
- **Chat with Data**: Interact with your data in a Neo4j database through conversational queries, and retrieve metadata about the sources of the responses to your queries. For a dedicated chat interface, use the standalone chat application at [Chat-Only](https://dev-frontend-dcavk67s4a-uc.a.run.app/chat-only), which provides a focused chat experience for querying your data.

## Getting started

EX:
```env
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
```

In your root folder, create a .env file with your OPENAI and DIFFBOT keys (if you want to use both). The VITE_LLM_MODELS_PROD variable selects which models are enabled for the environment; configure it based on your needs.
EX:
```env
OPENAI_API_KEY="your-openai-key"
DIFFBOT_API_KEY="your-diffbot-key"
VITE_LLM_MODELS_PROD="openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
```
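The VITE_LLM_MODELS_PROD value is a plain comma-separated list. As a minimal sketch (illustrative only; this is not the application's actual frontend parsing), splitting such a value could look like:

```python
import os

def parse_model_list(value: str) -> list[str]:
    # Split on commas, trimming whitespace and dropping empty entries.
    return [model.strip() for model in value.split(",") if model.strip()]

os.environ["VITE_LLM_MODELS_PROD"] = (
    "openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash"
)
models = parse_model_list(os.environ["VITE_LLM_MODELS_PROD"])
print(models)
# → ['openai_gpt_4o', 'openai_gpt_4o_mini', 'diffbot', 'gemini_1.5_flash']
```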

If you only want OpenAI:
```env
VITE_LLM_MODELS_PROD="diffbot,openai-gpt-3.5,openai-gpt-4o"
OPENAI_API_KEY="your-openai-key"
```

If you only want Diffbot:
```env
VITE_LLM_MODELS_PROD="diffbot"
DIFFBOT_API_KEY="your-diffbot-key"
```

You can of course combine all (local, youtube, wikipedia, s3 and gcs) or remove

### Chat Modes

By default, all of the chat modes will be available: vector, graph_vector, graph, fulltext, graph_vector_fulltext, entity_vector and global_vector.
If no mode is specified in the chat modes variable, all modes will be available:
```env
VITE_CHAT_MODES=""
```
If, however, you want only vector mode or only graph mode, you can specify the mode(s) in the env:
```env
VITE_CHAT_MODES="vector,graph"
```
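The fallback behaviour described above can be sketched as follows (a hypothetical helper, not the application's actual code):

```python
# The seven modes listed in this README.
ALL_CHAT_MODES = [
    "vector", "graph_vector", "graph", "fulltext",
    "graph_vector_fulltext", "entity_vector", "global_vector",
]

def resolve_chat_modes(env_value: str) -> list[str]:
    # An empty VITE_CHAT_MODES means every mode stays available.
    requested = [m.strip() for m in env_value.split(",") if m.strip()]
    return requested or ALL_CHAT_MODES

print(resolve_chat_modes("vector,graph"))  # → ['vector', 'graph']
print(len(resolve_chat_modes("")))         # → 7
```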

#### Running Backend and Frontend separately (dev environment)
Alternatively, you can run the backend and frontend separately:

- For the backend:
1. Create the backend/.env file by copy/pasting the backend/example.env. To streamline the initial setup and testing of the application, you can preconfigure user credentials directly within the .env file. This bypasses the login dialog and allows you to immediately connect with a predefined user.
- **NEO4J_URI**:
- **NEO4J_USERNAME**:
- **NEO4J_PASSWORD**:
- **NEO4J_DATABASE**:
2. Change values as needed
3.
```bash
cd backend
python -m venv envName
```

Allow unauthenticated request : Yes
| Env Variable Name | Mandatory/Optional | Default Value | Description |
|---|---|---|---|
| VITE_CHUNK_SIZE | Optional | 5242880 | Size of each chunk of file for upload |
| VITE_GOOGLE_CLIENT_ID | Optional | | Client ID for Google authentication |
| VITE_LLM_MODELS_PROD | Optional | openai_gpt_4o,openai_gpt_4o_mini,diffbot,gemini_1.5_flash | To distinguish models based on the environment (PROD or DEV) |
| VITE_LLM_MODELS | Optional | 'diffbot,openai_gpt_3.5,openai_gpt_4o,openai_gpt_4o_mini,gemini_1.5_pro,gemini_1.5_flash,azure_ai_gpt_35,azure_ai_gpt_4o,ollama_llama3,groq_llama3_70b,anthropic_claude_3_5_sonnet' | Supported models for the application |
| GCS_FILE_CACHE | Optional | False | If set to True, will save the files to process into GCS. If set to False, will save the files locally |
| ENTITY_EMBEDDING | Optional | False | If set to True, it will add embeddings for each entity in the database |
| LLM_MODEL_CONFIG_ollama_<model_name> | Optional | | Set Ollama config as model_name,model_local_url for local deployments |
| RAGAS_EMBEDDING_MODEL | Optional | openai | Embedding model used by the ragas evaluation framework |
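Several of the variables above are booleans or integers stored as strings. A hedged sketch of reading them with the documented defaults (the helper names here are invented for illustration, not taken from the backend):

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    # Treat "True"/"true"/"1"/"yes" as truthy, anything else as falsy.
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_int(name: str, default: int) -> int:
    raw = os.getenv(name, "").strip()
    return int(raw) if raw.isdigit() else default

gcs_cache = env_flag("GCS_FILE_CACHE", default=False)  # documented default: False
chunk_size = env_int("VITE_CHUNK_SIZE", 5242880)       # documented default: 5242880
print(gcs_cache, chunk_size)
```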


## LLMs Supported
1. OpenAI
2. Gemini
3. Azure OpenAI (dev)
4. Anthropic (dev)
5. Fireworks (dev)
6. Groq (dev)
7. Amazon Bedrock (dev)
8. Ollama (dev)
9. Diffbot
10. Other OpenAI-compatible base-URL models (dev)

## For local LLMs (Ollama)
1. Pull the Docker image of Ollama
```bash
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
```bash
docker exec -it ollama ollama run llama3
```
4. Configure env variable in docker compose or backend environment.
```env
LLM_MODEL_CONFIG_ollama_<model_name>
#example (hypothetical values; the format is model_name,model_local_url)
LLM_MODEL_CONFIG_ollama_llama3="llama3,http://localhost:11434"
```
VITE_BACKEND_API_URL=${VITE_BACKEND_API_URL-backendurl}
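Per the environment table earlier in this README, the Ollama variable packs two fields into one string (model_name,model_local_url). A hypothetical parse of that format:

```python
import os

def parse_ollama_config(value: str) -> tuple[str, str]:
    # "model_name,model_local_url" -> two trimmed parts; split only once
    # so any later commas stay inside the URL part.
    model_name, base_url = (part.strip() for part in value.split(",", 1))
    return model_name, base_url

# Hypothetical example value; substitute your own model and URL.
os.environ["LLM_MODEL_CONFIG_ollama_llama3"] = "llama3,http://localhost:11434"
name, url = parse_ollama_config(os.environ["LLM_MODEL_CONFIG_ollama_llama3"])
print(name, url)  # → llama3 http://localhost:11434
```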


## Usage
1. Connect to a Neo4j Aura instance (either AuraDB or AuraDS) by passing the URI and password through the backend env, filling in the login dialog, or dragging and dropping the Neo4j credentials file.
2. To differentiate the two, different icons appear right under the Neo4j connection details label: a database icon for AuraDB and a scientific molecule icon for AuraDS.
3. Choose your source from a list of unstructured sources to create the graph.
4. Change the LLM (if required) from the dropdown; it will be used to generate the graph.
5. Optionally, define a schema (node and relationship labels) in the entity graph extraction settings.
6. Either select multiple files to 'Generate Graph', or all the files in 'New' status will be processed for graph creation.
7. Have a look at the graph for individual files using 'View' in the grid, or select one or more files and 'Preview Graph'.
8. Ask questions related to the processed/completed sources to the chatbot, and get detailed information about the answers generated by the LLM.

## Links

1 change: 1 addition & 0 deletions backend/Dockerfile
```dockerfile
EXPOSE 8000
# Install dependencies and clean up in one layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    libmagic1 \
    libgl1-mesa-glx \
    libreoffice \
    cmake \
```
6 changes: 3 additions & 3 deletions backend/README.md
http://127.0.0.1:8000/redocs for ReDoc.

## Configuration

Update the environment variables in the `.env` file. Refer to `example.env` in the backend folder for more configuration options.

`OPENAI_API_KEY`: OpenAI key, used in case of OpenAI embeddings

`DIFFBOT_API_KEY` : Diffbot API key to use DiffbotGraphTransformer
`EMBEDDING_MODEL` : "all-MiniLM-L6-v2" or "openai" or "vertexai"

`NEO4J_URI` : Neo4j URL
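The EMBEDDING_MODEL setting selects between a local and a hosted embedding backend. A hedged sketch of that dispatch (the backend names follow common LangChain class names, but the application's actual wiring may differ):

```python
def resolve_embedding_backend(model: str) -> str:
    # Map the EMBEDDING_MODEL value onto an embedding backend name.
    if model == "openai":
        return "OpenAIEmbeddings"      # requires OPENAI_API_KEY
    if model == "vertexai":
        return "VertexAIEmbeddings"    # requires GCP credentials
    # Default: local sentence-transformers model, no API key needed.
    return "HuggingFaceEmbeddings(all-MiniLM-L6-v2)"

print(resolve_embedding_backend("all-MiniLM-L6-v2"))
# → HuggingFaceEmbeddings(all-MiniLM-L6-v2)
```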

4 changes: 2 additions & 2 deletions backend/example.env
```env
OPENAI_API_KEY = ""
DIFFBOT_API_KEY = ""
GROQ_API_KEY = ""
#EMBEDDING_MODEL can be openai or vertexai or by default all-MiniLM-L6-v2
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
RAGAS_EMBEDDING_MODEL = "openai"
IS_EMBEDDING = "true"
```
```env
DUPLICATE_TEXT_DISTANCE = ""
#examples
LLM_MODEL_CONFIG_openai_gpt_3.5="gpt-3.5-turbo-0125,openai_api_key"
LLM_MODEL_CONFIG_openai_gpt_4o_mini="gpt-4o-mini-2024-07-18,openai_api_key"
LLM_MODEL_CONFIG_openai_gpt_4o="gpt-4o-2024-11-20,openai_api_key"
LLM_MODEL_CONFIG_gemini_1.5_pro="gemini-1.5-pro-002"
LLM_MODEL_CONFIG_gemini_1.5_flash="gemini-1.5-flash-002"
LLM_MODEL_CONFIG_diffbot="diffbot,diffbot_api_key"
```
210 changes: 44 additions & 166 deletions backend/requirements.txt
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.9.3
anyio==4.3.0
async-timeout==4.0.3
asyncio==3.4.3
attrs==23.2.0
backoff==2.2.1
beautifulsoup4==4.12.3
boto3==1.34.140
botocore==1.34.140
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
coloredlogs==15.0.1
contourpy==1.2.0
cryptography==42.0.2
cycler==0.12.1
dataclasses-json==0.6.4
dataclasses-json-speakeasy==0.5.11
Deprecated==1.2.14
distro==1.9.0
docstring_parser==0.16
effdet==0.4.1
emoji==2.10.1
exceptiongroup==1.2.0
fastapi==0.111.0
boto3==1.35.69
botocore==1.35.69
certifi==2024.8.30
fastapi==0.115.6
fastapi-health==0.4.0
filelock==3.13.1
filetype==1.2.0
flatbuffers==23.5.26
fonttools==4.49.0
frozenlist==1.4.1
fsspec==2024.2.0
google-api-core==2.18.0
google-auth==2.29.0
google_auth_oauthlib==1.2.0
google-cloud-aiplatform==1.58.0
google-cloud-bigquery==3.19.0
google-api-core==2.23.0
google-auth==2.36.0
google_auth_oauthlib==1.2.1
google-cloud-core==2.4.1
google-cloud-resource-manager==1.12.3
google-cloud-storage==2.17.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
greenlet==3.0.3
grpc-google-iam-v1==0.13.0
grpcio==1.62.1
google-ai-generativelanguage==0.6.6
grpcio-status==1.62.1
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub
humanfriendly==10.0
idna==3.6
importlib-resources==6.1.1
json-repair==0.30.2
pip-install==1.3.5
iopath==0.1.10
Jinja2==3.1.3
jmespath==1.0.1
joblib==1.3.2
jsonpatch==1.33
jsonpath-python==1.0.6
jsonpointer==2.4
json-repair==0.25.2
kiwisolver==1.4.5
langchain==0.3.0
langchain-aws==0.2.1
langchain-anthropic==0.2.1
langchain-fireworks==0.2.0
langchain-google-genai==2.0.0
langchain-community==0.3.0
langchain-core==0.3.5
langchain-experimental==0.3.1
langchain-google-vertexai==2.0.1
langchain-groq==0.2.0
langchain-openai==0.2.0
langchain-text-splitters==0.3.0
langchain==0.3.8
langchain-aws==0.2.7
langchain-anthropic==0.3.0
langchain-fireworks==0.2.5
langchain-community==0.3.8
langchain-core==0.3.21
langchain-experimental==0.3.3
langchain-google-vertexai==2.0.7
langchain-groq==0.2.1
langchain-openai==0.2.9
langchain-text-splitters==0.3.2
langchain-huggingface==0.1.2
langdetect==1.0.9
langsmith==0.1.128
layoutparser==0.3.4
langsmith==0.1.146
langserve==0.3.0
#langchain-cli==0.0.25
lxml==5.1.0
MarkupSafe==2.1.5
marshmallow==3.20.2
matplotlib==3.7.2
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
neo4j-rust-ext
networkx==3.2.1
nltk==3.8.1
numpy==1.26.4
omegaconf==2.3.0
onnx==1.16.1
onnxruntime==1.18.1
openai==1.47.1
opencv-python==4.8.0.76
orjson==3.9.15
packaging==23.2
pandas==2.2.0
pdf2image==1.17.0
pdfminer.six==20221105
pdfplumber==0.10.4
pikepdf==8.11.0
pillow==10.2.0
pillow_heif==0.15.0
portalocker==2.8.2
proto-plus==1.23.0
protobuf==4.23.4
psutil==6.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycocotools==2.0.7
pycparser==2.21
pydantic==2.8.2
pydantic_core==2.20.1
pyparsing==3.0.9
pypdf==4.0.1
PyPDF2==3.0.1
pypdfium2==4.27.0
pytesseract==0.3.10
python-dateutil==2.8.2
nltk==3.9.1
openai==1.55.1
opencv-python==4.10.0.84
psutil==6.1.0
pydantic==2.9.0
python-dotenv==1.0.1
python-iso639==2024.2.7
python-magic==0.4.27
python-multipart==0.0.9
pytube==15.0.0
pytz==2024.1
PyYAML==6.0.1
rapidfuzz==3.6.1
regex==2023.12.25
requests==2.32.3
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.1
shapely==2.0.3
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
starlette==0.37.2
sse-starlette==2.1.2
PyPDF2==3.0.1
PyMuPDF==1.24.14
starlette==0.41.3
sse-starlette==2.1.3
starlette-session==0.4.3
sympy==1.12
tabulate==0.9.0
tenacity==8.2.3
tiktoken==0.7.0
timm==0.9.12
tokenizers==0.19
tqdm==4.66.2
transformers==4.42.3
types-protobuf
types-requests
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.1
unstructured==0.14.9
unstructured-client==0.23.8
unstructured-inference==0.7.36
unstructured.pytesseract==0.3.12
unstructured[all-docs]==0.14.9
tqdm==4.67.1
unstructured[all-docs]==0.16.6
unstructured==0.16.6
unstructured-client==0.26.2
unstructured-inference==0.8.1
urllib3==2.2.2
uvicorn==0.30.1
gunicorn==22.0.0
uvicorn==0.32.1
gunicorn==23.0.0
wikipedia==1.4.0
wrapt==1.16.0
yarl==1.9.4
youtube-transcript-api==0.6.2
youtube-transcript-api==0.6.3
zipp==3.17.0
sentence-transformers==3.0.1
google-cloud-logging==3.10.0
PyMuPDF==1.24.5
sentence-transformers==3.3.1
google-cloud-logging==3.11.3
pypandoc==1.13
graphdatascience==1.10
graphdatascience==1.12
Secweb==1.11.0
ragas==0.1.14

ragas==0.2.6
rouge_score==0.1.2
langchain-neo4j==0.1.1