Hybridrag with struct2graph #2014

Open: wants to merge 11 commits into main

20 changes: 18 additions & 2 deletions HybridRAG/docker_compose/intel/hpu/gaudi/README.md
@@ -46,14 +46,15 @@ docker compose up -d
The HybridRAG docker images should automatically be downloaded from the `OPEA registry` and deployed on the Intel® Gaudi® Platform:

```
-[+] Running 9/9
+[+] Running 10/10
✔ Container redis-vector-db Healthy 6.4s
✔ Container vllm-service Started 0.4s
✔ Container tei-embedding-server Started 0.9s
✔ Container neo4j-apoc Healthy 11.4s
✔ Container tei-reranking-server Started 0.8s
✔ Container retriever-redis-server Started 1.0s
✔ Container dataprep-redis-server Started 6.5s
+✔ Container struct2graph Started 10.5s
✔ Container text2cypher-gaudi-container Started 12.2s
✔ Container hybridrag-xeon-backend-server Started 12.4s
```
@@ -80,6 +81,7 @@ CONTAINER ID IMAGE
a9286abd0015 opea/hybridrag:latest "python hybridrag.py" 15 hours ago Up 15 hours 0.0.0.0:8888->8888/tcp, :::8888->8888/tcp hybridrag-xeon-backend-server
8477b154dc72 opea/text2cypher-gaudi:latest "/bin/sh -c 'bash ru…" 15 hours ago Up 15 hours 0.0.0.0:11801->9097/tcp, [::]:11801->9097/tcp text2cypher-gaudi-container
688e01a431fa opea/dataprep:latest "sh -c 'python $( [ …" 15 hours ago Up 15 hours 0.0.0.0:6007->5000/tcp, [::]:6007->5000/tcp dataprep-redis-server
+e3b1c44298f6 opea/struct2graph:latest "sh -c 'python $( [ …" 15 hours ago Up 15 hours 0.0.0.0:8090->8090/tcp, [::]:8090->8090/tcp struct2graph
54f574fe54bb opea/retriever:latest "python opea_retriev…" 15 hours ago Up 15 hours 0.0.0.0:7000->7000/tcp, :::7000->7000/tcp retriever-redis-server
5028eb66617c ghcr.io/huggingface/text-embeddings-inference:cpu-1.6 "text-embeddings-rou…" 15 hours ago Up 15 hours 0.0.0.0:8808->80/tcp, [::]:8808->80/tcp tei-reranking-server
a9dbf8a13365 opea/vllm:latest "python3 -m vllm.ent…" 15 hours ago Up 15 hours (healthy) 0.0.0.0:9009->80/tcp, [::]:9009->80/tcp vllm-service
@@ -93,7 +95,7 @@ a9dbf8a13365 opea/vllm:latest
Once the HybridRAG services are running, run data ingestion. The following command ingests unstructured data:

```bash
-cd GenAIExamples/HybridRAG/tests
+cd GenAIExamples/HybridRAG/tests/data
curl -X POST -H "Content-Type: multipart/form-data" \
-F "files=@./Diabetes.txt" \
-F "files=@./Acne_Vulgaris.txt" \
@@ -130,6 +132,20 @@ If the graph database is already populated, you can skip the knowledge graph gen
export refresh_db='False'
```

+Alternatively, you can use the struct2graph microservice to ingest structured data (JSON, CSV).
+
+```bash
+cd GenAIExamples/HybridRAG/tests/data
+curl -X POST http://${host_ip}:8090/v1/struct2graph \
+-H "accept: application/json" \
+-H "Content-Type: application/json" \
+-d '{
+"input_text": "",
+"task": "Index",
+"cypher_cmd": "LOAD CSV WITH HEADERS FROM \"file:///diseases.csv\" AS row CREATE (:DiseaseInfo {Disease: row.Disease, Medications: row.Medications, Treatments: row.Treatments, HomeRemedies: row.HomeRemedies, Symptoms: row.Symptoms})"
+}'
+```
+
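Because the request body is passed as one single-quoted shell argument, any single quote inside the Cypher statement ends that argument early, and any double quote must be escaped for JSON. A minimal sketch, assuming bash 4 or later and the endpoint, headers, and fields shown in the curl example above, that builds the body with JSON escaping instead of hand-quoting; the helper names `json_escape`, `struct2graph_payload`, and `struct2graph_index` are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the struct2graph "Index" request.
# Endpoint, headers, and JSON field names come from the curl example above.

# Escape a string for use inside a JSON string literal
# (handles backslash, double quote, newline, carriage return, and tab).
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # \  -> \\  (must run first)
  s=${s//"$q"/"$bs$q"}     # "  -> \"
  s=${s//$'\n'/"${bs}n"}   # newline -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab -> \t
  printf '%s' "$s"
}

# Build the request body; the optional second argument is the task (default Index).
struct2graph_payload() {
  local cypher=$1 task=${2:-Index}
  printf '{"input_text": "", "task": "%s", "cypher_cmd": "%s"}' \
    "$(json_escape "$task")" "$(json_escape "$cypher")"
}

# POST the payload to struct2graph on ${host_ip}:${STRUCT2GRAPH_PORT:-8090}.
struct2graph_index() {
  if [ -z "${host_ip:-}" ]; then
    echo "host_ip is not set (source set_env.sh first)" >&2
    return 1
  fi
  curl -sS -X POST "http://${host_ip}:${STRUCT2GRAPH_PORT:-8090}/v1/struct2graph" \
    -H "accept: application/json" \
    -H "Content-Type: application/json" \
    -d "$(struct2graph_payload "$1" Index)"
}
```

For example, `struct2graph_index 'LOAD CSV WITH HEADERS FROM "file:///diseases.csv" AS row CREATE (:DiseaseInfo {Disease: row.Disease})'` sends the statement with its double quotes escaped for JSON. Cypher accepts both single- and double-quoted string literals, so writing the file URL in double quotes avoids clashing with the shell's single quotes.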
Now test the pipeline using the following command:

```bash
…
```
37 changes: 37 additions & 0 deletions HybridRAG/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -42,6 +42,7 @@ services:
- ./data/neo4j/config:/config
- ./data/neo4j/data:/data
- ./data/neo4j/plugins:/plugins
+- ./data:/var/lib/neo4j/import
ipc: host
environment:
- no_proxy=${no_proxy}
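Neo4j resolves `file:///` URLs in `LOAD CSV` against its import directory, and the volume added in this hunk maps the compose directory's `./data` onto `/var/lib/neo4j/import` in `neo4j-apoc`. Assuming that path is the container's import directory (the Neo4j Docker image's default), a file referenced as `file:///diseases.csv` has to sit in `./data` next to `compose.yaml`, not in the directory the `curl` command is run from. A minimal sketch, with a hypothetical helper name, that checks the CSV header contains the columns a Cypher statement reads through `row.<Column>` and then copies the file into that folder:

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper. Verifies that the CSV header row lists the
# given columns, then copies the file into the host folder mounted as Neo4j's
# import directory (./data in compose.yaml).
stage_csv_for_neo4j() {
  local src=$1 import_dir=$2 header col
  shift 2
  [ -f "$src" ] || { echo "missing file: $src" >&2; return 1; }
  IFS= read -r header < "$src"
  header=${header%$'\r'}   # tolerate CRLF line endings
  # Assumes a plain, unquoted, comma-separated header row.
  for col in "$@"; do
    case ",$header," in
      *",$col,"*) ;;
      *) echo "column '$col' not found in header of $src" >&2; return 1 ;;
    esac
  done
  mkdir -p "$import_dir" && cp "$src" "$import_dir/"
}
```

For example, assuming `diseases.csv` is shipped under `HybridRAG/tests/data`: `stage_csv_for_neo4j GenAIExamples/HybridRAG/tests/data/diseases.csv GenAIExamples/HybridRAG/docker_compose/intel/hpu/gaudi/data Disease Medications Treatments HomeRemedies Symptoms`.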
@@ -61,6 +62,40 @@ services:
timeout: 10s
retries: 20
start_period: 3s
+struct2graph:
+image: ${REGISTRY:-opea}/struct2graph:${TAG:-latest}
+container_name: struct2graph
+environment:
+- no_proxy=${no_proxy}
+- https_proxy=${https_proxy}
+- http_proxy=${http_proxy}
+- NEO4J_URI=${NEO4J_URI}
+- NEO4J_URL=${NEO4J_URI}
+- NEO4J_USERNAME=${NEO4J_USERNAME}
+- NEO4J_PASSWORD=${NEO4J_PASSWORD}
+- NEO4J_server_directories_import=import
+- NEO4J_PLUGINS=["apoc"]
+- NEO4J_dbms_security_allow__csv__import__from__file__urls=true
+- NEO4J_server_directories_import=/var/lib/neo4j/import
+- NEO4J_dbms_security_procedures_unrestricted=apoc.*
+- STRUCT2GRAPH_PORT=${STRUCT2GRAPH_PORT:-8090}
+- NEO4J_PORT1=${NEO4J_PORT1}
+- NEO4J_PORT2=${NEO4J_PORT2}
+- INDEX_NAME=${INDEX_NAME:-graph_store}
+- LOAD_FORMAT=${LOAD_FORMAT:-CSV}
+ports:
+- "8090:8090"
+depends_on:
+neo4j-apoc:
+condition: service_healthy
+healthcheck:
+test: ["CMD", "curl", "-f", "http://localhost:8090/v1/health_check"]
+interval: 10s
+timeout: 5s
+retries: 10
+start_period: 30s
+ipc: host
+restart: always
redis-vector-db:
image: redis/redis-stack:7.2.0-v9
container_name: redis-vector-db
@@ -232,6 +267,8 @@ services:
- BACKEND_SERVICE_PORT=8888
- DATAPREP_SERVICE_IP=dataprep-redis-service
- DATAPREP_SERVICE_PORT=5000
+- STRUCT2GRAPH_SERVICE_IP=struct2graph
+- STRUCT2GRAPH_SERVICE_PORT=8090
ipc: host
restart: always

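The `struct2graph` service declares a healthcheck, and Docker runs its `healthcheck.test` command inside the struct2graph container, so `localhost` there refers to that container rather than to `neo4j-apoc`. After `docker compose up -d`, a minimal sketch (hypothetical helper name, assuming the Docker CLI is on `PATH`) can poll the health status Docker records for the container:

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that polls the health status Docker records
# for a container (driven by its compose healthcheck) until it reads "healthy".
wait_healthy() {
  local name=$1 tries=${2:-30} interval=${3:-10} status i
  for ((i = 1; i <= tries; i++)); do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    if [ "$status" = healthy ]; then
      echo "$name is healthy"
      return 0
    fi
    echo "waiting for $name (attempt $i/$tries, status: ${status:-unknown})" >&2
    sleep "$interval"
  done
  echo "$name did not become healthy" >&2
  return 1
}
```

`wait_healthy struct2graph 30 10` waits up to about five minutes. A container without a healthcheck has no health status to report, so the loop would simply time out.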
4 changes: 3 additions & 1 deletion HybridRAG/docker_compose/intel/hpu/gaudi/set_env.sh
@@ -18,7 +18,7 @@ export JAEGER_IP=$(ip route get 8.8.8.8 | grep -oP 'src \K[^ ]+')
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export TELEMETRY_ENDPOINT=http://$JAEGER_IP:4318/v1/traces
# Set no proxy
-export no_proxy="$no_proxy,hybridrag-gaudi-ui-server,hybridrag-gaudi-backend-server,dataprep-redis-service,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,jaeger,prometheus,grafana,node-exporter,localhost,127.0.0.1,$JAEGER_IP,${host_ip}"
+export no_proxy="$no_proxy,hybridrag-gaudi-ui-server,hybridrag-gaudi-backend-server,dataprep-redis-service,struct2graph,tei-embedding-service,retriever,tei-reranking-service,tgi-service,vllm-service,jaeger,prometheus,grafana,node-exporter,localhost,127.0.0.1,$JAEGER_IP,${host_ip}"


export MEGA_SERVICE_HOST_IP=${host_ip}
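The updated `no_proxy` adds `struct2graph`, so clients that honor `no_proxy` do not route requests for that host name through the configured proxy. A minimal check to run after sourcing `set_env.sh`, with a hypothetical helper name, that each service name appears as an exact comma-separated entry (a plain substring match would wrongly accept a partial name such as `struct2`):

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper. Returns 0 if every argument appears as an
# exact comma-separated entry in $no_proxy, and reports the ones that do not.
no_proxy_has() {
  local host rc=0
  for host in "$@"; do
    case ",${no_proxy:-}," in
      *",$host,"*) ;;
      *) echo "not in no_proxy: $host" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}
```

For example: `source set_env.sh && no_proxy_has struct2graph dataprep-redis-service`.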
@@ -36,6 +36,7 @@ export RERANK_SERVER_PORT=8808
export LLM_SERVER_PORT=9009
export TEXT2CYPHER_SERVER_PORT=11801
export REDIS_SERVER_PORT=6379
+export STRUCT2GRAPH_PORT=8090

export LLM_ENDPOINT_PORT=8010
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
@@ -53,3 +54,4 @@ export NEO4J_URL="bolt://${host_ip}:${NEO4J_PORT2}"
export NEO4J_USERNAME="neo4j"
export NEO4J_PASSWORD="neo4jtest"
export LOGFLAG=True
+export LOAD_FORMAT=${LOAD_FORMAT:-"CSV"}
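`${LOAD_FORMAT:-"CSV"}` keeps a value already set in the environment and falls back to `CSV` when `LOAD_FORMAT` is unset or empty. In `compose.yaml` the struct2graph container is given `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`, `STRUCT2GRAPH_PORT`, and `LOAD_FORMAT`, while the part of `set_env.sh` visible in this diff exports `NEO4J_URL` and does not show `NEO4J_URI` being set. A small pre-flight check, sketched with a hypothetical function name, reports such gaps before `docker compose up`; it assumes `CSV` and `JSON` are the only supported `LOAD_FORMAT` values, based on the README's mention of JSON and CSV input:

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical pre-flight check for the variables compose.yaml
# passes to struct2graph. Accepted LOAD_FORMAT values (CSV, JSON) are assumed.
check_struct2graph_env() {
  local var rc=0
  for var in host_ip NEO4J_URI NEO4J_URL NEO4J_USERNAME NEO4J_PASSWORD STRUCT2GRAPH_PORT; do
    if [ -z "${!var:-}" ]; then   # indirect expansion: value of the variable named $var
      echo "not set: $var" >&2
      rc=1
    fi
  done
  # Same fallback as set_env.sh: an unset or empty LOAD_FORMAT means CSV.
  case ${LOAD_FORMAT:-CSV} in
    CSV|JSON) ;;
    *) echo "unexpected LOAD_FORMAT: $LOAD_FORMAT (assumed CSV or JSON)" >&2; rc=1 ;;
  esac
  return "$rc"
}
```

For example: `source set_env.sh && check_struct2graph_env && docker compose up -d`.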
6 changes: 6 additions & 0 deletions HybridRAG/docker_image_build/build.yaml
@@ -31,6 +31,12 @@ services:
dockerfile: comps/dataprep/src/Dockerfile
extends: hybridrag
image: ${REGISTRY:-opea}/dataprep:${TAG:-latest}
+struct2graph:
+build:
+context: GenAIComps
+dockerfile: comps/struct2graph/src/Dockerfile
+extends: hybridrag
+image: ${REGISTRY:-opea}/struct2graph:${TAG:-latest}
retriever:
build:
context: GenAIComps
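The new `build.yaml` entry builds `${REGISTRY:-opea}/struct2graph:${TAG:-latest}` from `comps/struct2graph/src/Dockerfile` with `context: GenAIComps`, so a GenAIComps checkout has to sit next to `build.yaml`. A sketch that clones it if missing and builds only this image, assuming `git` and Docker Compose v2 are installed (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper. Builds just the struct2graph image defined
# in build.yaml, cloning GenAIComps first if it is not already present.
build_struct2graph_image() {
  local build_dir=$1
  if [ ! -f "$build_dir/build.yaml" ]; then
    echo "build.yaml not found in '$build_dir'" >&2
    return 1
  fi
  (
    cd "$build_dir" || exit 1
    # build.yaml uses "context: GenAIComps", so the repo must sit beside it.
    if [ ! -d GenAIComps ]; then
      git clone https://github.com/opea-project/GenAIComps.git || exit 1
    fi
    # Build only the struct2graph service from build.yaml.
    docker compose -f build.yaml build struct2graph
  )
}
```

For example: `build_struct2graph_image GenAIExamples/HybridRAG/docker_image_build`. Exporting `REGISTRY` or `TAG` beforehand changes the resulting image name.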