148 changes: 72 additions & 76 deletions examples/ChatQnA/deploy/xeon.md
@@ -47,12 +47,6 @@ git clone https://github.com/opea-project/GenAIComps.git
git clone https://github.com/opea-project/GenAIExamples.git
```

Checkout the release tag
```
cd GenAIComps
git checkout tags/v1.0
```

The examples use model weights from HuggingFace and LangChain.

Set up your [HuggingFace](https://huggingface.co/) account and generate
@@ -78,19 +72,33 @@ export https_proxy=${your_http_proxy}

## Prepare (Building / Pulling) Docker images

This step involves building or pulling the relevant docker
images, following a step-by-step process with a sanity check at the end. For
ChatQnA, the following docker images are needed: embedding, retriever,
rerank, LLM, and dataprep. Additionally, you will need to build docker images for
the ChatQnA megaservice and the UI (the conversational React UI is optional). In total,
there are 8 required docker images and one optional image.

The docker images needed to set up the example need to be built locally; however,
the images will be pushed to Docker Hub by Intel soon.

### Build/Pull Microservice images

::::::{tab-set}

:::::{tab-item} Pull
:sync: Pull

If you decide to pull the docker images rather than build them locally,
you can proceed to the next step, where all the necessary images will
be pulled from Docker Hub.
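
If you prefer to pre-fetch the images, they can also be pulled explicitly with `docker pull`. The image names and tags below are illustrative assumptions; use the names listed in the sanity check and the release tag matching your deployment:

```
# Hypothetical examples only -- substitute the published image names/tags
docker pull opea/dataprep-redis:latest
docker pull opea/retriever-redis:latest
docker pull opea/chatqna:latest
docker pull opea/chatqna-ui:latest
```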

:::::
:::::{tab-item} Build
:sync: Build

From within the `GenAIComps` folder, check out the release tag.
```
cd GenAIComps
git checkout tags/v1.0
```

#### Build Dataprep Image

@@ -190,6 +198,7 @@ As mentioned, you can build 2 modes of UI
cd GenAIExamples/ChatQnA/ui/
docker build --no-cache -t opea/chatqna-ui:latest --build-arg https_proxy=$https_proxy \
--build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
cd ../../..
```

*Conversation UI*
@@ -199,6 +208,7 @@ If you want a conversational experience with chatqna megaservice.
cd GenAIExamples/ChatQnA/ui/
docker build --no-cache -t opea/chatqna-conversation-ui:latest --build-arg https_proxy=$https_proxy \
--build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
cd ../../..
```

### Sanity Check
Expand Down Expand Up @@ -230,6 +240,8 @@ Check if you have the below set of docker images, before moving on to the next s
:::
::::

:::::
::::::
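
Whether you built or pulled the images, a quick way to confirm they are present locally (the patterns below are indicative; match them against the sanity-check list) is:

```
docker images | grep -E 'opea|text-embeddings-inference|text-generation-inference|redis'
```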

## Use Case Setup

Expand Down Expand Up @@ -272,58 +284,10 @@ environment variable or `compose.yaml` file.

Set the necessary environment variables to set up the use case.

> Note: Replace `host_ip` with your external IP address. Do **NOT** use localhost
> for the below set of environment variables
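
For example, one common way to set `host_ip` on Linux, assuming the first address reported by `hostname -I` is the externally reachable one, is:

```
export host_ip=$(hostname -I | awk '{print $1}')
```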

### Dataprep

export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6007/v1/dataprep"
export DATAPREP_GET_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/get_file"
export DATAPREP_DELETE_FILE_ENDPOINT="http://${host_ip}:6007/v1/dataprep/delete_file"

### VectorDB

export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"

### Embedding Service

export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export EMBEDDING_SERVICE_HOST_IP=${host_ip}
export RETRIEVER_SERVICE_HOST_IP=${host_ip}
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"

### Reranking Service

export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export RERANK_SERVICE_HOST_IP=${host_ip}

### LLM Service

::::{tab-set}
:::{tab-item} vllm
:sync: vllm

export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export LLM_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_PORT=9000
export vLLM_LLM_ENDPOINT="http://${host_ip}:9009"
:::
:::{tab-item} TGI
:sync: TGI

export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export LLM_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_PORT=9000
export TGI_LLM_ENDPOINT="http://${host_ip}:9009"
:::
::::

### Megaservice

export MEGA_SERVICE_HOST_IP=${host_ip}
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/chatqna"
```
cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
source ./set_env.sh
```
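
After sourcing the script, you can spot-check that the expected variables were exported, for example:

```
env | grep -E 'ENDPOINT|MODEL_ID|SERVICE_HOST_IP'
```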

## Deploy the use case

@@ -448,7 +412,13 @@ commands. The dataprep microservice extracts the text from a variety of data
sources, chunks the data, embeds each chunk using the embedding microservice, and
stores the embedded vectors in the Redis vector database.

Local File `nke-10k-2023.pdf` Upload:
Update the knowledge base using the local file [nke-10k-2023.pdf](https://github.com/opea-project/GenAIComps/blob/main/comps/retrievers/redis/data/nke-10k-2023.pdf). You can click [here](https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf) to download the file in a web browser, or run the following command to fetch it from a terminal:

```
wget https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/retrievers/redis/data/nke-10k-2023.pdf
```

Upload:

```
curl -X POST "http://${host_ip}:6007/v1/dataprep" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"
```
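
To confirm the ingestion succeeded, the dataprep service also exposes the `get_file` endpoint configured earlier; this is a hedged example and the exact method and route may differ by release:

```
curl -X POST "http://${host_ip}:6007/v1/dataprep/get_file" \
     -H "Content-Type: application/json"
```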
Expand Down Expand Up @@ -618,6 +588,21 @@ while reranking service are not.

### vLLM and TGI Service

On first startup, this service takes additional time to download the model files. Once the download completes, the service will be ready.

Run the command below to check whether the LLM serving endpoint is ready.

```
docker logs ${CONTAINER_ID} | grep Connected
```
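
The `${CONTAINER_ID}` placeholder can be found with `docker ps`; for example:

```
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Image}}"
```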

If the service is ready, you will see a response like the one below.

```
2024-09-03T02:47:53.402023Z INFO text_generation_router::server: router/src/server.rs:2311: Connected
```
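
If you would rather block until the backend reports ready, a simple polling loop works. The container name below is an assumption; substitute the one shown by `docker ps`:

```
# Poll the serving backend's logs until it reports "Connected"
until docker logs tgi-service 2>&1 | grep -q Connected; do
  echo "waiting for the LLM serving backend..."
  sleep 10
done
```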

::::{tab-set}

:::{tab-item} vllm
@@ -664,24 +649,30 @@ TGI service generate text for the input prompt. Here is the expected result from
::::
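
As an additional reference, the vLLM backend exposes an OpenAI-compatible completions API on the endpoint configured earlier (port 9009 in this setup). A minimal request, assuming the same model ID, might look like this sketch:

```
curl http://${host_ip}:9009/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3", "prompt": "What is Deep Learning?", "max_tokens": 32, "temperature": 0}'
```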


If you get

```
curl: (7) Failed to connect to 100.81.104.168 port 8008 after 0 ms: Connection refused
```

and the log shows the model warming up, as in the example below, please wait a while and try again later.

```
2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set
2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit.
2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model Intel/neural-chat-7b-v3-3
2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model
```

### LLM Microservice

This microservice depends on the LLM backend service above. On first startup it can take a long time to become ready, because it waits for the backend service to finish initializing.

::::{tab-set}

:::{tab-item} vllm
:sync: vllm

```
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,\
"frequency_penalty":0,"presence_penalty":0, "streaming":true}' \
-H 'Content-Type: application/json'
```
For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).

### LLM Microservice
:::
:::{tab-item} TGI
:sync: TGI

```
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,\
  "frequency_penalty":0,"presence_penalty":0, "streaming":true}' \
  -H 'Content-Type: application/json'

```
For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that `max_new_tokens` is renamed to `max_tokens`).
:::
::::

You will get generated text from LLM:

@@ -719,7 +716,6 @@ data: [DONE]

```
curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
"model": "Intel/neural-chat-7b-v3-3",
"messages": "What is the revenue of Nike in 2023?"
}'

```
2 changes: 2 additions & 0 deletions getting-started/README.md
@@ -8,6 +8,8 @@ To get started with OPEA you need the right hardware and basic software setup.

- Software Requirements: Refer to the [Support Matrix](https://github.com/opea-project/GenAIExamples/blob/main/README.md#getting-started) to ensure you have the required software components in place.

Note: If you are deploying on a cloud provider such as AWS, select a VM instance from the R7iz or M7i family with Ubuntu as the base OS. Refer to the `Note` section under [Deploy ChatQnA Service](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA#deploy-chatqna-service) for installing Docker on a clean machine.
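
On a clean Ubuntu VM, Docker can typically be installed with Docker's convenience script; this is a common shortcut, and the linked instructions remain the authoritative reference:

```
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Optional: run docker without sudo (takes effect after logging out and back in)
sudo usermod -aG docker $USER
```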

## Understanding OPEA's Core Components

Before moving forward, it's important to familiarize yourself with two key elements of OPEA: GenAIComps and GenAIExamples.