
Commit a8df035

Bump Istio tag reference
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
1 parent 1b5fb26 commit a8df035

File tree

1 file changed (+10, −10 lines)


site-src/guides/index.md

Lines changed: 10 additions & 10 deletions. The substantive change is the Istio TAG bump; the remaining −/+ pairs appear to be whitespace-only.
@@ -4,7 +4,7 @@
 
 This project is still in an alpha state and breaking changes may occur in the future.
 
-This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get an Inference Gateway up and running!
+This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get an Inference Gateway up and running!
 
 ## **Prerequisites**
 

@@ -35,7 +35,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/gpu-deployment.yaml` as needed.
 Create a Hugging Face secret to download the model [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). Ensure that the token grants access to this model.
-
+
 Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
 
 ```bash
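For the secret step shown in this hunk, a minimal sketch of how such a secret is typically created with `kubectl`; the secret name `hf-token` and key `token` are assumptions, so match whatever the sample deployment manifest actually references:

```bash
# Hedged sketch, not part of this diff: create the Hugging Face token secret
# the vLLM deployment uses to pull the model. The secret name "hf-token" and
# key "token" are assumptions; check gpu-deployment.yaml for the real names.
export HF_TOKEN=<your-hugging-face-token>
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN

# Then apply the GPU deployment manifest referenced above.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
```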
@@ -46,11 +46,11 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 === "CPU-Based Model Server"
 
 This setup uses the official `vllm-cpu` image, which, according to the documentation, can run vLLM on the x86 CPU platform.
-For this setup, we use approximately 9.5GB of memory and 12 CPUs for each replica.
-
+For this setup, we use approximately 9.5GB of memory and 12 CPUs for each replica.
+
 While it is possible to deploy the model server with fewer resources, this is not recommended. For example, in our tests, loading the model with 8GB of memory and 1 CPU was possible but took almost 3.5 minutes, and inference requests took an unreasonably long time. In general, there is a tradeoff between the memory and CPU we allocate to our pods and the performance: the more memory and CPU we allocate, the better the performance we can get.
-
-After running multiple configurations of these values, we settled in this sample on 9.5GB of memory and 12 CPUs for each replica, which gives reasonable response times. You can increase those numbers and may get even better response times. To modify the allocated resources, adjust the numbers in [cpu-deployment.yaml](https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml) as needed.
+
+After running multiple configurations of these values, we settled in this sample on 9.5GB of memory and 12 CPUs for each replica, which gives reasonable response times. You can increase those numbers and may get even better response times. To modify the allocated resources, adjust the numbers in [cpu-deployment.yaml](https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml) as needed.
 
 Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
 
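On the resource tuning discussed in this hunk: instead of hand-editing the manifest, the same shape can be tried with `kubectl set resources`. A sketch, assuming the Deployment created by cpu-deployment.yaml is named `vllm-cpu` (the name is an assumption):

```bash
# Hedged sketch: give each replica roughly the 9.5GB / 12 CPU shape the guide
# settled on. The deployment name "vllm-cpu" is an assumption; check the
# manifest. 9500Mi stands in for the ~9.5GB memory figure.
kubectl set resources deployment/vllm-cpu \
  --requests=cpu=12,memory=9500Mi \
  --limits=cpu=12,memory=9500Mi
```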
@@ -104,7 +104,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 === "GKE"
 
-1. Enable the Gateway API and configure proxy-only subnets when necessary. See [Deploy Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways)
+1. Enable the Gateway API and configure proxy-only subnets when necessary. See [Deploy Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways)
 for detailed instructions.
 
 1. Deploy Gateway and HealthCheckPolicy resources
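For step 1 of the GKE tab: a proxy-only subnet, when one is needed, is created with `gcloud`. A sketch with placeholder region, network, and CIDR; the linked Deploy Gateways doc is authoritative:

```bash
# Hedged sketch: proxy-only subnet for GKE Gateway regional load balancers.
# Region, network, and range below are placeholders, not values from this repo.
gcloud compute networks subnets create proxy-only-subnet \
  --purpose=REGIONAL_MANAGED_PROXY \
  --role=ACTIVE \
  --region=us-central1 \
  --network=default \
  --range=10.129.0.0/23
```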
@@ -141,17 +141,17 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 === "Istio"
 
-Please note that this feature is currently in an experimental phase and is not intended for production use.
+Please note that this feature is currently in an experimental phase and is not intended for production use.
 The implementation and user experience are subject to changes as we continue to iterate on this project.
 
 1. Requirements
 
 - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.
 
 2. Install Istio
-
+
 ```
-TAG=1.26-alpha.9befed2f1439d883120f8de70fd70d84ca0ebc3d
+TAG=1.26-alpha.665da00e1e5392c31cf44cd4dedecd354dd660d5
 # on Linux
 wget https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-linux-amd64.tar.gz
 tar -xvf istioctl-$TAG-linux-amd64.tar.gz
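The hunk ends mid code block. For context, the steps that typically follow the extraction (not shown in this diff) are sanity-checking the bumped tag's artifact and installing with matching images; a sketch, where the dev hub `gcr.io/istio-testing` is an assumption:

```bash
# Hedged sketch of the usual follow-on steps; not part of this commit's diff.
TAG=1.26-alpha.665da00e1e5392c31cf44cd4dedecd354dd660d5

# Sanity-check that the artifact for the bumped tag exists (HEAD request).
curl -fsI "https://storage.googleapis.com/istio-build/dev/$TAG/istioctl-$TAG-linux-amd64.tar.gz" >/dev/null \
  && echo "artifact found for $TAG"

# Install Istio with images matching the dev tag; the hub is an assumption
# based on where Istio dev builds are commonly published.
./istioctl install --set tag=$TAG --set hub=gcr.io/istio-testing -y
```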

Comments (0)