Merged
4 changes: 3 additions & 1 deletion config/gateway/gateway-plugin/gateway-plugin.yaml
@@ -104,6 +104,8 @@ spec:
value: "16"
- name: AIBRIX_PREFIX_CACHE_STANDARD_DEVIATION_FACTOR
value: "2"
+- name: AIBRIX_PREFILL_REQUEST_TIMEOUT
+value: "60"
# Uncomment to enable request tracing for GPU optimizer, default "false".
# - name: AIBRIX_GPU_OPTIMIZER_TRACING_FLAG
# value: "true"
@@ -200,4 +202,4 @@ spec:
body: Buffered
response:
body: Streamed
-messageTimeout: 5s
+messageTimeout: 60s
Review comment (Contributor, priority: high):

The messageTimeout for the Envoy extension policy is set to 60s, which is the same value as AIBRIX_PREFILL_REQUEST_TIMEOUT (set to "60", i.e. 60 seconds, earlier in this file). This can lead to race conditions where Envoy times out the request to the gateway plugin just as the internal prefill HTTP request is completing. The messageTimeout should be strictly greater than the prefill request timeout to allow for processing overhead. I suggest increasing this to 65s.

      messageTimeout: 65s
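The ordering the reviewer asks for can be expressed as a startup check. This is a minimal sketch, not part of the PR: `validateTimeouts` and the 5-second `minOverhead` margin are hypothetical (the margin simply mirrors the 65s vs. 60s suggestion). The prefill value is taken as whole seconds, like `AIBRIX_PREFILL_REQUEST_TIMEOUT`, and `messageTimeout` is parsed with Go's `time.ParseDuration`, which accepts values such as `65s`.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// minOverhead is a hypothetical safety margin between the prefill HTTP
// timeout and Envoy's messageTimeout (65s - 60s in the review suggestion).
const minOverhead = 5 * time.Second

// validateTimeouts returns an error unless messageTimeout exceeds the prefill
// request timeout (given in whole seconds) by at least minOverhead.
func validateTimeouts(messageTimeout string, prefillSeconds string) error {
	msg, err := time.ParseDuration(messageTimeout)
	if err != nil {
		return fmt.Errorf("invalid messageTimeout %q: %w", messageTimeout, err)
	}
	secs, err := strconv.Atoi(prefillSeconds)
	if err != nil {
		return fmt.Errorf("invalid prefill timeout %q: %w", prefillSeconds, err)
	}
	prefill := time.Duration(secs) * time.Second
	if msg < prefill+minOverhead {
		return fmt.Errorf("messageTimeout %v must be at least %v (prefill %v + overhead %v)",
			msg, prefill+minOverhead, prefill, minOverhead)
	}
	return nil
}

func main() {
	// Values from this diff: both are 60 seconds, so the check rejects them.
	fmt.Println(validateTimeouts("60s", "60"))
	// The reviewer's suggested value passes.
	fmt.Println(validateTimeouts("65s", "60"))
}
```

With the values in this diff (`60s` and `60`) the check fails; with the suggested `65s` it passes.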

3 changes: 2 additions & 1 deletion dist/chart/values.yaml
@@ -67,11 +67,12 @@ gatewayPlugin:
AIBRIX_PREFIX_CACHE_BLOCK_SIZE: "128"
AIBRIX_PREFIX_CACHE_POD_RUNNING_REQUEST_IMBALANCE_ABS_COUNT: "16"
AIBRIX_PREFIX_CACHE_STANDARD_DEVIATION_FACTOR: "2"
+AIBRIX_PREFILL_REQUEST_TIMEOUT: "60"
dependencies:
redis:
host: aibrix-redis-master
port: 6379
-messageTimeout: "5s"
+messageTimeout: "60s"
Review comment (Contributor, priority: high):

The messageTimeout is set to "60s", which is the same value as AIBRIX_PREFILL_REQUEST_TIMEOUT. This can lead to race conditions where Envoy times out the request to the gateway plugin just as the internal prefill HTTP request is completing. The messageTimeout should be strictly greater than the prefill request timeout to allow for processing overhead. I suggest increasing this to 65s.

  messageTimeout: "65s"



gpuOptimizer:
3 changes: 2 additions & 1 deletion dist/chart/vke.yaml
@@ -52,11 +52,12 @@ gatewayPlugin:
AIBRIX_PREFIX_CACHE_BLOCK_SIZE: "128"
AIBRIX_PREFIX_CACHE_POD_RUNNING_REQUEST_IMBALANCE_ABS_COUNT: "16"
AIBRIX_PREFIX_CACHE_STANDARD_DEVIATION_FACTOR: "2"
+AIBRIX_PREFILL_REQUEST_TIMEOUT: "60"
dependencies:
redis:
host: aibrix-redis-master
port: 6379
-messageTimeout: "5s"
+messageTimeout: "60s"
Review comment (Contributor, priority: high):

The messageTimeout is set to "60s", which is the same value as AIBRIX_PREFILL_REQUEST_TIMEOUT. This can lead to race conditions where Envoy times out the request to the gateway plugin just as the internal prefill HTTP request is completing. The messageTimeout should be strictly greater than the prefill request timeout to allow for processing overhead. I suggest increasing this to 65s.

  messageTimeout: "65s"



gpuOptimizer:
7 changes: 6 additions & 1 deletion pkg/plugins/gateway/algorithms/pd_disaggregation.go
@@ -45,6 +45,11 @@ const (
PDRoleIdentifier string = "role-name"
RoleReplicaIndex string = "stormservice.orchestration.aibrix.ai/role-replica-index"
PodGroupIndex string = "stormservice.orchestration.aibrix.ai/pod-group-index"
+defaultPrefillRequestTimeout int = 30
)

+var (
+prefillRequestTimeout int = utils.LoadEnvInt("AIBRIX_PREFILL_REQUEST_TIMEOUT", defaultPrefillRequestTimeout)
+)
+
func init() {
@@ -239,7 +244,7 @@ func (r *pdRouter) executeHTTPRequest(url string, routingCtx *types.RoutingConte
req.Header.Set("content-length", strconv.Itoa(len(payload)))

// Execute with timeout
-client := &http.Client{Timeout: 30 * time.Second}
+client := &http.Client{Timeout: time.Duration(prefillRequestTimeout) * time.Second}
resp, err := client.Do(req)
if err != nil {
return fmt.Errorf("failed to execute http prefill request: %w", err)
5 changes: 2 additions & 3 deletions pkg/plugins/gateway/algorithms/prefix_cache_test.go
@@ -18,7 +18,6 @@ package routingalgorithms

import (
"context"
-"log"
"slices"
"testing"

@@ -150,8 +149,8 @@ func Test_PrefixCacheE2E(t *testing.T) {
}
ctx7 := types.NewRoutingContext(context.Background(), RouterPrefixCache, "m1", input, "r7", "")
p1, err := prefixCacheRouter.Route(ctx7, podList)
-log.Println(p2, p3, p4)
-log.Println(p1)
+t.Log(p2, p3, p4)
+t.Log(p1)
assert.NoError(t, err)
assert.False(t, slices.Contains([]string{p2, p3, p4}, p1))
}