Make prefill request timeout configurable #1377
@@ -67,11 +67,12 @@ gatewayPlugin:
 AIBRIX_PREFIX_CACHE_BLOCK_SIZE: "128"
 AIBRIX_PREFIX_CACHE_POD_RUNNING_REQUEST_IMBALANCE_ABS_COUNT: "16"
 AIBRIX_PREFIX_CACHE_STANDARD_DEVIATION_FACTOR: "2"
+AIBRIX_PREFILL_REQUEST_TIMEOUT: "60"
 dependencies:
 redis:
 host: aibrix-redis-master
 port: 6379
-messageTimeout: "5s"
+messageTimeout: "60s"
 gpuOptimizer:

@@ -52,11 +52,12 @@ gatewayPlugin:
 AIBRIX_PREFIX_CACHE_BLOCK_SIZE: "128"
 AIBRIX_PREFIX_CACHE_POD_RUNNING_REQUEST_IMBALANCE_ABS_COUNT: "16"
 AIBRIX_PREFIX_CACHE_STANDARD_DEVIATION_FACTOR: "2"
+AIBRIX_PREFILL_REQUEST_TIMEOUT: "60"
 dependencies:
 redis:
 host: aibrix-redis-master
 port: 6379
-messageTimeout: "5s"
+messageTimeout: "60s"
 gpuOptimizer:
The `messageTimeout` for the Envoy extension policy is set to `60s`, which is the same value as the `AIBRIX_PREFILL_REQUEST_TIMEOUT` (also set to 60 in this file). This can lead to race conditions where Envoy times out the request to the gateway plugin just as the internal prefill HTTP request is completing. The `messageTimeout` should be strictly greater than the prefill request timeout to allow for processing overhead. I suggest increasing this to `65s`.
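As a rough sketch of that suggestion, the two settings might end up as shown below. Only the key names and the `60` and `65s` values come from the diff and the comment; the surrounding keys and nesting of the values file are omitted here, and reading the prefill timeout as seconds is an assumption based on the comparison made in the comment.

```yaml
# Illustrative fragment only: nesting and neighbouring keys are left out.
# Assumption: AIBRIX_PREFILL_REQUEST_TIMEOUT is interpreted in seconds.
AIBRIX_PREFILL_REQUEST_TIMEOUT: "60"   # internal prefill HTTP request timeout
messageTimeout: "65s"                  # Envoy extension policy timeout, kept strictly above the prefill timeout
```

The 5-second gap is the reviewer's proposed margin for processing overhead; any value strictly greater than the prefill timeout satisfies the stated constraint.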