Support denvr endpoints with Litellm. #2085
base: main
Conversation
Signed-off-by: Ubuntu <azureuser@denvr-inf.kifxisxbiwme5gt4kkwqsfdjuh.dx.internal.cloudapp.net>
Dependency Review: ✅ No vulnerabilities or license issues found. Scanned files: none.
for more information, see https://pre-commit.ci
Merged remote https://github.com/srinarayan-srikanthan/GenAIExamples into denvr_chat
Pull Request Overview
Adds support for deploying ChatQnA with remote denvr inference endpoints and updates the streaming response parser for multi-chunk JSON outputs.
- Introduce a new `compose_remote.yaml` workflow and environment variable instructions in the Xeon CPU Docker README.
- Update `align_generator` in `chatqna.py` to split and process multiple JSON chunks per line.
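The multi-chunk parsing described above can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the function name `split_stream_line` and the SSE `data:` framing are assumptions based on the quoted review snippets.

```python
import json

def split_stream_line(line: str) -> list:
    """Sketch: a single streamed line may carry several SSE-style JSON
    chunks separated by blank lines, so split on "\n\n" before parsing."""
    tokens = []
    chunks = [chunk.strip() for chunk in line.split("\n\n") if chunk.strip()]
    for chunk in chunks:
        # Each chunk looks like 'data: {...}'; strip the SSE prefix.
        payload = chunk.removeprefix("data: ").strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        json_data = json.loads(payload)
        delta = json_data["choices"][0]["delta"]
        if "content" in delta and delta["content"] is not None:
            tokens.append(delta["content"])
    return tokens
```

Splitting before parsing matters because a single network read can deliver more than one completed chunk; parsing the whole line as one JSON document would fail.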
Reviewed Changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ChatQnA/docker_compose/intel/cpu/xeon/README.md | Added remote endpoint deployment steps and updated compose table |
| ChatQnA/chatqna.py | Refactored `align_generator` to handle multi-chunk streaming JSON |
Comments suppressed due to low confidence (3)
ChatQnA/docker_compose/intel/cpu/xeon/README.md:78
- [nitpick] Clarify whether `REMOTE_ENDPOINT` should include the `/v1/chat/completions` path or just the base URL, to avoid confusion.

**Note**: Set the `REMOTE_ENDPOINT` variable to "https://api.inference.denvrdata.com" when the remote endpoint to access is "https://api.inference.denvrdata.com/v1/chat/completions".
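The note above means `REMOTE_ENDPOINT` holds only the base URL and the chat-completions path is appended by the service. A minimal sketch of that convention (the helper name is hypothetical, not from the PR):

```python
def chat_completions_url(remote_endpoint: str) -> str:
    """Build the full chat-completions URL from the REMOTE_ENDPOINT base,
    tolerating an optional trailing slash."""
    return remote_endpoint.rstrip("/") + "/v1/chat/completions"
```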
ChatQnA/chatqna.py:178
- [nitpick] The outer variable `line` is reused for the inner loop below, which can reduce readability; consider renaming the loop variable to `chunk` or similar.

chunks = [chunk.strip() for chunk in line.split("\n\n") if chunk.strip()]
ChatQnA/chatqna.py:191
- The previous `finish_reason` check was removed, which may cause tokens to be emitted after the stream should end; consider re-adding it or documenting this behavior change.

elif "content" in json_data["choices"][0]["delta"]:
Description
Adds support for remote inference with the denvr endpoint for ChatQnA, along with README updates.
Issues
#2084
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
N/A
Tests
Describe the tests that you ran to verify your changes.