Configurable max_tokens/max_completion_tokens key #399
Conversation
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Force-pushed 68e69bc to ef981fd
Pull Request Overview
This PR implements configurable request keys for output token limits in OpenAI API calls. Instead of hardcoding both max_tokens and max_completion_tokens in all requests, the system now uses the appropriate key based on endpoint type through a new environment variable configuration.
- Adds a `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` configuration mapping endpoint types to their respective output token keys
- Updates payload generation to use the configured key instead of setting both keys
- Fixes test assertions to match the new single-key approach
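The single-key approach described above can be sketched as follows. This is an illustrative sketch only, not guidellm's actual internals; the `MAX_OUTPUT_KEY` mapping mirrors the documented default, but `build_payload` and its parameters are hypothetical names:

```python
# Default mapping from endpoint type to its output-token request key,
# as described in the PR (configurable via GUIDELLM__OPENAI__MAX_OUTPUT_KEY).
MAX_OUTPUT_KEY = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}


def build_payload(endpoint_type: str, base_payload: dict, max_output_tokens: int) -> dict:
    """Set only the output-token key appropriate for the endpoint,
    instead of hardcoding both max_tokens and max_completion_tokens."""
    key = MAX_OUTPUT_KEY[endpoint_type]
    return {**base_payload, key: max_output_tokens}
```

With this shape, a chat request carries only `max_completion_tokens` and a legacy completions request only `max_tokens`, matching the updated test assertions.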
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| src/guidellm/config.py | Adds new max_output_key configuration with defaults for text and chat completions | 
| src/guidellm/backend/openai.py | Updates payload generation to use configurable key and adds type definitions | 
| tests/unit/conftest.py | Removes duplicate token limit assertions and fixes mock response generation | 
You may want to wait for Mark's review, but looks good to me.
commit 121dcdc — Samuel Monson <smonson@redhat.com>, Fri Oct 10 2025: Configurable max_tokens/max_completion_tokens key (#399)
commit a24a22d — Samuel Monson <smonson@redhat.com>, Thu Oct 9 2025: Fix typo in CI (#401)
commit 81af01b — Samuel Monson <smonson@redhat.com>, Thu Oct 9 2025: Fix the failing CI again (#400)
commit 90a05ab — Samuel Monson <smonson@redhat.com>, Thu Oct 9 2025: Fix for container rc tag (Attempt 2) (#398)
commit 000b39e — Samuel Monson <smonson@redhat.com>, Fri Oct 3 2025: Fix for container rc tag (#389)
commit 108a657 — Benjamin Blue <dalcowboiz@gmail.com>, Fri Oct 3 2025: update tpot to itl in labels and code use (#386)
commit b1b1b78 — Benjamin Blue <dalcowboiz@gmail.com>, Wed Oct 1 2025: update default build values to use versioned builds (#310)
commit 5c9982a (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: first benchark testing example (#328)
commit 2c0d993 (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Merge branch 'main' into example_simulator
commit ad25e06 (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Add formatting to json file with metrics (#372)
commit d1297fe (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Merge branch 'main' into example_simulator
commit c32896c (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Merge branch 'main' into add_json_formatiing
commit f8f6f9d — Samuel Monson <smonson@redhat.com>, Tue Sep 30 2025: Container CI bugfix and disable dry-run on image cleaner (#379)
commit 0701389 — psydok <47638600+psydok@users.noreply.github.com>, Thu Sep 25 2025: Add formatting to json file with metrics
commit 8159ca7 — guangli.bao <guangli.bao@daocloud.io>, Mon Sep 15 2025: first draft
## Summary
Makes the `max_tokens` request key configurable through an environment
variable per endpoint type. Defaults to `max_tokens` for legacy
`completions` and `max_completion_tokens` for `chat/completions`.
## Details
- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option which is a
dict mapping from route name -> output tokens key. Default is
`{"text_completions": "max_tokens", "chat_completions":
"max_completion_tokens"}`
## Test Plan
-
## Related Issues
- Closes #395
- Closes #269
- Related #210
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## Use of AI
- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
---------
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
    