Configurable max_tokens/max_completion_tokens key #399
Conversation
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Force-pushed 68e69bc to ef981fd
Pull Request Overview
This PR implements configurable request keys for output token limits in OpenAI API calls. Instead of hardcoding both max_tokens and max_completion_tokens in all requests, the system now uses the appropriate key based on endpoint type through a new environment variable configuration.
- Adds a `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` configuration mapping endpoint types to their respective output token keys
- Updates payload generation to use the configured key instead of setting both keys
- Fixes test assertions to match the new single-key approach
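The single-key approach described above can be sketched as follows. This is an illustrative sketch only, not guidellm's actual internals; the `MAX_OUTPUT_KEY` mapping mirrors the documented default, but `build_payload` and its parameters are hypothetical names:

```python
# Default mapping from endpoint type to its output-token request key,
# as described in the PR (configurable via GUIDELLM__OPENAI__MAX_OUTPUT_KEY).
MAX_OUTPUT_KEY = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}


def build_payload(endpoint_type: str, base_payload: dict, max_output_tokens: int) -> dict:
    """Set only the output-token key appropriate for the endpoint,
    instead of hardcoding both max_tokens and max_completion_tokens."""
    key = MAX_OUTPUT_KEY[endpoint_type]
    return {**base_payload, key: max_output_tokens}
```

With this shape, a chat request carries only `max_completion_tokens` and a legacy completions request only `max_tokens`, matching the updated test assertions.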
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| src/guidellm/config.py | Adds new max_output_key configuration with defaults for text and chat completions | 
| src/guidellm/backend/openai.py | Updates payload generation to use configurable key and adds type definitions | 
| tests/unit/conftest.py | Removes duplicate token limit assertions and fixes mock response generation | 
You may want to wait for Mark's review, but looks good to me.
commit 121dcdc — Samuel Monson <smonson@redhat.com>, Fri Oct 10 2025: Configurable max_tokens/max_completion_tokens key (#399)
commit a24a22d — Samuel Monson <smonson@redhat.com>, Thu Oct 9 2025: Fix typo in CI (#401)
commit 81af01b — Samuel Monson <smonson@redhat.com>, Thu Oct 9 2025: Fix the failing CI again (#400)
commit 90a05ab — Samuel Monson <smonson@redhat.com>, Thu Oct 9 2025: Fix for container rc tag (Attempt 2) (#398)
commit 000b39e — Samuel Monson <smonson@redhat.com>, Fri Oct 3 2025: Fix for container rc tag (#389)
commit 108a657 — Benjamin Blue <dalcowboiz@gmail.com>, Fri Oct 3 2025: update tpot to itl in labels and code use (#386)
commit b1b1b78 — Benjamin Blue <dalcowboiz@gmail.com>, Wed Oct 1 2025: update default build values to use versioned builds (#310)
commit 5c9982a (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: first benchark testing example (#328)
commit 2c0d993 (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Merge branch 'main' into example_simulator
commit ad25e06 (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Add formatting to json file with metrics (#372)
commit d1297fe (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Merge branch 'main' into example_simulator
commit c32896c (merge) — Mark Kurtz <mark.j.kurtz@gmail.com>, Wed Oct 1 2025: Merge branch 'main' into add_json_formatiing
commit f8f6f9d — Samuel Monson <smonson@redhat.com>, Tue Sep 30 2025: Container CI bugfix and disable dry-run on image cleaner (#379)
commit 0701389 — psydok <47638600+psydok@users.noreply.github.com>, Thu Sep 25 2025: Add formatting to json file with metrics
commit 8159ca7 — guangli.bao <guangli.bao@daocloud.io>, Mon Sep 15 2025: first draft
## Summary
Makes the `max_tokens` request key configurable through an environment
variable per endpoint type. Defaults to `max_tokens` for legacy
`completions` and `max_completion_tokens` for `chat/completions`.
## Details
- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option which is a
dict mapping from route name -> output tokens key. Default is
`{"text_completions": "max_tokens", "chat_completions":
"max_completion_tokens"}`
## Test Plan
-
## Related Issues
- Closes #395
- Closes #269
- Related #210
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## Use of AI
- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
---------
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
    