[Frontend] Set server's maximum number of generated tokens using generation_config.json #12242

Merged
34 commits merged on Jan 26, 2025
Changes from 1 commit
Commits (34)
5c85448
Adding max_new_tokens support to generation_config.json
mhendrey Jan 20, 2025
4ad6b45
Changed default_max_tokens to server_max_tokens
mhendrey Jan 20, 2025
95f9c97
Renamed default_max_tokens to server_max_tokens
mhendrey Jan 20, 2025
4786e56
Removed the float("inf") bug
mhendrey Jan 20, 2025
4980a73
Renamed default_max_tokens to server_max_tokens
mhendrey Jan 20, 2025
39d7d76
Rearranged lines to make the changes with existing as small as possible
mhendrey Jan 20, 2025
b6a24c4
Limit generated tokens by server's max_tokens setting when available
mhendrey Jan 20, 2025
aa7cff1
Changed syntax to pass format.sh tests
mhendrey Jan 20, 2025
2f6e43b
[Bugfix] Fix num_heads value for simple connector when tp enabled (#1…
ShangmingCai Jan 20, 2025
6baa0ea
[torch.compile] fix sym_tensor_indices (#12191)
youkaichao Jan 20, 2025
35b5948
Move linting to `pre-commit` (#11975)
hmellor Jan 20, 2025
0c2f332
[DOC] Fix typo in docstring and assert message (#12194)
terrytangyuan Jan 20, 2025
46249e5
[DOC] Add missing docstring in LLMEngine.add_request() (#12195)
terrytangyuan Jan 20, 2025
0b2e3de
[Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196)
terrytangyuan Jan 20, 2025
090eca3
[Model] Add Qwen2 PRM model support (#12202)
Isotr0py Jan 20, 2025
5d36c1f
[Core] Interface for accessing model from `VllmRunner` (#10353)
DarkLight1337 Jan 20, 2025
df331a7
[misc] add placeholder format.sh (#12206)
youkaichao Jan 20, 2025
881964d
[CI/Build] Remove dummy CI steps (#12208)
DarkLight1337 Jan 20, 2025
5cc6a09
[CI/Build] Make pre-commit faster (#12212)
DarkLight1337 Jan 20, 2025
9f3d5a6
[Model] Upgrade Aria to transformers 4.48 (#12203)
DarkLight1337 Jan 20, 2025
957ca23
[misc] print a message to suggest how to bypass commit hooks (#12217)
youkaichao Jan 20, 2025
399d224
[core][bugfix] configure env var during import vllm (#12209)
youkaichao Jan 20, 2025
df06503
[V1] Remove `_get_cache_block_size` (#12214)
heheda12345 Jan 20, 2025
b89529b
[Misc] Pass `attention` to impl backend (#12218)
wangxiyuan Jan 20, 2025
a5d57f1
[Bugfix] Fix `HfExampleModels.find_hf_info` (#12223)
DarkLight1337 Jan 20, 2025
b1af379
[CI] Pass local python version explicitly to pre-commit mypy.sh (#12224)
heheda12345 Jan 20, 2025
0e3a719
Added tests to check max_tokens is properly set
mhendrey Jan 23, 2025
6867b37
Merge branch 'server_max_tokens'
mhendrey Jan 23, 2025
99243cf
Mucked up the rebasing. Fixing that now.
mhendrey Jan 23, 2025
1a15431
Reverting the serving_chat & serving_completion back and putting all …
mhendrey Jan 23, 2025
c10eb1f
Didn't quite revert back. Deleting empty line from both
mhendrey Jan 23, 2025
a3fc62b
Changed to using one-liner and edited engine arg for generation-config
mhendrey Jan 24, 2025
98949f6
Merge branch 'vllm-project:main' into main
mhendrey Jan 24, 2025
c71f429
Converted to a one-liner for taking minimum value & added to generati…
mhendrey Jan 24, 2025
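
The commits above implement a server-side cap on generated tokens, read from the model's generation_config.json (the max_new_tokens field) and applied by taking the minimum of the request's max_tokens and the server's limit. A minimal sketch of that capping logic follows; the names effective_max_tokens and request_max_tokens are illustrative, not the exact vLLM identifiers:

    # Hypothetical sketch of the "one-liner" minimum described in the commits.
    # If generation_config.json supplies max_new_tokens, the server caps the
    # request's max_tokens by it; otherwise the request value is used as-is.
    from typing import Optional

    def effective_max_tokens(request_max_tokens: int,
                             server_max_tokens: Optional[int]) -> int:
        return (min(request_max_tokens, server_max_tokens)
                if server_max_tokens is not None else request_max_tokens)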
[torch.compile] fix sym_tensor_indices (#12191)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com>
youkaichao authored and mhendrey committed Jan 23, 2025
commit 6baa0ea5e59ba123a163ba5dbb91d1999800d1d1
6 changes: 5 additions & 1 deletion vllm/compilation/backends.py
@@ -624,9 +624,13 @@ def __call__(self, graph: fx.GraphModule, example_inputs) -> Callable:
         ]
 
         # index of tensors that have symbolic shapes (batch size)
+        # for weights and static buffers, they will have concrete shapes.
+        # symbolic shape only happens for input tensors.
+        from torch.fx.experimental.symbolic_shapes import is_symbolic
         self.sym_tensor_indices = [
             i for i, x in enumerate(fake_args)
-            if isinstance(x, torch._subclasses.fake_tensor.FakeTensor)
+            if isinstance(x, torch._subclasses.fake_tensor.FakeTensor) and \
+            any(is_symbolic(d) for d in x.size())
         ]
 
         # compiler managed cudagraph input buffers
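
A small sanity sketch of the symbolic-shape check (my own illustration, not part of the commit). It assumes only that FakeTensors created without dynamic shapes carry concrete integer sizes, so the new filter excludes them, as intended for weights and static buffers; an input with a dynamic batch dimension would instead carry a torch.SymInt size and pass the filter.

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode
    from torch.fx.experimental.symbolic_shapes import is_symbolic

    with FakeTensorMode():
        weight = torch.empty(1024, 1024)  # concrete shape, like a model weight

    # No dimension is symbolic, so this tensor would be filtered out of
    # sym_tensor_indices; a dynamic-batch input (SymInt dim) would not be.
    print(any(is_symbolic(d) for d in weight.size()))  # False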