I want to use the `prefix_allowed_tokens_fn` argument of Hugging Face's `model.generate()` in vLLM. Which part of vLLM's source code should I modify?

Hello. In Hugging Face transformers' original `model.generate()` method, we can pass the function parameter `prefix_allowed_tokens_fn` to restrict which tokens may be generated at each step. I want to use the same function in vLLM, just as I do with the original `model.generate()`, to control the generation process. Could you please tell me which part of the source code I should modify so that generation obeys my custom `prefix_allowed_tokens_fn`?
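To make the request concrete, here is a minimal sketch of the behavior I am after. It uses plain Python lists instead of tensors, and the token ids, the rule inside `prefix_allowed_tokens_fn`, and the helpers `mask_logits` and `constrained_greedy_step` are all hypothetical names made up for illustration. None of this is vLLM's API. The callback keeps the Hugging Face shape: it receives `batch_id` and the tokens generated so far, and returns the list of token ids allowed next.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy token ids, chosen only for this illustration.
EOS_ID, YES_ID, NO_ID = 0, 3, 4


def prefix_allowed_tokens_fn(batch_id: int, input_ids: Sequence[int]) -> List[int]:
    """Return the token ids allowed at the next step (Hugging Face callback shape)."""
    # Assumed rule: once an answer token was emitted, only EOS may follow.
    if input_ids and input_ids[-1] in (YES_ID, NO_ID):
        return [EOS_ID]
    # Otherwise the model must answer with YES or NO.
    return [YES_ID, NO_ID]


def mask_logits(logits: Sequence[float], allowed: Sequence[int]) -> List[float]:
    """Set every logit outside the allowed set to -inf so it can never be picked."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else -math.inf for i, x in enumerate(logits)]


def constrained_greedy_step(
    batch_id: int,
    input_ids: Sequence[int],
    logits: Sequence[float],
    allowed_fn: Callable[[int, Sequence[int]], List[int]],
) -> int:
    """Pick the highest-scoring token among those the callback allows."""
    masked = mask_logits(logits, allowed_fn(batch_id, input_ids))
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    toy_logits = [0.1, 5.0, 2.0, 0.3, 0.7]  # token 1 scores highest but is not allowed
    print(constrained_greedy_step(0, [1, 2], toy_logits, prefix_allowed_tokens_fn))
```

In other words, I am looking for the place in vLLM where the logits are turned into the next token, so that a mask like this can be applied there before sampling.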

Issue #415