Closed
Description
Hello. In Hugging Face transformers, the original model.generate() method accepts a prefix_allowed_tokens_fn parameter that restricts which tokens may be generated at each step. I would like to use the same function in vLLM to control generation, just as I do with model.generate(). Could you please tell me which part of the source code I should modify so that generation obeys my custom prefix_allowed_tokens_fn?
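
To make the request concrete, here is a minimal sketch of the behavior being asked for. In transformers, prefix_allowed_tokens_fn takes a batch index and the token ids so far and returns the list of token ids allowed next; every other token's score is set to negative infinity before sampling. The sketch below wraps such a function into a callable of the form (generated ids, scores) -> scores. Everything outside that contract is an assumption for illustration: make_prefix_constraint, digits_then_eos, and the 4-token toy vocabulary are hypothetical names, plain Python lists stand in for tensors, and whether vLLM exposes a per-request hook with this shape depends on the version and is not confirmed here.

```python
import math
from typing import Callable, List, Sequence

# Same contract as transformers' prefix_allowed_tokens_fn:
# (batch_id, input_ids so far) -> list of token ids allowed at the next step.
PrefixFn = Callable[[int, Sequence[int]], List[int]]


def make_prefix_constraint(
    prefix_allowed_tokens_fn: PrefixFn,
    prompt_ids: Sequence[int],
    batch_id: int = 0,
) -> Callable[[Sequence[int], Sequence[float]], List[float]]:
    """Hypothetical adapter: turn a prefix function into a score-masking callable."""

    def processor(generated_ids: Sequence[int], scores: Sequence[float]) -> List[float]:
        # The prefix function sees prompt + generated tokens, as in model.generate().
        context = list(prompt_ids) + list(generated_ids)
        allowed = set(prefix_allowed_tokens_fn(batch_id, context))
        # Disallowed tokens get -inf so they can never be sampled.
        return [s if i in allowed else -math.inf for i, s in enumerate(scores)]

    return processor


def digits_then_eos(batch_id: int, input_ids: Sequence[int]) -> List[int]:
    """Toy rule over a hypothetical vocabulary: ids 0-2 are digits, id 3 is EOS."""
    if len(input_ids) < 4:
        return [0, 1, 2]  # keep emitting digits until the sequence has 4 tokens
    return [3]            # then force EOS


processor = make_prefix_constraint(digits_then_eos, prompt_ids=[0, 1])
first_step = processor([], [0.5, 1.0, -0.2, 2.0])
later_step = processor([2, 1], [0.5, 1.0, -0.2, 2.0])
```

With a two-token prompt, the first step masks only EOS, and once two more tokens have been generated every token except EOS is masked. A callable like this would need to be invoked on the next-token scores of each sequence right before sampling, which is the point in the vLLM code this question is about.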