Hi,
I was not able to reproduce this issue on my computer.
Can you check if this slowdown also occurs if you use the same logits processor instance for multiple requests?
RegexParser is implemented behind the scenes via a state machine.
The first time a state is encountered, all of the legal tokens for it are calculated. This can take a bit of time.
On subsequent visits to that state, nothing needs to be recalculated. The regex you gave as an example has only 3 FSM states, so after the first generation, the following ones should see negligible overhead.
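As a rough sketch of the caching described above (this is illustrative, not the library's actual internals), imagine a toy DFA for \[.*\] over a tiny vocabulary: the first visit to each state does an expensive scan over the whole vocabulary, and every later visit hits the cache.

```python
# Toy DFA for the regex \[.*\] over a tiny vocabulary. All names here are
# hypothetical; they only illustrate the per-state allowed-token caching.
TRANSITIONS = {
    0: {"[": 1},                          # expect the opening bracket
    1: {c: 1 for c in "0a,"} | {"]": 2},  # body characters, or the closing bracket
    2: {},                                # accepting state, nothing more is legal
}
VOCAB = ["[", "]", "0", "a", ","]

class AllowedTokenCache:
    def __init__(self):
        self.cache = {}
        self.computations = 0             # counts the expensive full-vocab scans

    def allowed(self, state):
        if state not in self.cache:
            self.computations += 1        # expensive: test every token in the vocab
            self.cache[state] = [t for t in VOCAB if t in TRANSITIONS[state]]
        return self.cache[state]

cache = AllowedTokenCache()
for state in [0, 1, 1, 1, 2, 1]:          # states repeat during generation
    cache.allowed(state)
print(cache.computations)                 # each distinct state is computed once
```

With only 3 states, the expensive scan runs at most 3 times regardless of how many tokens are generated, which is why later generations should be fast.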
I'm trying to use regex-controlled generation. For me, processing does not even start when I pass a large batch; it worked when I gave it a single example, albeit slowly. My initial regex was meant to match a valid Python list of strings, which I simplified, but generation still does not start:
# Imports assumed from lm-format-enforcer's vLLM integration (paths may vary by version):
from lmformatenforcer import RegexParser
from lmformatenforcer.integrations.vllm import build_vllm_logits_processor, build_vllm_token_enforcer_tokenizer_data
from vllm import SamplingParams

tokenizer_data = build_vllm_token_enforcer_tokenizer_data(llm)  # llm is the loaded vllm.LLM
list_regex = r'\[.*\]'  # simplified: anything inside square brackets
parser = RegexParser(list_regex)
logits_processor = build_vllm_logits_processor(tokenizer_data, parser)
sampling_params = SamplingParams(max_tokens=100, logits_processors=[logits_processor])
results = llm.generate([p['text'] for p in dataset], sampling_params=sampling_params)
On an A100 GPU, I use vLLM to load a local model together with lm-format-enforcer.
With output_regex=r"\d*\w*", throughput is about 200 tokens per second.
With output_regex=r"\d*.*", throughput is about 5 tokens per second. The speed dropped by a factor of forty.
Sometimes "." is unavoidable in a regular expression.
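One plausible reason "." is so much slower (a guess based on the caching behavior described earlier, not a profile of the library): "." matches almost every character, so the set of legal tokens per state is vastly larger than for "\d", and the per-state computation has far more work to admit. A quick comparison with the standard re module:

```python
import re
import string

# Compare how many characters of a printable alphabet each pattern admits.
# This approximates how wide the allowed-token set is per FSM state.
alphabet = string.printable                 # 100 printable ASCII characters

digit_matches = [c for c in alphabet if re.fullmatch(r"\d", c)]
dot_matches = [c for c in alphabet if re.fullmatch(r".", c)]

print(len(digit_matches))                   # "\d" admits only the 10 digits
print(len(dot_matches))                     # "." admits nearly the whole alphabet
```

Real tokenizers have tens of thousands of multi-character tokens rather than 100 characters, so this gap is amplified further in practice.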