[Bug] Mirostat samplers don't work properly with parallel generation #3537
Closed
Description
This is because `llama_sample_token` in `common.cpp` uses a static variable for the Mirostat 1 and 2 `mu`. As a result, different sequences affect each other (including ones that were already deleted).
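To illustrate the problem, here is a minimal sketch of how a function-local static leaks state between sequences. It is not the actual `common.cpp` code: `update_mu_shared` is a hypothetical name, and the update rule is a simplified Mirostat-style step (`mu` starting at `2 * tau`, then nudged by the surprise error) that I'm assuming for the example.

```cpp
#include <cmath>

// Hypothetical stand-in for the mu update inside a sampling function.
// The function-local static is initialized once, on the first call only,
// so every sequence that calls this shares (and mutates) the same mu.
float update_mu_shared(float tau, float eta, float token_prob) {
    static float mu = 2.0f * tau;             // shared across ALL callers
    float surprise = -std::log2(token_prob);  // observed surprise of the sampled token
    mu -= eta * (surprise - tau);             // simplified Mirostat-style correction
    return mu;
}
```

If sequence A calls this once and a brand new sequence B then makes its first call with the same inputs, B does not start from `2 * tau`; it continues from wherever A left `mu`.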
The fix for this doesn't seem simple. I don't think it can be done only inside `llama_sample_token`. I think `llama_sample_token` will have to be changed to take something like a sequence-specific sampler state structure where things like that sequence's `mu` can be stored. It would then be up to the app to reset `mu` when appropriate (for example, when the sequence ends and the slot is about to be reused).
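A rough sketch of what the per-sequence state could look like, just to make the idea concrete. All names here (`seq_sampler_state`, `update_mu_for_seq`, `reset_seq_state`) are hypothetical and not part of the existing API, and the update rule is the same simplified Mirostat-style step assumed for illustration.

```cpp
#include <cmath>

// Hypothetical per-sequence sampler state; the app would own one per slot.
struct seq_sampler_state {
    float mu = 0.0f;
};

// Reset mu to its starting value; the app would call this when a sequence
// ends and its slot is about to be reused.
void reset_seq_state(seq_sampler_state & st, float tau) {
    st.mu = 2.0f * tau;
}

// Same simplified update as a static-based version would do, but it reads
// and writes only the state passed in by the caller.
float update_mu_for_seq(seq_sampler_state & st, float tau, float eta, float token_prob) {
    float surprise = -std::log2(token_prob);
    st.mu -= eta * (surprise - tau);
    return st.mu;
}
```

With this shape, two sequences that see the same tokens get the same `mu`, and updating one leaves the other untouched. The cost is that the caller has to carry the state around and remember to reset it.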