### Context

At the moment we use a FIFO approach to batch prompt tokens, so when a large prompt is being processed it blocks all other slots.

Proposal: implement fair batch usage for prompt processing across all pending slots.

References:
- https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-2043558080
- https://github.com/ggerganov/llama.cpp/issues/5851#issuecomment-1975120585
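
A minimal sketch of what "fair" could mean here: instead of letting the first slot in the queue fill the whole batch, split the per-iteration batch capacity evenly among all slots that still have prompt tokens pending. The `Slot` struct, `plan_fair_batch`, and `n_batch` below are illustrative assumptions, not the actual server internals:

```cpp
// Sketch of fair prompt batching across slots (names are hypothetical).
#include <cstdint>
#include <vector>
#include <algorithm>

struct Slot {
    std::vector<int32_t> prompt_tokens; // full prompt for this slot
    size_t n_processed = 0;             // prompt tokens already processed

    size_t remaining() const { return prompt_tokens.size() - n_processed; }
};

// Decide how many prompt tokens each slot contributes to the next batch.
// Every pending slot gets an equal share of the batch capacity; budget
// left over by short prompts is redistributed on the next pass.
std::vector<size_t> plan_fair_batch(const std::vector<Slot> & slots, size_t n_batch) {
    std::vector<size_t> take(slots.size(), 0);

    size_t budget  = n_batch;
    size_t pending = 0;
    for (const auto & s : slots) {
        if (s.remaining() > 0) pending++;
    }

    while (budget > 0 && pending > 0) {
        const size_t share = std::max<size_t>(1, budget / pending);
        bool progressed = false;
        for (size_t i = 0; i < slots.size() && budget > 0; i++) {
            const size_t want = slots[i].remaining() - take[i];
            if (want == 0) continue;
            const size_t n = std::min({share, want, budget});
            take[i] += n;
            budget  -= n;
            progressed = true;
            if (take[i] == slots[i].remaining()) pending--; // slot's prompt fully scheduled
        }
        if (!progressed) break;
    }
    return take;
}
```

For example, with `n_batch = 32` and two slots holding 100 and 5 remaining prompt tokens, the plan gives the short prompt all 5 of its tokens and the long prompt the other 27, so the small request finishes its prompt in one iteration instead of waiting behind the large one. The server's main loop would then copy `take[i]` tokens from each slot into the shared batch before calling decode.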