### Feature request

The recently proposed prompt lookup decoding method replaces the draft model in assisted generation with simple string matching over the prompt.

Code: https://github.com/apoorvumang/prompt-lookup-decoding

### Motivation

- The method gives significant speedups (2x-4x) on input-grounded tasks
- Applicable to all decoder models, and it supports sampling
- Easy to implement: we can modify assisted generation to also accept a candidate-generating function in place of an assistant model (rather than requiring an LLM); see the sketch at the end of this issue

### Your contribution

I have a not-so-well-written implementation [here](https://github.com/apoorvumang/prompt-lookup-decoding/blob/main/demo-pld.ipynb) (Python notebook). I can contribute to improving it, but I will need help since it's my first time.
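
For reference, here is a minimal sketch of the core candidate-generation step, assuming plain Python lists of token ids. The function name (`find_candidate_tokens`), the parameters (`max_ngram_size`, `num_pred_tokens`), and the backwards scan for the most recent match are illustrative choices, not taken verbatim from the linked notebook:

```python
def find_candidate_tokens(input_ids, max_ngram_size=3, num_pred_tokens=10):
    """Return up to num_pred_tokens draft tokens by matching the trailing
    n-gram of input_ids against an earlier occurrence in the same sequence."""
    for ngram_size in range(max_ngram_size, 0, -1):
        ngram = input_ids[-ngram_size:]
        # Scan earlier positions for an occurrence of this n-gram
        # (most recent match first; scanning direction is an assumption).
        for start in range(len(input_ids) - ngram_size - 1, -1, -1):
            if input_ids[start:start + ngram_size] == ngram:
                # The tokens that followed the match become the draft candidates,
                # to be verified by the main model as in assisted generation.
                candidates = input_ids[start + ngram_size:
                                       start + ngram_size + num_pred_tokens]
                if candidates:
                    return candidates
    return []  # no match found; fall back to normal decoding


# Example: the trailing n-gram [5, 6] also appears earlier, so the tokens
# that followed it ([7, 8, 9]) are proposed as draft candidates.
print(find_candidate_tokens([1, 5, 6, 7, 8, 9, 2, 5, 6], num_pred_tokens=3))
# -> [7, 8, 9]
```

The key point is that this function has the same "propose candidate tokens" role as the assistant model in assisted generation, which is why exposing a function hook there should be enough to support it.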