Closed
Feature Description
The speculative decoding example in this repo appears to use greedy sampling only. Are there any plans to support stochastic sampling as well? If not, could I give it a try, based on the paper and the implementations in https://github.com/lucidrains/speculative-decoding?
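
For context, here is a rough sketch of the per-token acceptance rule that stochastic speculative sampling would need, as I understand it from the speculative sampling literature: accept a drafted token x with probability min(1, p(x)/q(x)), and on rejection resample from the renormalized residual max(0, p - q). The function name `speculative_accept` and its arguments are hypothetical and are not part of this repo's API; `p` and `q` are assumed to be the target and draft model probabilities over the same vocabulary, given as plain lists.

```python
import random


def speculative_accept(p, q, draft_token, rng=random):
    """Decide the fate of one drafted token (hypothetical helper).

    p: target-model probabilities, q: draft-model probabilities,
    draft_token: index that was sampled from q.
    Returns (token, accepted).
    """
    p_x, q_x = p[draft_token], q[draft_token]

    # Accept the drafted token with probability min(1, p(x) / q(x)).
    if q_x > 0 and rng.random() < min(1.0, p_x / q_x):
        return draft_token, True

    # Rejected: resample from the residual max(0, p - q), renormalized
    # implicitly by passing it as sampling weights.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    if sum(residual) == 0.0:
        # Degenerate case (p equals q); fall back to sampling from p.
        residual = list(p)
    token = rng.choices(range(len(p)), weights=residual, k=1)[0]
    return token, False
```

Applied to each drafted token in order, stopping at the first rejection, this should leave the output distributed according to the target model, which is what would distinguish it from the current greedy comparison.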