Closed as not planned
Description
As mentioned in #81 (comment), the current PyTorch-based top-k and top-p implementation is memory-inefficient. This can be improved by introducing custom kernels.
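For context, here is a minimal sketch of the sort-based approach that top-k and top-p filtering usually takes. It is written in plain Python for a single row of logits and is not the project's actual PyTorch code. The function name `top_k_top_p_filter` and its parameters are hypothetical. In a batched tensor version, the sorted order, the softmax and the cumulative sum would each presumably be a full vocabulary-sized intermediate per sequence. A fused custom kernel could avoid that kind of overhead.

```python
import math


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Mask every logit outside the top-k / top-p set to -inf.

    Hypothetical helper for illustration only; assumes a non-empty list.
    """
    n = len(logits)
    # Full sort of the vocabulary by logit, descending.
    order = sorted(range(n), key=lambda i: logits[i], reverse=True)

    # Top-k: keep only the k highest-scoring tokens (0 disables top-k).
    if top_k > 0:
        order = order[:min(top_k, n)]

    # Softmax over the surviving tokens, shifted by the max for stability.
    max_logit = logits[order[0]]
    exps = [math.exp(logits[i] - max_logit) for i in order]
    total = sum(exps)

    # Top-p: keep the smallest prefix whose cumulative probability
    # reaches top_p. The highest-probability token is always kept.
    filtered = [-math.inf] * n
    cumulative = 0.0
    for i, e in zip(order, exps):
        filtered[i] = logits[i]
        cumulative += e / total
        if cumulative >= top_p:
            break
    return filtered
```

Sampling would then apply a softmax to the returned logits and draw from the result. Masked entries get zero probability.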