Implement custom kernels for top-k and top-p sampling #125

Closed as not planned
@WoosukKwon

Description

As mentioned in #81 (comment), the current PyTorch-based top-k and top-p implementation is memory-inefficient. This can be improved by introducing custom kernels.
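For reference, the filtering that such a kernel would compute for one row of logits can be sketched in plain Python. This is an illustrative sketch, not the project's implementation: the function name `top_k_top_p_filter`, the convention that `k <= 0` disables top-k, and the choice to apply top-p over the renormalized top-k candidates are all assumptions. A tensor-library version of the same logic typically materializes a sorted copy of the logits, the sort indices, a cumulative sum, and one or more masks, each the size of the full batch-by-vocabulary matrix, which is one plausible source of the memory overhead; a fused kernel could keep most of that state local to each row.

```python
import math


def top_k_top_p_filter(logits, k, p):
    """Mask logits outside the top-k / top-p candidate set with -inf.

    Hypothetical reference for a single non-empty row of logits.
    Assumptions: k <= 0 disables top-k; top-p is applied to the
    probabilities renormalized over the surviving top-k candidates.
    """
    # Token indices sorted by logit, highest first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)

    # Top-k: keep at most k candidates.
    if k > 0:
        order = order[:k]

    # Numerically stable softmax over the remaining candidates.
    max_logit = logits[order[0]]
    exps = [math.exp(logits[i] - max_logit) for i in order]
    total = sum(exps)

    # Top-p: keep the smallest prefix whose cumulative probability reaches p.
    kept = set()
    cumulative = 0.0
    for i, e in zip(order, exps):
        kept.add(i)
        cumulative += e / total
        if cumulative >= p:
            break

    # Everything outside the kept set can no longer be sampled.
    return [x if i in kept else -math.inf for i, x in enumerate(logits)]
```

Sampling then proceeds by applying softmax to the returned row and drawing one token from the resulting distribution.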

Metadata

Assignees

No one assigned

    Labels

    help wanted: Extra attention is needed
    performance: Performance-related issues
    stale: Over 90 days of inactivity

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
