Optimizing for CPUs #5

@AntonOresten

While the on-GPU sampling is neat, moving logits to the CPU might still be faster.

One idea would be to have logitsample(f::Function, logits), falling back to logitsample(f(logits)), with specialized methods like logitsample(::Top_pk, logits) that get better time complexity by using a partial sort.
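A minimal CPU-side sketch of that dispatch pattern, under stated assumptions: `TopKSketch` is a hypothetical stand-in for a top-k transform (it is not the package's `Top_pk`), the sampler uses the Gumbel-max trick as an assumed baseline, and an explicit `rng` argument is added for reproducibility. The fallback leaves `f` untyped rather than `f::Function` so that callable structs can use it too.

```julia
using Random

# Hypothetical top-k transform, for illustration only (not the package's type).
struct TopKSketch
    k::Int
end

# Generic transform: keep the k largest logits and mask the rest with -Inf.
# This path does a full sort, O(n log n).
function (t::TopKSketch)(logits::AbstractVector{T}) where {T<:AbstractFloat}
    keep = sortperm(logits; rev=true)[1:t.k]
    out = fill(T(-Inf), length(logits))
    out[keep] = logits[keep]
    return out
end

# Assumed baseline sampler: Gumbel-max trick over the logits.
function logitsample(rng::AbstractRNG, logits::AbstractVector{<:AbstractFloat})
    u = rand(rng, eltype(logits), length(logits))
    return argmax(logits .- log.(-log.(u)))  # logits + Gumbel noise
end

# Fallback: apply the transform, then sample from the transformed logits.
logitsample(rng::AbstractRNG, f, logits) = logitsample(rng, f(logits))

# Specialized method: a partial sort only orders the top k entries,
# then sampling runs over those k candidates instead of the whole vector.
function logitsample(rng::AbstractRNG, t::TopKSketch,
                     logits::AbstractVector{<:AbstractFloat})
    idx = partialsortperm(logits, 1:t.k; rev=true)
    return idx[logitsample(rng, logits[idx])]
end
```

Both paths sample from the same distribution; the specialized method just avoids sorting and perturbing the full vector. With Julia's default partial quicksort this should be roughly O(n + k log k) on average, though whether it beats the on-GPU path for a given vocabulary size would need to be measured.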

Some rough benchmarks show that logitsample ∘ Top_p(0.5) on 100k logits takes ~2 milliseconds on an A6000, which sets an upper limit on inference speed.
