Optimizing for CPUs #5

While the on-GPU sampling is neat, moving logits to the CPU might still be faster.

One idea would be to add a method `logitsample(f::Function, logits)` that falls back to `logitsample(f(logits))`, with specialized methods such as `logitsample(::Top_pk, logits)` that get better time complexity by using a partial sort instead of a full sort.
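To make the partial-sort idea concrete, here is a minimal CPU sketch for the top-k case only. The function name `sample_top_k` and its signature are hypothetical and not part of the package; it only assumes Base's `partialsortperm`, which orders just the `k` largest entries rather than the whole vector.

```julia
using Random

# Hypothetical helper (not the package's API): draw one index from the
# softmax distribution restricted to the k largest logits.
function sample_top_k(rng::AbstractRNG, logits::AbstractVector{<:Real}, k::Integer)
    k = clamp(k, 1, length(logits))
    # Indices of the k largest logits, in descending order of logit value.
    # Only these k entries end up fully ordered.
    idx = partialsortperm(logits, 1:k; rev=true)
    top = Float64.(logits[idx])
    # Unnormalized softmax weights over the k survivors; subtracting the
    # maximum (the first entry) keeps exp from overflowing.
    w = exp.(top .- first(top))
    # Inverse-CDF sampling over the unnormalized weights.
    u = rand(rng) * sum(w)
    acc = 0.0
    for (i, wi) in zip(idx, w)
        acc += wi
        u <= acc && return i
    end
    return last(idx)  # guard against floating-point round-off
end
```

For example, `sample_top_k(Random.default_rng(), randn(100_000), 50)` returns an index into the original logits vector. A top-p variant would additionally need the cumulative probability mass, so whether a partial sort helps there depends on how many entries fall inside the nucleus.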

Some rough benchmarks show that `logitsample ∘ Top_p(0.5)` on 100k logits takes ~2 milliseconds (not microseconds) on an A6000. If each token needs one such call, that caps inference at roughly 500 sampled tokens per second per sequence.
