Feature Description
Add the ability to split context between multiple GPUs, much as model layers can currently be split.
Motivation
Currently, with multi-GPU setups, LCPP only stores/processes context on the "first" GPU. This is fine for most models, which only handle 4k context tokens natively (or double that with RoPE scaling). But as more large-context models are released, this limitation is becoming an issue. For example, the new Yi 34B 200K models are limited to however much context fits on the first GPU alone (around 64k in the case of a 24GB card), regardless of total VRAM available. If context could be split across multiple cards, a larger context window could be utilized.
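
For a rough sense of the numbers, here is a back-of-the-envelope KV-cache estimate. This is only an illustrative sketch, not code from LCPP; the Yi 34B architecture values (60 layers, 8 KV heads under GQA, head dimension 128) and an FP16 cache are assumptions, and real usage will differ with quantized caches or other models.

```python
# Illustrative back-of-the-envelope KV-cache size estimate (not LCPP code).
# Assumed Yi-34B parameters: 60 layers, 8 KV heads (GQA), head_dim 128, FP16 cache.

def kv_cache_bytes(n_ctx, n_layers=60, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes needed to hold keys and values for n_ctx tokens."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K and V
    return n_ctx * per_token

for n_ctx in (4_096, 65_536, 200_000):
    gib = kv_cache_bytes(n_ctx) / 2**30
    print(f"{n_ctx:>7} tokens -> ~{gib:.1f} GiB of KV cache")

# Under these assumptions, ~64k tokens already needs roughly 15 GiB on top of the
# model's share of the weights, which is why a single 24GB card caps out well
# below the model's 200k limit.
```

Splitting that cache across the cards that already hold the split layers would let the usable context scale with total VRAM instead of with the first GPU alone.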