llama : combine expert tensors into a single tensor #6082

Closed
@ggerganov

Description

Currently, we store separate tensors for each expert:

https://github.com/ggerganov/llama.cpp/blob/3020327f6cd6d2ce50528dd65f4b199d2ea8b1ae/ggml.c#L4442-L4455
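For reference, the `_id` ops currently take one weight tensor per expert, roughly along these lines (a paraphrased sketch; the linked lines have the exact signature):

```c
// Paraphrased sketch of the array-based _id op: one 2D weight tensor per
// expert is passed in, and each one becomes a separate src of the result node.
struct ggml_tensor * ggml_mul_mat_id(
        struct ggml_context * ctx,
        struct ggml_tensor  * const as[], // one weight tensor per expert
        int                   n_as,       // number of experts
        struct ggml_tensor  * ids,        // per-token expert selection
        int                   id,
        struct ggml_tensor  * b);         // input activations
```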

This leads to a large number of possible "source" tensors for the `_id` ops, which significantly increases the size of `struct ggml_tensor` on the stack:

https://github.com/ggerganov/llama.cpp/blob/3020327f6cd6d2ce50528dd65f4b199d2ea8b1ae/ggml.h#L573-L576
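Concretely, every graph node reserves `GGML_MAX_SRC` source pointers, and with per-expert tensors that bound has to cover all the experts plus the other operands (a sketch with illustrative values, mirroring the linked header):

```c
// Illustrative sketch: with separate expert tensors, GGML_MAX_SRC must be at
// least n_expert + 2 (expert weights + ids + activations), and every
// ggml_tensor carries that many src pointers whether it uses them or not.
#define GGML_MAX_SRC 10  // e.g. 8 experts + ids + b for an 8-expert MoE

struct ggml_tensor {
    // ... other fields ...
    struct ggml_tensor * src[GGML_MAX_SRC]; // grows with the expert count
    // ...
};
```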

Additionally, the Metal implementation is currently hacked to support up to 8 experts, and extending it beyond that is not completely obvious:

https://github.com/ggerganov/llama.cpp/blob/3020327f6cd6d2ce50528dd65f4b199d2ea8b1ae/ggml-metal.m#L1750-L1759

We should improve this. One possible way is to store the data for all experts in a single tensor and address it with appropriate offsets.
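A minimal sketch of that idea, assuming the experts are stacked along dim 2 of one 3D tensor (the helper name is hypothetical): `ggml_view_2d` can then slice out a single expert by byte offset, so the `_id` ops need only one weight source regardless of the expert count.

```c
#include "ggml.h"

// as: [n_embd, n_ff, n_expert] -- all expert weights in one 3D tensor.
// Hypothetical helper: returns a 2D view of expert i, addressed by offset.
static struct ggml_tensor * expert_view(
        struct ggml_context * ctx,
        struct ggml_tensor  * as,    // combined expert weights
        int                   i) {   // expert index
    return ggml_view_2d(ctx, as,
            as->ne[0], as->ne[1],    // keep the per-expert 2D shape
            as->nb[1],               // row stride is unchanged
            (size_t) i * as->nb[2]); // byte offset of expert i along dim 2
}
```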
