Closed
Description
Three weeks ago, #6387 removed mmap() support for MoE models. As a result, Mixtral 8x7B F16 takes roughly 30x longer to load on my Threadripper with 5200 MT/s RAM: it used to load in 2 seconds, and now it takes 56 seconds.
Can we reconsider this? I would rather have 3D tensor creation be a one-time cost paid in the conversion script than a cost paid every time the llama.cpp process spawns.