MoE loading time regression #6798

Closed
@jart

Three weeks ago, #6387 removed mmap() support for MoE models. As a result, Mixtral 8x7b F16 takes nearly 30x longer to load on my Threadripper w/ 5200 MT/s RAM: it used to load in 2 seconds, and now it takes 56 seconds.

Can we reconsider this? I would prefer that 3D tensor creation be a one-time cost paid in the conversion script, not a cost paid every time a llama.cpp process spawns.
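To make the trade-off concrete, here is a minimal sketch in Python of the two loading strategies. It is not llama.cpp's loader and does not use the GGUF format. The file layout, the constants `N_EXPERTS` and `EXPERT_BYTES`, and every function name are hypothetical, chosen only for illustration. It assumes the conversion step has already written the experts contiguously, so the file can either be copied into a fresh buffer on every start or simply mapped.

```python
import mmap

# Hypothetical sizes, for illustration only (real expert tensors are far larger).
N_EXPERTS = 4
EXPERT_BYTES = 1024


def write_prestacked(path):
    """One-time conversion step: store all experts contiguously as one 3D tensor."""
    with open(path, "wb") as f:
        for e in range(N_EXPERTS):
            # Fill each expert with its own index so the slices are distinguishable.
            f.write(bytes([e]) * EXPERT_BYTES)


def load_by_copy(path):
    """Per-process cost: read each expert and copy it into a newly allocated buffer."""
    stacked = bytearray(N_EXPERTS * EXPERT_BYTES)
    with open(path, "rb") as f:
        for e in range(N_EXPERTS):
            f.seek(e * EXPERT_BYTES)
            stacked[e * EXPERT_BYTES:(e + 1) * EXPERT_BYTES] = f.read(EXPERT_BYTES)
    return stacked


def load_by_mmap(path):
    """Map the pre-stacked file read-only; the OS pages data in lazily on access."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def expert_view(buf, e):
    """Return a zero-copy view of expert e inside the stacked buffer."""
    return memoryview(buf)[e * EXPERT_BYTES:(e + 1) * EXPERT_BYTES]
```

Calling `write_prestacked(path)` once and then `load_by_mmap(path)` on each start returns almost immediately, because no tensor data is read until it is touched, whereas `load_by_copy(path)` does work proportional to the model size every time.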
