Closed
Description
Three weeks ago, #6387 removed mmap() support for MoE models. As a result, Mixtral 8x7B F16 takes roughly 30x longer to load on my Threadripper with 5200 MT/s RAM: it used to load in 2 seconds, and now it takes 56 seconds.
Can we reconsider this? I would rather have 3D tensor creation be a one-time cost paid in the conversion script than a cost paid every time the llama.cpp process spawns.