Description
I currently have access to 20 old computers, each with 32 GB of RAM, 4 cores, a 256 GB SSD, and a 1 Gbit network connection, all wired into a 48-port switch. (I could get many more computers, but I don't currently have enough electricity.)
Would it somehow be possible to distribute the model across the 20 computers with llama.cpp, so that the 65B model runs at a moderate speed?
What would I have to do to split the model across many computers and run it on CPU?
I am only interested in inference, not training; for training I can rent cloud GPUs.
Thanks for any input, recommendations, or notes on likely problems.
What I see as the main problem is how to split the model (or other models, if I end up using them) efficiently so that network bandwidth isn't the limiting factor; a rough estimate is sketched below.
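
For a rough sense of the numbers, here is a back-of-envelope sketch (not llama.cpp code) assuming a pipeline-parallel split, where each node holds a contiguous slice of layers and forwards the per-token hidden state to the next node. The layer count and hidden dimension are those of LLaMA-65B; the f32 activation size, the serial hop topology, and the node count are assumptions for illustration.

```python
# Back-of-envelope estimate: pipeline-parallel split of LLaMA-65B across
# N machines, passing activations over 1 Gbit/s links. Model constants
# are LLaMA-65B's; everything else here is an assumption.

N_NODES = 20
N_LAYERS = 80                # transformer blocks in LLaMA-65B
HIDDEN_DIM = 8192            # embedding width of LLaMA-65B
BYTES_PER_ACT = 4            # assumed f32 activations
LINK_BYTES_PER_S = 1e9 / 8   # 1 Gbit/s link, ignoring protocol overhead

layers_per_node = N_LAYERS / N_NODES  # 4 layers per machine

# Data handed to the next node per generated token: one hidden-state vector.
act_bytes = HIDDEN_DIM * BYTES_PER_ACT        # 32 KiB per hop
transfer_s = act_bytes / LINK_BYTES_PER_S     # ~0.26 ms of wire time per hop
total_hop_s = transfer_s * (N_NODES - 1)      # ~5 ms of wire time per token

print(f"{layers_per_node:.0f} layers per node")
print(f"{act_bytes / 1024:.0f} KiB per hop, "
      f"~{total_hop_s * 1e3:.1f} ms wire time per token")
```

Under these assumptions, each hop moves only about 32 KiB per token, so the 1 Gbit links are nowhere near saturated; per-hop round-trip latency and per-node compute would likely dominate instead.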