Distributed Mixture-of-Agents

Distributed Mixture-of-Agents (MoA) extends the MoA architecture to a distributed setting in which LLMs run on individual edge devices, each uniquely associated with a user and equipped with its own computing power. The devices exchange information through decentralized gossip algorithms, allowing device nodes to communicate without the supervision of a centralized server. The MoA setting enhances overall large language model (LLM) performance by letting multiple individual LLMs collaborate on inference, yielding better responses to user prompts than relying on a single LLM.

🔗 Paper link: Distributed Mixture-of-Agents for Edge Inference with Large Language Models

[Figure: distributed MoA setup with LLMs on edge devices exchanging information via gossip]
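
As a rough illustration of the decentralized exchange, the sketch below simulates a few edge devices gossiping LLM responses to randomly chosen peers without a central server. The `EdgeDevice` class, the `respond` stub, and the fully connected topology are illustrative assumptions and not the repository's actual implementation.

```python
# A minimal sketch of decentralized gossip between edge devices, assuming a
# random-peer gossip scheme; the class, the respond() stub, and the topology
# are hypothetical placeholders, not the repository's implementation.
import random
from dataclasses import dataclass, field

@dataclass
class EdgeDevice:
    name: str
    peers: list = field(default_factory=list)     # reachable neighbor devices
    received: list = field(default_factory=list)  # responses gossiped by peers

    def respond(self, prompt: str) -> str:
        # Placeholder for the device's local LLM call.
        return f"[{self.name}] answer to: {prompt}"

    def gossip(self, prompt: str) -> None:
        # Push the local response to one uniformly chosen neighbor,
        # with no central server coordinating the exchange.
        peer = random.choice(self.peers)
        peer.received.append(self.respond(prompt))

# Fully connected three-device example.
devices = [EdgeDevice(f"device_{i}") for i in range(3)]
for d in devices:
    d.peers = [p for p in devices if p is not d]

for _ in range(5):                      # a few gossip rounds
    for d in devices:
        d.gossip("What is edge inference?")

print(devices[0].received[:2])
```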

In the considered setup, each user has their own LLM model to answer their prompts. The devices gossip either their own user-specific prompts or augmented prompts to generate more refined answers. User prompts are temporarily stored in device queues while their corresponding LLMs are busy.

[Figure: user prompts queued at edge devices while their LLMs are busy]
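
A minimal sketch of this per-device queueing behavior is given below, assuming a FIFO buffer and a fixed service time per prompt; the `DeviceQueue` class and its timing model are hypothetical and only illustrate the idea.

```python
# A minimal sketch of the per-device prompt queue, assuming prompts are
# buffered FIFO while the local LLM is busy; the fixed service-time model
# is an illustrative assumption, not taken from the repository.
from collections import deque

class DeviceQueue:
    def __init__(self):
        self.queue = deque()
        self.busy_until = 0.0          # time at which the current LLM call finishes

    def arrive(self, prompt: str, now: float) -> None:
        self.queue.append((prompt, now))

    def serve(self, now: float, service_time: float = 1.0):
        # Start the next prompt only when the LLM is idle.
        if self.queue and now >= self.busy_until:
            prompt, _ = self.queue.popleft()
            self.busy_until = now + service_time
            return prompt
        return None

q = DeviceQueue()
q.arrive("own prompt", now=0.0)
q.arrive("augmented prompt from peer", now=0.2)
print(q.serve(now=0.0))   # "own prompt" starts immediately
print(q.serve(now=0.5))   # LLM still busy, returns None
print(q.serve(now=1.0))   # second prompt starts after the first finishes
```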

Given the memory limitations of edge devices, the prompt arrival rate at the users is bounded by a theoretical limit to ensure that the average queue sizes in the system remain bounded. This limit also depends on the number of layers in the MoA and the number of proposer LLMs in each layer. The MoA setting is shown below:

[Figure: MoA architecture with multiple layers of proposer LLMs]
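
To illustrate the shape of such a stability condition, the sketch below checks a simplified admission rule in which each prompt triggers one LLM call per proposer in every layer. The exact admissible arrival rate is derived in the paper; the load model and threshold here are simplifying assumptions, not the paper's theorem.

```python
# An illustrative check of a queue-stability condition, assuming each user
# prompt generates work proportional to the number of MoA layers and proposers
# per layer; the precise admissible rate is given in the paper
# (arXiv:2412.21200), and this load model is a simplifying assumption.
def is_stable(arrival_rate: float, service_rate: float,
              num_layers: int, proposers_per_layer: int) -> bool:
    # Total LLM calls triggered per prompt under this simplified load model.
    calls_per_prompt = num_layers * proposers_per_layer
    return arrival_rate * calls_per_prompt < service_rate

# Example: 0.1 prompts/s per user, LLM serves 1 call/s, 2 layers, 3 proposers.
print(is_stable(0.1, 1.0, num_layers=2, proposers_per_layer=3))  # True
print(is_stable(0.5, 1.0, num_layers=2, proposers_per_layer=3))  # False
```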

Citation

If you find our work useful, consider citing it as:

@article{mitra2024distributed,
    title={Distributed Mixture-of-Agents for Edge Inference with Large Language Models},
    author={Mitra, Purbesh and Kaswan, Priyanka and Ulukus, Sennur},
    journal={arXiv preprint arXiv:2412.21200},
    year={2024}
}
