Mixture-of-Agents Enhances Large Language Model Capabilities

This is a fork of https://github.com/togethercomputer/MoA with some tweaks to make it work with local models (served locally, e.g. via LM Studio).

100% of the credit goes to the original authors.


[MoA architecture diagram]

Mixture of Agents (MoA) is a novel approach that leverages the collective strengths of multiple LLMs to enhance performance, achieving state-of-the-art results. By employing a layered architecture in which each layer comprises several LLM agents, MoA reaches a score of 65.1% on AlpacaEval 2.0, significantly outperforming GPT-4 Omni's 57.5%, using only open-source models!
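The layered flow can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `query_model` is a hypothetical placeholder for a real chat-completion call (e.g. to an OpenAI-compatible LM Studio endpoint).

```python
# Minimal sketch of the Mixture-of-Agents aggregation loop.
# `query_model` is a hypothetical stand-in for a real LLM API call.

def query_model(model: str, prompt: str) -> str:
    """Placeholder: in practice this would call the model's chat API."""
    return f"[{model}] answer to: {prompt}"

def moa_respond(prompt, reference_models, aggregator, rounds=1):
    references = []
    # Each round (layer) collects answers from every reference model,
    # feeding the previous layer's answers back in as context.
    for _ in range(rounds):
        context = "\n".join(references)
        layer_prompt = f"{context}\n\n{prompt}" if context else prompt
        references = [query_model(m, layer_prompt) for m in reference_models]
    # Final layer: the aggregator synthesizes the references into one answer.
    agg_prompt = (
        "Synthesize these responses:\n" + "\n".join(references)
        + f"\n\nQuestion: {prompt}"
    )
    return query_model(aggregator, agg_prompt)

answer = moa_respond("What is MoA?", ["model-a", "model-b"], "aggregator-model")
```

The key design point is that reference answers are not discarded: each layer sees the previous layer's outputs as auxiliary context, and the aggregator produces the single final response.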

Interactive Demo

We first present an interactive demo. It showcases a simple multi-turn chatbot where the final response is aggregated from various reference models.

Setup

  1. Setup your environment:

    cp .env.example .env
    vi .env
  2. Install Requirements:

    uv venv
    source .venv/bin/activate
    uv pip install -r requirements.txt
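The `.env` file points the demo at your local server. A hypothetical example, assuming an OpenAI-compatible LM Studio server on its default port; the variable names here are illustrative, so check `.env.example` for the actual keys:

```shell
# Hypothetical .env contents -- variable names are illustrative;
# see .env.example in the repo for the real keys.
OPENAI_API_BASE=http://localhost:1234/v1   # LM Studio's default local endpoint
OPENAI_API_KEY=lm-studio                   # LM Studio accepts any placeholder key
```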

Running the Demo

To run the interactive demo, execute the following script with Python:

python bot.py

The script will prompt you to input instructions interactively. Here's how to use it:

  1. Start by entering your instruction at the ">>>" prompt.
  2. The system will process your input using the predefined reference models.
  3. It will generate a response based on the aggregated outputs from these models.
  4. You can continue the conversation by inputting more instructions, with the system maintaining the context of the multi-turn interaction.
  5. Enter exit to quit the chatbot.
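The interaction loop described above can be sketched as a simple REPL that accumulates the conversation history. This is an illustrative skeleton, not the actual `bot.py`; `generate` is a hypothetical stand-in for the MoA pipeline.

```python
# Sketch of the multi-turn loop: read instructions, keep context,
# stop on "exit". `generate` is a hypothetical stand-in for the
# reference-model aggregation step.

def generate(history):
    # Placeholder: a real implementation would run the MoA layers
    # over the full conversation history.
    return f"response #{len(history)}"

def chat(inputs):
    history, outputs = [], []
    for line in inputs:  # stands in for input(">>> ") in the real demo
        if line.strip() == "exit":
            break
        history.append({"role": "user", "content": line})
        reply = generate(history)
        history.append({"role": "assistant", "content": reply})
        outputs.append(reply)
    return outputs

replies = chat(["hello", "follow-up question", "exit"])
```

Because both user turns and assistant replies are appended to `history`, later turns are generated with the full context of the conversation, which is what enables the multi-turn behavior.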

Configuration

You can configure the demo by specifying the following parameters:

  • --aggregator: The primary model used for final response generation.
  • --reference_models: List of models used as references.
  • --temperature: Controls the randomness of the response generation.
  • --max_tokens: Maximum number of tokens in the response.
  • --rounds: Number of refinement rounds to run over the input (the number of rounds equals the number of MoA layers minus 1).
  • --num_proc: Number of processes to run in parallel for faster execution.
  • --multi_turn: Boolean to toggle multi-turn interaction capability.
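As a rough sketch, the flags above could be parsed with `argparse` as follows. The flag names come from the list above; the defaults shown are assumptions for illustration, not the repository's actual values.

```python
# Sketch of CLI parsing for the demo's flags. Defaults are assumed,
# not taken from the repository.
import argparse

parser = argparse.ArgumentParser(description="MoA interactive demo (sketch)")
parser.add_argument("--aggregator", type=str, default="aggregator-model")
parser.add_argument("--reference_models", type=str, nargs="+", default=[])
parser.add_argument("--temperature", type=float, default=0.7)
parser.add_argument("--max_tokens", type=int, default=512)
parser.add_argument("--rounds", type=int, default=1)  # rounds = MoA layers - 1
parser.add_argument("--num_proc", type=int, default=1)
parser.add_argument("--multi_turn", action="store_true")

# Example: parse an illustrative command line.
args = parser.parse_args(["--rounds", "2", "--temperature", "0.5"])
```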

Credit / Authors / Acknowledgements

Please see https://github.com/togethercomputer/MoA/
