Fusing LLM Capabilities with Routing Data

🌐 Project Page | 📜 arXiv | 📂 Dataset | 🤖 Model | 🖥️ Demo

Overview of LLM capability fusion via FusionFactory with three representative levels: Query-level, Thought-level, and Model-level.

News

[2025.06] 🌟 FusionFactory was released.

🛠️ Environment Setup

conda create -n fusionfactory python=3.9
conda activate fusionfactory
pip install pandas
pip install datasets
pip install tqdm
pip install transformers
pip install sentence_transformers
pip install torch
pip install numpy
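After installation, it can help to confirm that every package is visible to the active interpreter before launching the longer scripts. The following is a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, and the list simply mirrors the `pip install` lines above.

```python
import importlib.util

# Import names for the packages installed above. For this list the import
# name is the same as the pip package name.
REQUIRED = [
    "pandas",
    "datasets",
    "tqdm",
    "transformers",
    "sentence_transformers",
    "torch",
    "numpy",
]


def missing_packages(names):
    """Return the names that cannot be located on the import path.

    find_spec only looks the module up; it does not import it, so this
    check stays fast even for heavy packages such as torch.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the `fusionfactory` conda environment; any name it reports can be installed with the matching `pip install` line.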

🎯 Data Process

Run the following command to start data collection.

# split: train OR test
# case num: 500 for train & 50 for partial test
# a sample of LLM description: ./data_process/LLM_Descriptions.json
python data_process/data_combine.py \
--split train \
--case_num 500 \
--round 5 \
--llm_description_path [YOUR_LLM_PATH] \
--csv_save_path [YOUR_SAVE_PATH] \
--api_base [YOUR_API_BASE] \
--api_key [YOUR_API_KEY]

You may refer to the specific README in the data_process directory for detailed argument descriptions.
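When collecting both splits, it may be convenient to assemble the command from Python rather than editing the shell line by hand. The sketch below only uses the flags shown in the command above; the function name `build_collect_command` is hypothetical, and the bracketed values are the same placeholders as in the original command and must be replaced before running.

```python
import shlex


def build_collect_command(split, case_num, rounds, llm_description_path,
                          csv_save_path, api_base, api_key):
    """Build the argument list for data_process/data_combine.py.

    Only the flags documented in this README are used.
    """
    if split not in ("train", "test"):
        raise ValueError("split must be 'train' or 'test', got %r" % (split,))
    return [
        "python", "data_process/data_combine.py",
        "--split", split,
        "--case_num", str(case_num),
        "--round", str(rounds),
        "--llm_description_path", llm_description_path,
        "--csv_save_path", csv_save_path,
        "--api_base", api_base,
        "--api_key", api_key,
    ]


# Placeholders are kept as-is; fill them in with your own values.
cmd = build_collect_command(
    split="train",
    case_num=500,
    rounds=5,
    llm_description_path="[YOUR_LLM_PATH]",
    csv_save_path="[YOUR_SAVE_PATH]",
    api_base="[YOUR_API_BASE]",
    api_key="[YOUR_API_KEY]",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To actually launch the collection from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces. For the partial test split, the comment in the original command suggests `split="test"` with `case_num=50`.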

To add quality scores to the collected data using an LLM judge:

python data_process/add_llm_judge.py

This evaluates each response and adds quality scores to the dataset, which can then be used for training and evaluation. See data_process/README.md for more details.

📊 Experiments

Query-level Fusion

First, run the data preprocessing script to prepare the dataset:

# Preprocess the dataset and generate training/testing files
python query_level/data_processing.py

For more detailed information about the data preprocessing and model training process, please refer to the specific README in the query_level directory.

Thought-level Fusion

First, run the data preprocessing script to prepare the thought prompts:

# Preprocess the dataset and generate training/testing files
python query_level/data_processing.py

Alternatively, run the following script to generate thought-enhanced queries directly from the Hugging Face datasets:

python thought_level/get_thought_prompt.py

For more detailed information about the data preprocessing and model training process, please refer to the specific README in the thought_level directory.

Model-level Fusion

Refer to LLaMA-Factory for detailed instructions on fine-tuning with model-level fusion data. First clone the LLaMA-Factory repository into the FusionBench directory, then run the following commands to generate SFT data for model-level fusion:

# setting: perf, judge, hybrid, baseline
python model_level/sft_data_gen.py --setting perf --k 5 --save_path [YOUR_PATH] --csv_path_with_judge [YOUR_PATH]

python model_level/sft_test_gen.py --save_path [YOUR_PATH] --csv_path [YOUR_PATH]

After completing the essential configuration described in the LLaMA-Factory documentation, use the following commands to start SFT and inference:

# SFT
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=2,3,4,5 llamafactory-cli train examples/train_lora/[YOUR_YAML].yaml

# Inference
CUDA_VISIBLE_DEVICES=2,3,4,5 python scripts/vllm_infer.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path saves/llama3.1-8b/lora/[YOUR_PATH] --dataset router_test --cutoff_len 2048

You may refer to the specific README in the model_level directory for detailed instructions.

📈 Evaluation

FusionBench provides an evaluation framework for assessing model performance across the following task types:

Mathematical Reasoning (GSM8K, MATH)
Code Generation (MBPP, HumanEval)
Commonsense Reasoning (CommonsenseQA, OpenBookQA, ARC Challenge, HellaSwag)
World Knowledge (Natural Questions, TriviaQA)
Reading Comprehension (SQuAD, BoolQ)
Popular Benchmarks (MMLU, GPQA)

To evaluate your model's performance:

python eval/response_eval.py

For detailed information about the evaluation framework, supported metrics, and usage instructions, please refer to the Evaluation Documentation.
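When summarizing results, it can be useful to roll per-benchmark scores up into the task types listed above. The sketch below encodes that grouping as a plain dictionary. It is an illustration only and does not reflect how `eval/response_eval.py` is implemented: the names `TASK_GROUPS`, `task_group_of`, and `mean_by_group` are hypothetical, and the scores are whatever per-benchmark numbers you supply.

```python
# Task types and benchmarks exactly as listed in this README.
TASK_GROUPS = {
    "Mathematical Reasoning": ["GSM8K", "MATH"],
    "Code Generation": ["MBPP", "HumanEval"],
    "Commonsense Reasoning": ["CommonsenseQA", "OpenBookQA",
                              "ARC Challenge", "HellaSwag"],
    "World Knowledge": ["Natural Questions", "TriviaQA"],
    "Reading Comprehension": ["SQuAD", "BoolQ"],
    "Popular Benchmarks": ["MMLU", "GPQA"],
}


def task_group_of(benchmark):
    """Return the task type for a benchmark name (case-insensitive), or None."""
    wanted = benchmark.strip().lower()
    for group, benchmarks in TASK_GROUPS.items():
        if any(name.lower() == wanted for name in benchmarks):
            return group
    return None


def mean_by_group(scores):
    """Average per-benchmark scores within each task type.

    `scores` maps benchmark name -> numeric score. Benchmarks that are not
    in TASK_GROUPS are ignored, and task types with no scores are omitted.
    """
    collected = {}
    for benchmark, value in scores.items():
        group = task_group_of(benchmark)
        if group is not None:
            collected.setdefault(group, []).append(value)
    return {group: sum(vals) / len(vals) for group, vals in collected.items()}


if __name__ == "__main__":
    print(task_group_of("mbpp"))  # prints: Code Generation
```

An unweighted mean treats every benchmark in a group equally regardless of its size; weighting by the number of test cases is an equally reasonable choice.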

Citation

@article{FusionFactory,
  title={Fusing LLM Capabilities with Routing Data},
  author={Tao Feng and Haozhen Zhang and Zijie Lei and Pengrui Han and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jiaxuan You},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2025}
}
