Wen Yang1,2*, Junhong Wu1,2*, Chen Wang1,2, Chengqing Zong1,2, Jiajun Zhang1,2,3,4🌟
* Equal contribution 🌟 Corresponding author
1 School of Artificial Intelligence, University of Chinese Academy of Sciences
2 Institute of Automation, Chinese Academy of Sciences
3 Wuhan AI Research
4 Shanghai Artificial Intelligence Laboratory, Shanghai, China
We introduce Language Imbalance Driven Rewarding, a novel approach that leverages the inherent capability imbalance across different languages in large language models (LLMs) as a reward signal for iterative self-improvement. By applying iterative DPO training, our approach not only enhances the performance of non-dominant languages but also improves outcomes in dominant languages.
Our goal with this approach is to contribute a new perspective to the multilingual LLM community by challenging the assumption that language imbalance is solely a challenge to be mitigated. We hope this approach will inspire further exploration into multilingual self-improvement in LLMs, broadening the horizon for more balanced and capable language models.
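The core idea above can be sketched as a preference-pair construction step: responses generated in the dominant language and translated into the target language act as "chosen" examples, while the model's direct responses in the non-dominant language act as "rejected" examples for DPO. The sketch below is illustrative only; the function and field names are assumptions, not the repo's actual API.

```python
# Minimal sketch of Language Imbalance Driven Rewarding's reward signal:
# the dominant-language response (translated into the target language) is
# preferred over the model's direct non-dominant-language response.
# All names here are hypothetical, for illustration only.

def build_dpo_pairs(prompts, dominant_translated, non_dominant):
    """Pair translated dominant-language responses (chosen) against
    direct non-dominant-language responses (rejected)."""
    pairs = []
    for prompt, chosen, rejected in zip(prompts, dominant_translated, non_dominant):
        pairs.append({
            "prompt": prompt,
            "chosen": chosen,      # translated from the dominant language
            "rejected": rejected,  # generated directly in the target language
        })
    return pairs

# Toy example with placeholder strings:
pairs = build_dpo_pairs(
    ["¿Qué es la fotosíntesis?"],
    ["La fotosíntesis es el proceso por el que las plantas convierten la luz en energía química."],
    ["Fotosíntesis es planta luz."],
)
print(pairs[0]["chosen"])
```

Iterating this loop (generate, translate, build pairs, run DPO) is what drives the multilingual self-improvement described above.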
- [28/10/2024] 🔥 We release the code for Language Imbalance Driven Rewarding!
- [11/10/2024] 🔥 Language Imbalance Driven Rewarding is coming! We release the paper!
Please follow the instructions below to install the required packages.
- Clone this repository
git clone https://github.com/ZNLP/Language-Imbalance-Driven-Rewarding.git
- Install packages
conda create -n mdpo python=3.10 -y
conda activate mdpo
cd Language-Imbalance-Driven-Rewarding
pip install -r requirements.txt
bash ./scripts/batch_inference.sh
bash ./scripts/batch_translate.sh
Our training is mostly performed with the LLaMA-Factory codebase. Please refer to that repo for more details.
bash scripts/batch_inference_for_eval.sh
We provide some results in this section. More detailed results can be found in our paper.
- Head-to-head Performance
- X-alpacaEval
- Performance on the MGSM benchmark
- Release training & evaluation code
- Release GPT-4 score code
If you find this repo useful for your research, please consider citing our paper:
@article{yang2024language,
title={Language Imbalance Driven Rewarding for Multilingual Self-improving},
author={Yang, Wen and Wu, Junhong and Wang, Chen and Zong, Chengqing and Zhang, Jiajun},
journal={arXiv preprint arXiv:2410.08964},
year={2024}
}
We would like to thank the following projects for their great work:
- This work builds on LLaMA-Factory, vLLM, Transformers, LLaMA, and Qwen2
This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.