
Language Imbalance Driven Rewarding for Multilingual Self-improving


Wen Yang1,2*, Junhong Wu1,2*, Chen Wang1,2, Chengqing Zong1,2, Jiajun Zhang1,2,3,4🌟

* Equal contribution 🌟 Corresponding author

1 School of Artificial Intelligence, University of Chinese Academy of Sciences
2 Institute of Automation, Chinese Academy of Sciences
3 Wuhan AI Research 4 Shanghai Artificial Intelligence Laboratory, Shanghai, China

[Figure: Multilingual-Self-Improving]

Overview

We introduce Language Imbalance Driven Rewarding, a novel approach that leverages the inherent capability imbalance across different languages in large language models (LLMs) as a reward signal for iterative self-improvement. By applying iterative DPO training, our approach not only enhances the performance of non-dominant languages but also improves outcomes in dominant languages.

Our goal with this approach is to contribute a new perspective to the multilingual LLM community by challenging the assumption that language imbalance is solely a challenge to be mitigated. We hope this approach will inspire further exploration into multilingual self-improvement in LLMs, broadening the horizon for more balanced and capable language models.
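
To make the idea concrete, here is a minimal sketch of how the imbalance signal can be turned into preference data, under our reading of the approach; the class and function names below are illustrative placeholders, not this repository's API. Responses generated in a dominant language and then translated into a non-dominant language are treated as preferred over the model's direct responses in that language, and the resulting pairs are used for DPO.

```python
# Illustrative sketch only (not this repository's actual API): building DPO-style
# preference pairs from the language-imbalance signal. For a non-dominant language,
# the dominant-language answer translated into that language is treated as "chosen"
# and the model's direct answer in that language as "rejected".
from dataclasses import dataclass
from typing import List

@dataclass
class PreferencePair:
    prompt: str    # instruction in the target language
    chosen: str    # assumed-stronger response (translated from the dominant language)
    rejected: str  # assumed-weaker response (generated directly in the target language)

def build_pairs(prompts: List[str],
                translated_dominant: List[str],
                direct_target: List[str]) -> List[PreferencePair]:
    """Pair translated dominant-language answers against direct target-language answers."""
    return [PreferencePair(p, chosen=c, rejected=r)
            for p, c, r in zip(prompts, translated_dominant, direct_target)]

# Toy example: the English answer translated into Chinese is preferred over
# the model's direct Chinese answer.
pairs = build_pairs(
    prompts=["用一句话解释引力。"],
    translated_dominant=["引力是有质量的物体之间相互吸引的力。"],
    direct_target=["引力就是东西会掉下来。"],
)
print(f"chosen: {pairs[0].chosen}\nrejected: {pairs[0].rejected}")
```

Iterating this procedure, training on the pairs with DPO and then regenerating data with the improved model, is what drives the self-improvement loop described above.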

🔥 Update

  • [28/10/2024] 🔥 We release the code for Language Imbalance Driven Rewarding!
  • [11/10/2024] 🔥 Language Imbalance Driven Rewarding is coming! We release the paper!

👀 Contents

  • Setup
  • Preparation
  • Train
  • Evaluation
  • Experiments

📷 Setup

Please follow the instructions below to install the required packages.

  1. Clone this repository
git clone https://github.com/ZNLP/Language-Imbalance-Driven-Rewarding.git
  2. Install packages
conda create -n mdpo python=3.10 -y
conda activate mdpo
cd Language-Imbalance-Driven-Rewarding
pip install -r requirements.txt

💡 Preparation

bash ./scripts/batch_inference.sh
bash ./scripts/batch_translate.sh
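
The two scripts above roughly correspond to the two halves of the data pipeline: batch_inference.sh generates model responses, and batch_translate.sh translates the dominant-language responses into the target languages. The sketch below shows how such outputs could then be merged into a pairwise file for DPO training; the file paths and field names are placeholders and do not reflect the scripts' actual output format.

```python
# Sketch: merge (hypothetical) inference and translation outputs into a pairwise
# dataset for DPO. Paths and field names are placeholders, not the scripts'
# actual output format.
import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

direct = load_jsonl("outputs/direct_responses.zh.jsonl")          # model's own Chinese answers
translated = load_jsonl("outputs/translated_responses.zh.jsonl")  # English answers translated into Chinese

records = []
for d, t in zip(direct, translated):
    assert d["prompt"] == t["prompt"], "inference and translation files must be aligned"
    records.append({
        "instruction": d["prompt"],
        "chosen": t["response"],    # translated dominant-language answer (preferred)
        "rejected": d["response"],  # direct non-dominant-language answer (dispreferred)
    })

with open("data/dpo_pairs.zh.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

A pairwise file of this shape can then be registered as a preference dataset in LLaMA-Factory (or any other DPO trainer) for the Train step below; consult that repo's documentation for its exact dataset schema.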

📈 Train

Our training is mostly performed with the LLaMA-Factory codebase. Please refer to that repo for more details.

📈 Evaluation

bash scripts/batch_inference_for_eval.sh
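
For the head-to-head results reported below, responses are compared pairwise by a judge model (GPT-4 scoring code is listed in the Schedule). The snippet that follows is a small illustrative helper, not this repository's scoring code, showing how win/tie/loss rates could be tallied from judge verdicts; the verdict labels are an assumption.

```python
# Illustrative only: tally head-to-head win/tie/loss rates from judge verdicts.
# The verdict format ("win" / "tie" / "loss") is an assumption, not this repo's schema.
from collections import Counter

def head_to_head_rates(verdicts):
    counts = Counter(verdicts)
    total = sum(counts.values()) or 1  # avoid division by zero on an empty list
    return {label: counts.get(label, 0) / total for label in ("win", "tie", "loss")}

print(head_to_head_rates(["win", "win", "tie", "loss", "win"]))
# {'win': 0.6, 'tie': 0.2, 'loss': 0.2}
```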

👀 Experiments

We provide some results in this section. More detailed results can be found in our paper.

General Instruction Following

  • Head-to-head Performance
  • X-AlpacaEval

The Multilingual MT-Bench Benchmark

The Multilingual NLP Benchmarks

Arithmetic Reasoning

  • Performance on the MGSM benchmark

Schedule

  • Release training & evaluation code

  • Release GPT-4 Score code

Citation

If you find this repo useful for your research, please consider citing the paper:

@article{yang2024language,
  title={Language Imbalance Driven Rewarding for Multilingual Self-improving},
  author={Yang, Wen and Wu, Junhong and Wang, Chen and Zong, Chengqing and Zhang, Jiajun},
  journal={arXiv preprint arXiv:2410.08964},
  year={2024}
}

Acknowledgement

We would like to thank the following repos for their great work:

  • LLaMA-Factory

License

This project is released under the Apache 2.0 license. Parts of this project contain code and models from other sources, which are subject to their respective licenses.
