Chain of Rebuttal: Resolving Open-Ended Text Evaluation Problem with Multi-Agent Discussion Framework


MATEval: A Multi-Agent Text Evaluation Framework

This paper has been ACCEPTED as a LONG PAPER presentation at the DASFAA 2024 Industrial Track. You can currently access it through the following link: MATEval

In the Alipay business scenario, we need to assess open-ended story texts generated by large language models (LLMs). For this business context, we propose a multi-agent evaluation framework called "MATEval". Within this framework, we integrate self-reflection and Chain-of-Thought (CoT) strategies, and we introduce a feedback mechanism at the end of each round of discussion. This mechanism evaluates the quality of each round, facilitating consensus. Finally, a summarizer consolidates the results of the entire discussion process. We provide two output formats: Q&A pairs, and text reports that are easy for humans to read. Extensive experiments demonstrate that MATEval's evaluation results on two classic story datasets align more closely with human preferences than existing methods.
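The repository code is not reproduced in this README, but the discussion loop described above can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the names `Agent`, `run_discussion`, `feedback_llm`, and `summarizer_llm` are hypothetical and not the repository's actual API, and each agent is assumed to be backed by a plain prompt-to-text LLM callable.

```python
# Minimal sketch of the multi-agent discussion loop; all names here are
# hypothetical and do not come from the MATEval repository.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Agent:
    name: str
    llm: Callable[[str], str]                 # prompt -> response
    memory: List[str] = field(default_factory=list)

    def speak(self, story: str, transcript: List[str]) -> str:
        # Chain-of-Thought: ask the agent to reason step by step.
        prompt = (
            f"You are {self.name}. Evaluate this story step by step.\n"
            f"Story:\n{story}\n\nDiscussion so far:\n" + "\n".join(transcript)
        )
        draft = self.llm(prompt)
        # Self-reflection: the agent critiques and revises its own draft.
        final = self.llm(f"Critique and improve your evaluation:\n{draft}")
        self.memory.append(final)
        return final

def run_discussion(story: str, agents: List[Agent],
                   feedback_llm: Callable[[str], str],
                   summarizer_llm: Callable[[str], str],
                   max_rounds: int = 3):
    transcript: List[str] = []
    for _ in range(max_rounds):
        for agent in agents:
            transcript.append(f"{agent.name}: {agent.speak(story, transcript)}")
        # Feedback mechanism: grade the round and stop once consensus is reached.
        verdict = feedback_llm(
            "Has this discussion reached consensus? Answer YES or NO.\n"
            + "\n".join(transcript)
        )
        if verdict.strip().upper().startswith("YES"):
            break
    # Summarizer: consolidate the discussion into the two output formats.
    qa_pairs = summarizer_llm("Summarize as Q&A pairs:\n" + "\n".join(transcript))
    report = summarizer_llm("Write a human-readable report:\n" + "\n".join(transcript))
    return qa_pairs, report
```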


In the MATEval framework, we select OpenAI’s GPT-4 as our LLM due to its outstanding performance and API accessibility. We set the temperature parameter to 0 for result reproducibility. GPT-4’s easy API access facilitated effective and coherent multi-agent interactions in our experiments.
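For reference, a GPT-4 call with these settings might look like the sketch below, written against the current `openai` Python client; the repository itself may use a different client version, and the wrapper name `gpt4` is ours. Such a callable could be passed as the `llm` argument in the discussion sketch above.

```python
# Minimal sketch of a temperature-0 GPT-4 call; the wrapper name and prompt
# wiring are illustrative, not taken from the repository.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def gpt4(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic-leaning outputs for reproducibility
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```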
