MATEval

MATEval: A Multi-Agent Text Evaluation Framework

This paper has been ACCEPTED as a LONG PAPER presentation by DASFAA 2024 Industrial Track. You can currently access it through the following link: MATEval

In the Alipay business scenario, we need to assess open-ended story texts generated by large language models（LLMs）. For this specific business context, we have proposed a multi-agent evaluation framework called "MATEval". Within this framework, we have integrated strategies of self-reflection and Chain-of-Thought (CoT), and we have also introduced a feedback mechanism at the end of each round of discussion. This mechanism evaluates the quality of each discussion round, facilitating consensus. Ultimately, we require a summarizer to consolidate the results of the entire discussion process. We provide two formats of output: one in the form of Q&A pairs, and the other as text reports that are easy for humans to read. Extensive experiments demonstrate that MATEval's evaluation results on two classic story datasets are more aligned with human preferences compared to existing methods.

In the MATEval framework, we select OpenAI’s GPT-4 as our LLMs due to its outstanding performance and API accessibility. We set the temperature parameter to 0 for result reproducibility. GPT-4’s easy access facilitated effective and coherent multi-agent interactions in our experiments.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
LOT		LOT
ROC		ROC
WP		WP
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MATEval

About

Uh oh!

Releases

Packages

Languages

AnonymousLYZYY/CoR

Folders and files

Latest commit

History

Repository files navigation

MATEval

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages