ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation

This repository contains the code and packages for the paper titled ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation.

Evaluating personalized text generated by large language models (LLMs) is challenging: only the LLM user, i.e., the prompt author, can reliably assess the output, but re-engaging the same individuals across studies is infeasible. This paper addresses the challenge of evaluating personalized text generation by introducing ExPerT, an explainable reference-based evaluation framework. ExPerT leverages an LLM to extract atomic aspects and their supporting evidence from the generated and reference texts, match the aspects, and evaluate their alignment based on content and writing style, two key attributes of personalized text generation. Additionally, ExPerT generates detailed, fine-grained explanations for every step of the evaluation process, enhancing transparency and interpretability. Our experiments demonstrate that ExPerT achieves a 7.2% relative improvement in alignment with human judgments compared to state-of-the-art text generation evaluation methods. Furthermore, human evaluators rated the usability of ExPerT's explanations at 4.7 out of 5, highlighting its effectiveness in making evaluation decisions more interpretable.
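To build intuition for the aspect-matching idea described above, here is a deliberately simplified, toy sketch. It is NOT the actual ExPerT implementation: the real framework uses an LLM to extract atomic aspects with evidence and to judge content and style alignment, whereas this sketch treats aspects as plain strings and scores exact-match overlap with a precision/recall-style F1. The function name `toy_aspect_f1` and the example aspects are illustrative inventions, not part of the package.

```python
# Toy illustration of reference-based aspect matching (not ExPerT itself).
# ExPerT's real pipeline uses an LLM for extraction, matching, and judging
# content/style alignment; here we only mimic the matching-and-scoring shape.

def toy_aspect_f1(generated_aspects, reference_aspects):
    """F1 over exactly-matched aspect strings, a stand-in for LLM matching."""
    gen, ref = set(generated_aspects), set(reference_aspects)
    if not gen or not ref:
        return 0.0
    matched = gen & ref
    precision = len(matched) / len(gen)   # matched fraction of generated aspects
    recall = len(matched) / len(ref)      # matched fraction of reference aspects
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

score = toy_aspect_f1(
    ["formal tone", "mentions deadline", "greets by name"],
    ["formal tone", "greets by name", "thanks the reviewer"],
)
print(round(score, 2))  # 2 of 3 aspects match on each side -> F1 = 0.67
```

In the real framework, the binary exact-match step is replaced by LLM judgments of whether two aspects align in content and writing style, and each decision comes with a generated explanation.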

Installation

You can install ExPerT using the following pip command:

pip install expert-score==0.0.1

Usage

Using ExPerT is as simple as the following:

import expert_score

score = expert_score.expert(
    inputs = [...],  # A list of input strings from the users
    outputs = [...],  # A list of generated outputs by a model for the users
    references = [...],  # A list of reference outputs for the users
    model_name = "google/gemma-2-27b-it",  # The name of the LLM to be used as ExPerT's backbone
    cache_dir = "/path/to/cache/dir",  # The cache directory
    max_generated_output_length = 512,  # Maximum number of tokens to consider from the generated outputs
    max_evaluator_length = 8192,  # Maximum number of tokens that can be used by ExPerT's LLM backbone
    max_retries = 100,  # Maximum retries before failure for out-of-format generated outputs by ExPerT's LLM backbone
    ignore_on_fail = True,  # Ignore single aspects if the model fails to generate well-formatted output (rare occurrence)
    google_llm = False,  # If you want to use a Google-based LLM API
    openai_llm = False,  # If you want to use an OpenAI-based LLM API
    api_key = "api/key",  # The API key for LLM API if using OpenAI or Google LLM API
)

Examples

You can see an example of ExPerT's evaluation in this notebook.

Reference

ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation

@misc{salemi2025experteffectiveexplainableevaluation,
      title={ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation}, 
      author={Alireza Salemi and Julian Killingback and Hamed Zamani},
      year={2025},
      eprint={2501.14956},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.14956}, 
}

Acknowledgment

This work was supported in part by the Center for Intelligent Information Retrieval, in part by the NSF Graduate Research Fellowships Program (GRFP) Award #1938059, in part by Google, and in part by Microsoft. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.
