This repository contains the code and package for the paper ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation.
Evaluating personalized text generated by large language models (LLMs) is challenging, since only the LLM user, i.e., the prompt author, can reliably assess the output, but re-engaging the same individuals across studies is infeasible. This paper addresses the challenge of evaluating personalized text generation by introducing ExPerT, an explainable reference-based evaluation framework. ExPerT leverages an LLM to extract atomic aspects and their supporting evidence from the generated and reference texts, match the aspects, and evaluate their alignment based on content and writing style, two key attributes in personalized text generation. Additionally, ExPerT generates detailed, fine-grained explanations for every step of the evaluation process, enhancing transparency and interpretability. Our experiments demonstrate that ExPerT achieves a 7.2% relative improvement in alignment with human judgments compared to state-of-the-art text generation evaluation methods. Furthermore, human evaluators rated the usability of ExPerT's explanations at 4.7 out of 5, highlighting its effectiveness in making evaluation decisions more interpretable.
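To make the pipeline concrete, below is a minimal sketch of its three stages. The helper functions are naive placeholders for what are LLM calls in ExPerT, and the F1-style aggregation is an illustrative assumption, not the paper's exact formula:

def extract_aspects(text: str) -> list[dict]:
    # Placeholder: ExPerT prompts an LLM to extract atomic aspects and the
    # evidence supporting them; here we simply split on sentences.
    return [{"aspect": s.strip(), "evidence": s.strip()}
            for s in text.split(".") if s.strip()]

def aspects_match(g: dict, r: dict) -> bool:
    # Placeholder: ExPerT asks an LLM whether two aspects correspond;
    # here we use naive word overlap instead.
    g_words = set(g["aspect"].lower().split())
    r_words = set(r["aspect"].lower().split())
    return len(g_words & r_words) / max(len(g_words | r_words), 1) > 0.5

def alignment_score(g: dict, r: dict) -> float:
    # Placeholder: ExPerT scores content and writing-style alignment with
    # an LLM; here every matched pair simply scores 1.0.
    return 1.0

def expert_sketch(generated: str, reference: str) -> float:
    # 1) Extract atomic aspects and evidence from both texts.
    gen_aspects = extract_aspects(generated)
    ref_aspects = extract_aspects(reference)

    # 2) Match aspects across the texts and score each matched pair.
    matched = []
    for g in gen_aspects:
        for r in ref_aspects:
            if aspects_match(g, r):
                matched.append(alignment_score(g, r))
                break

    # 3) Aggregate into an F1-like score over both aspect sets
    #    (assumed aggregation, for illustration only).
    if not gen_aspects or not ref_aspects or not matched:
        return 0.0
    precision = sum(matched) / len(gen_aspects)  # matched share of generated aspects
    recall = sum(matched) / len(ref_aspects)     # matched share of reference aspects
    return 2 * precision * recall / (precision + recall)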
You can install ExPerT using the following pip command:
pip install expert-score==0.0.1
Using ExPerT is as simple as:
import expert_score

score = expert_score.expert(
    inputs=[...],  # A list of input strings from the users
    outputs=[...],  # A list of outputs generated by a model for the users
    references=[...],  # A list of reference outputs for the users
    model_name="google/gemma-2-27b-it",  # The LLM to use as ExPerT's backbone
    cache_dir="/path/to/cache/dir",  # The cache directory
    max_generated_output_length=512,  # Maximum number of tokens considered from each generated output
    max_evaluator_length=8192,  # Maximum number of tokens available to ExPerT's LLM backbone
    max_retries=100,  # Maximum retries when the backbone produces out-of-format output before failing
    ignore_on_fail=True,  # Skip individual aspects if the model fails to produce well-formatted output (rare)
    google_llm=False,  # Set to True to use a Google LLM API
    openai_llm=False,  # Set to True to use an OpenAI LLM API
    api_key="api/key",  # The API key, required when using the OpenAI or Google LLM API
)
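As a concrete illustration, a minimal call for a single user might look like the following. The texts are toy data, and we assume the remaining parameters keep the defaults shown above:

import expert_score

# Toy example with a single user (illustrative data only).
score = expert_score.expert(
    inputs=["Write a review of a sci-fi novel I recently read."],
    outputs=["The novel blends hard science with a tense mystery plot..."],
    references=["I loved how the book grounded its mystery in real physics..."],
    model_name="google/gemma-2-27b-it",
    cache_dir="./cache",
)
print(score)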
You can see an example of ExPerT's evaluation in this notebook.
If you use ExPerT, please cite the following paper:
@misc{salemi2025experteffectiveexplainableevaluation,
title={ExPerT: Effective and Explainable Evaluation of Personalized Long-Form Text Generation},
author={Alireza Salemi and Julian Killingback and Hamed Zamani},
year={2025},
eprint={2501.14956},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.14956},
}
This work was supported in part by the Center for Intelligent Information Retrieval, in part by the NSF Graduate Research Fellowships Program (GRFP) Award #1938059, in part by Google, and in part by Microsoft. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.