This is the GitHub repo for our COLM 2024 paper "Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness" (Link).
In this paper, we propose a novel retrieval benchmark, PIR, to study if and how current retrievers can handle nuanced perspective changes in user queries from real-world scenarios. For example, when asked to verify a claim, a retrieval system is expected to identify evidence from both supporting and contradicting perspectives, so that the downstream system can make a fair judgment call.
Along with the dataset, we propose PAP as a simple and effective method to improve the perspective awareness of current retrievers with minimal changes to the retrieval pipeline.
Check the mini-datasets in ./demo_pir_dataset for a demo version of all the tasks included in our experiments (a minimal loading sketch follows the field list below). Send us an email if you want to replicate the full analysis in the paper. For the full-sized dataset, see the Hugging Face version described below. Each demo dataset contains the following fields:
- `tasks`: a string indicating which task the retrieval data belongs to (perspectrum, agnews, story, ambigqa, allsides, exfever).
- `queries`: the retrieval queries, each combining a root query with a perspective.
- `source_queries`: a list of the root queries, index-aligned with `queries`.
- `perspectives`: a list of the perspectives, index-aligned with `queries`.
- `key_ref`: a map from the query index (as a string) to the index of its retrieval target in the corpus.
- `query_labels`: a list of labels describing query characteristics, for evaluation purposes.
- `corpus`: a list of the retrieval targets.
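As a quick orientation, here is a minimal loading sketch. It assumes each demo task is stored as a single JSON file with exactly the fields above; the actual file names and layout under ./demo_pir_dataset may differ, and the file path below is hypothetical.

```python
import json

# Hypothetical path -- adjust to the actual layout of ./demo_pir_dataset.
with open("demo_pir_dataset/perspectrum.json") as f:
    data = json.load(f)

# Queries, source queries, and perspectives are index-aligned; key_ref maps
# a query index (as a string) to its retrieval target in the corpus
# (assumed here to be a single index, though it may be a list).
for i, query in enumerate(data["queries"][:3]):
    target_idx = data["key_ref"][str(i)]
    print("query:      ", query)
    print("root query: ", data["source_queries"][i])
    print("perspective:", data["perspectives"][i])
    print("target:     ", data["corpus"][target_idx])
```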
Check our updated dataset on Hugging Face, which includes another round of manual filtering: https://huggingface.co/trumancai/perspective-information-retrieval-{dataset}, where dataset is one of story, perspectrum, exfever, ambigqa, agnews, allsides.
You can check ./process_hugginface_version.py to process and experiment with the Hugging Face version; a minimal loading sketch follows below.
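A sketch along these lines should work with the datasets library. The repo ID pattern comes from the link above, while the split and configuration names are not documented here, so the code simply inspects whatever is returned; ./process_hugginface_version.py is the authoritative reference for the exact processing logic.

```python
from datasets import load_dataset

# Repo IDs follow the pattern given above; "perspectrum" is one example task.
ds = load_dataset("trumancai/perspective-information-retrieval-perspectrum")

print(ds)                        # show the available splits
first_split = next(iter(ds.values()))
print(first_split[0])            # peek at the first record
```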
For the performance of various cutting-edge retrievers (e.g., RepLlama, Gecko, GritLM, Promptriever, and Voyage AI embeddings) on the Hugging Face version of PIR, please check the benchmarking results here.
You are welcome to email the authors about the performance of state-of-the-art retrievers or for further discussion.
We sincerely acknowledge Haoyang Wen, Hanjun Dai, and Fengyu Cai for their feedback: (1) some questions in the AGNews dataset are difficult even for humans; (2) some corpus entries in the AmbigQA dataset are too short to provide complete information.
We provide the code for the main experiments in ./demo.ipynb. You can run the Jupyter Notebook to replicate our experiments with various retrievers and our projection-based methods.
If you have any other questions about this repo, you are welcome to open an issue or send me an email; I will respond as soon as possible.