Do Large Language Models Understand Word Senses?

This repository contains the code and data for the paper "Do Large Language Models Understand Word Senses?" accepted at the EMNLP 2025 main conference.

In our paper, we investigate whether Large Language Models truly understand word senses in context. We evaluate a wide range of models on classic Word Sense Disambiguation benchmarks and novel generative settings, showing that top LLMs match state-of-the-art systems in WSD and achieve up to 98% accuracy in free-form sense explanation tasks.

If you find our paper, code, or framework useful, please cite this work in your paper:

@inproceedings{meconi-etal-2025-large,
    title = "Do Large Language Models Understand Word Senses?",
    author = "Meconi, Domenico  and
      Stirpe, Simone  and
      Martelli, Federico  and
      Lavalle, Leonardo  and
      Navigli, Roberto",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1720/",
    pages = "33885--33904",
    ISBN = "979-8-89176-332-6"
}


Set up the environment

Clone the repository:

git clone https://github.com/Babelscape/LLM-WSD.git
cd LLM-WSD

Download the datasets:

pip install gdown
gdown 110XFfCq93zTGQHr65lNsOXzn-KXDFXdb -O LLM-WSD-datasets.zip
unzip LLM-WSD-datasets.zip
rm LLM-WSD-datasets.zip

We suggest installing conda and then creating a new environment with the following command:

conda create -n llm-wsd python=3.11

Then, activate the newly created environment and install the required libraries:

conda activate llm-wsd
pip install -r requirements.txt

Optionally, add your API keys to the .env file, along with the path to your saved models if you run on an HPC cluster:

OPENAI_API_KEY=your-openai-key-here  # if you need to evaluate gpt
DEEPSEEK_KEY=your-deepseek-key-here  # if you need to evaluate deepseek
HPC_PATH=path-to-your-hpc-saved-models  # if you run on HPC
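These variables are read at runtime by src/env.py. A minimal sketch of what such loading typically looks like, assuming the python-dotenv package (the repository's actual implementation may differ):

# Minimal sketch of .env loading, assuming python-dotenv;
# the repository's src/env.py may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # None if not set
DEEPSEEK_KEY = os.getenv("DEEPSEEK_KEY")
HPC_PATH = os.getenv("HPC_PATH")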

Repository Structure

LLM-WSD/
├── data/
│   ├── development/                  # Development datasets
│   └── evaluation/                   # Evaluation datasets
├── src/
│   ├── disambiguate.py               # Main WSD evaluation script
│   ├── score.py                      # Results evaluation and metrics
│   ├── generate_dataset_from_xml.py  # Dataset preprocessing
│   ├── utils.py                      # Core utilities and functions
│   ├── variables.py                  # Model configs and prompt templates
│   └── env.py                        # Environment variable loading
├── .env                              # Environment variables
├── requirements.txt                  # Python dependencies
└── README.md                         # This file

Basic Usage

1. Preprocess your dataset

Convert datasets that follow the format introduced by Raganato et al. (2017) into JSON:

python src/generate_dataset_from_xml.py \
    --data_path path/to/your/dataset.data.xml \
    --gold_path path/to/your/dataset.gold.key.txt \
    --highlight_target \
    --shuffle_candidates

# --gold_path: [Optional] pass only if you have gold labels
# --highlight_target: [Optional] highlight the target word in the context
# --shuffle_candidates: [Optional] present the candidate senses in random order
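For orientation, a dataset in the Raganato et al. (2017) format pairs an XML file of sense-annotated instances with a gold key file mapping instance IDs to sense keys. An illustrative fragment (not taken from this repository's data):

<corpus lang="en">
  <text id="d000">
    <sentence id="d000.s000">
      <wf lemma="the" pos="DET">The</wf>
      <instance id="d000.s000.t000" lemma="bank" pos="NOUN">bank</instance>
      <wf lemma="be" pos="VERB">is</wf>
      <wf lemma="closed" pos="ADJ">closed</wf>
    </sentence>
  </text>
</corpus>

with a matching line in dataset.gold.key.txt such as:

d000.s000.t000 bank%1:14:00::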

2. Run WSD Experiments

python src/disambiguate.py \
    --subtask selection \
    --approach {zero_shot|one_shot|few_shot|perplexity} \
    --shortcut_model_name model_name  # see src/variables.py L22 for the supported models
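As a rough illustration of the perplexity approach: each candidate definition is scored by the language-model loss on the context paired with that definition, and the lowest-loss candidate is selected. A hedged sketch of the general technique (not the repository's exact implementation):

# Hedged sketch of perplexity-based sense selection; illustrative only,
# not the repository's exact implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model = AutoModelForCausalLM.from_pretrained("some-hf-checkpoint")  # hypothetical name
# tokenizer = AutoTokenizer.from_pretrained("some-hf-checkpoint")

def pick_sense_by_perplexity(model, tokenizer, context, definitions):
    best_idx, best_nll = None, float("inf")
    for i, definition in enumerate(definitions):
        ids = tokenizer(f"{context} {definition}", return_tensors="pt").input_ids
        with torch.no_grad():
            nll = model(input_ids=ids, labels=ids).loss.item()  # mean token negative log-likelihood
        if nll < best_nll:
            best_idx, best_nll = i, nll
    return best_idx  # index of the candidate with the lowest loss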

3. Run Generation Experiments

python src/disambiguate.py \
    --subtask generation \
    --approach zero_shot \
    --shortcut_model_name model_name \
    --prompt_number {1|2|3}

# --shortcut_model_name: see src/variables.py L22 for the supported models
# --prompt_number: 1 = Definition Generation, 2 = Free-form Explanation, 3 = Example Generation
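The actual generation prompts live in src/variables.py; purely for orientation, the three subtasks correspond to prompts of roughly this shape (hypothetical wording, not the repository's templates):

# Hypothetical prompt shapes for the three generation subtasks;
# the real templates are defined in src/variables.py.
PROMPTS = {
    1: "Give the definition of '{target}' as it is used in: {sentence}",      # Definition Generation
    2: "Explain in your own words the meaning of '{target}' in: {sentence}",  # Free-form Explanation
    3: "Write a new sentence using '{target}' with the same meaning as in: {sentence}",  # Example Generation
}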

4. Evaluate WSD Results

python src/score.py \
    --approach {zero_shot|one_shot|few_shot|perplexity} \
    --shortcut_model_name model_name \
    --pos {ALL|NOUN|ADJ|VERB|ADV}

# --shortcut_model_name: see src/variables.py L22 for the supported models
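Standard WSD scoring reduces to accuracy over gold-annotated instances, where an instance may admit several acceptable sense keys. A minimal sketch of that logic (not the repository's score.py):

# Minimal sketch of standard WSD accuracy scoring; not the repository's score.py.
def wsd_accuracy(predictions, gold):
    # predictions: instance id -> predicted sense key
    # gold: instance id -> set of acceptable sense keys
    correct = sum(1 for iid, pred in predictions.items() if pred in gold.get(iid, set()))
    return correct / len(gold)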

Optional flags for steps 2, 3 and 4 (a combined example follows the list):

  • --is_devel: Use SemEval-2007 development data
  • --prompt_number: If --is_devel is selected, insert a number from 1 to 20
  • --more_context: Extended context sentences
  • --shuffle_candidates: Randomized definition order
  • --hard: Challenging cases (hardEN dataset)
  • --domain: Domain-specific evaluation (42D dataset)
  • --custom_dataset_path "/path/to/your/preprocessed/dataset.json": If you want to test on your own custom dataset
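For example, a zero-shot selection run on the hardEN dataset with shuffled candidates (model_name is a placeholder; use a shortcut from src/variables.py):

python src/disambiguate.py \
    --subtask selection \
    --approach zero_shot \
    --shortcut_model_name model_name \
    --shuffle_candidates \
    --hard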

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.


Contributing

We welcome contributions! Please feel free to:

  • Report bugs and issues
  • Suggest new features or improvements

For major changes, please open an issue first to discuss what you would like to change.


Contact

For questions about this research, please contact:
