Do Large Language Models Understand Word Senses?

This repository contains the code and data for the paper "Do Large Language Models Understand Word Senses?" accepted at the EMNLP 2025 main conference.

In our paper, we investigate whether Large Language Models truly understand word senses in context. We evaluate a wide range of models on classic Word Sense Disambiguation benchmarks and novel generative settings, showing that top LLMs match state-of-the-art systems in WSD and achieve up to 98% accuracy in free-form sense explanation tasks.

If you find our paper, code, or framework useful, please cite this work in your paper:

@inproceedings{meconi-etal-2025-large,
    title = "Do Large Language Models Understand Word Senses?",
    author = "Meconi, Domenico  and
      Stirpe, Simone  and
      Martelli, Federico  and
      Lavalle, Leonardo  and
      Navigli, Roberto",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1720/",
    pages = "33885--33904",
    ISBN = "979-8-89176-332-6"
}


Set up the environment

Clone the repository:

git clone https://github.com/Babelscape/LLM-WSD.git
cd LLM-WSD

Download the datasets:

pip install gdown
gdown 110XFfCq93zTGQHr65lNsOXzn-KXDFXdb -O LLM-WSD-datasets.zip
unzip LLM-WSD-datasets.zip
rm LLM-WSD-datasets.zip

We suggest installing conda and then creating a new environment with the following command:

conda create -n llm-wsd python=3.11

Then, activate the newly created environment and install the required libraries:

conda activate llm-wsd
pip install -r requirements.txt

Optionally, add your API keys to the .env file, along with the path to your saved models if you run on an HPC cluster:

OPENAI_API_KEY=your-openai-key-here  # if you need to evaluate gpt
DEEPSEEK_KEY=your-deepseek-key-here  # if you need to evaluate deepseek
HPC_PATH=path-to-your-hpc-saved-models  # if you run on HPC
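These variables are read at runtime by src/env.py. A minimal sketch of what such loading typically looks like, assuming the python-dotenv package (the repository's actual implementation may differ):

# Minimal sketch of .env loading, assuming python-dotenv;
# the repository's src/env.py may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the process environment
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # None if not set
DEEPSEEK_KEY = os.getenv("DEEPSEEK_KEY")
HPC_PATH = os.getenv("HPC_PATH")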

Repository Structure

LLM-WSD/
├── data/
│   ├── development/                  # Development datasets
│   └── evaluation/                   # Evaluation datasets
├── src/
│   ├── disambiguate.py               # Main WSD evaluation script
│   ├── score.py                      # Results evaluation and metrics
│   ├── generate_dataset_from_xml.py  # Dataset preprocessing
│   ├── utils.py                      # Core utilities and functions
│   ├── variables.py                  # Model configs and prompt templates
│   └── env.py                        # Environment variable loading
├── .env                              # Environment variables
├── requirements.txt                  # Python dependencies
└── README.md                         # This file

Basic Usage

1. Preprocess your dataset

Convert datasets that follow the format introduced by Raganato et al. (2017) into JSON:

python src/generate_dataset_from_xml.py \
    --data_path path/to/your/dataset.data.xml \
    --gold_path path/to/your/dataset.gold.key.txt \
    --highlight_target \
    --shuffle_candidates

# --gold_path: [Optional] pass only if you have gold labels
# --highlight_target: [Optional] highlight the target word in the context
# --shuffle_candidates: [Optional] present the candidate senses in random order
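For orientation, a dataset in the Raganato et al. (2017) format pairs an XML file of sense-annotated instances with a gold key file mapping instance IDs to sense keys. An illustrative fragment (not taken from this repository's data):

<corpus lang="en">
  <text id="d000">
    <sentence id="d000.s000">
      <wf lemma="the" pos="DET">The</wf>
      <instance id="d000.s000.t000" lemma="bank" pos="NOUN">bank</instance>
      <wf lemma="be" pos="VERB">is</wf>
      <wf lemma="closed" pos="ADJ">closed</wf>
    </sentence>
  </text>
</corpus>

with a matching line in dataset.gold.key.txt such as:

d000.s000.t000 bank%1:14:00::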

2. Run WSD Experiments

python src/disambiguate.py \
    --subtask selection \
    --approach {zero_shot|one_shot|few_shot|perplexity} \
    --shortcut_model_name model_name  # see src/variables.py L22 for the supported models
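As a rough illustration of the perplexity approach: each candidate definition is scored by the language-model loss on the context paired with that definition, and the lowest-loss candidate is selected. A hedged sketch of the general technique (not the repository's exact implementation):

# Hedged sketch of perplexity-based sense selection; illustrative only,
# not the repository's exact implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model = AutoModelForCausalLM.from_pretrained("some-hf-checkpoint")  # hypothetical name
# tokenizer = AutoTokenizer.from_pretrained("some-hf-checkpoint")

def pick_sense_by_perplexity(model, tokenizer, context, definitions):
    best_idx, best_nll = None, float("inf")
    for i, definition in enumerate(definitions):
        ids = tokenizer(f"{context} {definition}", return_tensors="pt").input_ids
        with torch.no_grad():
            nll = model(input_ids=ids, labels=ids).loss.item()  # mean token negative log-likelihood
        if nll < best_nll:
            best_idx, best_nll = i, nll
    return best_idx  # index of the candidate with the lowest loss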

3. Run Generation Experiments

python src/disambiguate.py \
    --subtask generation \
    --approach zero_shot \
    --shortcut_model_name model_name \
    --prompt_number {1|2|3}

# --shortcut_model_name: see src/variables.py L22 for the supported models
# --prompt_number: 1 = Definition Generation, 2 = Free-form Explanation, 3 = Example Generation
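The actual generation prompts live in src/variables.py; purely for orientation, the three subtasks correspond to prompts of roughly this shape (hypothetical wording, not the repository's templates):

# Hypothetical prompt shapes for the three generation subtasks;
# the real templates are defined in src/variables.py.
PROMPTS = {
    1: "Give the definition of '{target}' as it is used in: {sentence}",      # Definition Generation
    2: "Explain in your own words the meaning of '{target}' in: {sentence}",  # Free-form Explanation
    3: "Write a new sentence using '{target}' with the same meaning as in: {sentence}",  # Example Generation
}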

4. Evaluate WSD Results

python src/score.py \
    --approach {zero_shot|one_shot|few_shot|perplexity} \
    --shortcut_model_name model_name \
    --pos {ALL|NOUN|ADJ|VERB|ADV}

# --shortcut_model_name: see src/variables.py L22 for the supported models
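Standard WSD scoring reduces to accuracy over gold-annotated instances, where an instance may admit several acceptable sense keys. A minimal sketch of that logic (not the repository's score.py):

# Minimal sketch of standard WSD accuracy scoring; not the repository's score.py.
def wsd_accuracy(predictions, gold):
    # predictions: instance id -> predicted sense key
    # gold: instance id -> set of acceptable sense keys
    correct = sum(1 for iid, pred in predictions.items() if pred in gold.get(iid, set()))
    return correct / len(gold)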

Optional flags for steps 2, 3 and 4 (a combined example follows the list):

  • --is_devel: Use SemEval-2007 development data
  • --prompt_number: If --is_devel is selected, insert a number from 1 to 20
  • --more_context: Extended context sentences
  • --shuffle_candidates: Randomized definition order
  • --hard: Challenging cases (hardEN dataset)
  • --domain: Domain-specific evaluation (42D dataset)
  • --custom_dataset_path "/path/to/your/preprocessed/dataset.json": If you want to test on your own custom dataset
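For example, a zero-shot selection run on the hardEN dataset with shuffled candidates (model_name is a placeholder; use a shortcut from src/variables.py):

python src/disambiguate.py \
    --subtask selection \
    --approach zero_shot \
    --shortcut_model_name model_name \
    --shuffle_candidates \
    --hard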

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.


Contributing

We welcome contributions! Please feel free to:

  • Report bugs and issues
  • Suggest new features or improvements

For major changes, please open an issue first to discuss what you would like to change.


Contact

For questions about this research, please contact:
