
Commit

update paper
Plaza committed May 8, 2024
1 parent 7a19d13 commit cdf276a
Showing 2 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion content/publication/2023-label-variation-llms/cite.bib
@@ -1,5 +1,5 @@
 @misc{plazadelarco2023leveraging,
-title={Leveraging Label Variation in Large Language Models for Zero-Shot Text Classification},
+title={Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation},
 author={Flor Miriam Plaza-del-Arco and Debora Nozza and Dirk Hovy},
 year={2023},
 eprint={2307.12973},
6 changes: 3 additions & 3 deletions content/publication/2023-label-variation-llms/index.md
@@ -1,7 +1,7 @@
 ---
 # Documentation: https://sourcethemes.com/academic/docs/managing-content/
 
-title: "Leveraging Label Variation in Large Language Models for Zero-Shot Text Classification"
+title: "Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation"
 authors: ["Flor Miriam Plaza-del-Arco","Debora Nozza","Dirk Hovy"]
 date: 2023-07-24
 doi: ""
@@ -19,13 +19,13 @@ publication_types: ["3"]
publication: "arXiv preprint arXiv:2307.12973"
publication_short: "arXiv preprint arXiv:2307.12973"

abstract: "The zero-shot learning capabilities of large language models (LLMs) make them ideal for text classification without annotation or supervised training. Many studies have shown impressive results across multiple tasks. While tasks, data, and results differ widely, their similarities to human annotation can aid us in tackling new tasks with minimal expenses. We evaluate using 5 state-of-the-art LLMs as annotators on 5 different tasks (age, gender, topic, sentiment prediction, and hate speech detection), across 4 languages: English, French, German, and Spanish. No single model excels at all tasks, across languages, or across all labels within a task. However, aggregation techniques designed for human annotators perform substantially better than any one individual model. Overall, though, LLMs do not rival even simple supervised models, so they do not (yet) replace the need for human annotation. We also discuss the tradeoffs between speed, accuracy, cost, and bias when it comes to aggregated model labeling versus human annotation."
abstract: "Large Language Models (LLMs) exhibit remarkable text classification capabilities, excelling in zero- and few-shot learning (ZSL and FSL) scenarios. However, since they are trained on different datasets, performance varies widely across tasks between those models. Recent studies emphasize the importance of considering human label variation in data annotation. However, how this human label variation also applies to LLMs remains unexplored. Given this likely model specialization, we ask: Do aggregate LLM labels improve over individual models (as for human annotators)? We evaluate four recent instruction-tuned LLMs as annotators on five subjective tasks across four languages. We use ZSL and FSL setups and label aggregation from human annotation. Aggregations are indeed substantially better than any individual model, benefiting from specialization in diverse tasks or languages. Surprisingly, FSL does not surpass ZSL, as it depends on the quality of the selected examples. However, there seems to be no good information-theoretical strategy to select those. We find that no LLM method rivals even simple supervised models. We also discuss the tradeoffs in accuracy, cost, and moral/ethical considerations between LLM and human annotation."

# Summary. An optional shortened abstract.
summary: ""


tags: ["NLP","LLMs"]
tags: ["NLP","LLMs","annotation"]
categories: []
featured: false

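The updated abstract describes aggregating labels from several instruction-tuned LLMs, in the spirit of aggregation over human annotators. As a minimal sketch of that idea only (not code from this repository or the paper; the model names and labels below are hypothetical), a simple majority vote over per-model zero-shot labels could look like this in Python:

from collections import Counter

# Hypothetical zero-shot labels from several instruction-tuned LLMs
# (one label per text, per model); model names and labels are made up.
model_labels = {
    "llm_a": ["hate", "not_hate", "hate", "not_hate"],
    "llm_b": ["hate", "not_hate", "not_hate", "not_hate"],
    "llm_c": ["hate", "hate", "hate", "not_hate"],
}

def majority_vote(labels_per_model):
    """Aggregate per-model labels into one label per text by majority vote."""
    n_texts = len(next(iter(labels_per_model.values())))
    aggregated = []
    for i in range(n_texts):
        votes = Counter(labels[i] for labels in labels_per_model.values())
        aggregated.append(votes.most_common(1)[0][0])  # ties break arbitrarily
    return aggregated

print(majority_vote(model_labels))  # ['hate', 'not_hate', 'hate', 'not_hate']

More elaborate aggregation schemes from the human-annotation literature (e.g. probabilistic annotator models) follow the same pattern of combining one label column per model into a single consensus label.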

