Graduate research paper investigating adversarial attacks on an LLM and proposing methods to enhance the robustness and safety of LLMs in production.

TylerAnderton/NLP-Final-Paper

This unguided research project served as the final paper for the Natural Language Processing graduate course at UT Austin. My partner and I sought to contribute to the field of large language models by attacking the ELECTRA-small model with adversarial data that mimics the imperfect inputs of a production environment. We then analyzed the errors produced by these adversarial attacks and proposed methods for enhancing the robustness and safety of consumer-facing LLMs.

To learn more, please read the full report; its abstract is reproduced below. Unfortunately, because this was an assignment for an active course at the University of Texas, sharing the project files would breach the Academic Honesty agreement, but I hope the paper includes enough detail for an interested reader to reproduce this work.

Abstract

Question answering is a popular NLP task, driven in part by widespread interest in commercializing recent advances in LLMs; however, the excellent performance of these models on common academic QA benchmarks does not always transfer cleanly to industrial contexts (Ribeiro et al. 2020). One egregious example is when seemingly innocuous changes to the input (e.g., a typo or missing word) drastically reduce performance (Gardner et al. 2020). Such model “blind spots” are commonly referred to as dataset artifacts. In this paper, we first identify some dataset artifacts that approximate the data imperfections and difficulties these models might encounter when launched into commercial production. We then explore methods to mitigate those artifacts while fine-tuning an ELECTRA transformer model on the SQuAD QA benchmark (Clark et al. 2020; Rajpurkar et al. 2016).
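
To make the perturbation idea concrete, below is a minimal sketch of the kind of input corruption described above: character-level typos and dropped words injected into a question before it reaches a QA model. Since the project files themselves are withheld, this is not the paper's actual code; the helper names, the perturbation rate, and the placeholder model checkpoint are all illustrative assumptions.

```python
# Illustrative sketch only (not the paper's code): perturb a SQuAD-style
# question with the "innocuous" errors the abstract mentions.
import random
import string


def inject_typo(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Replace a small fraction of alphabetic characters with random letters."""
    rng = random.Random(seed)
    chars = list(text)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars)


def drop_word(text: str, seed: int = 0) -> str:
    """Delete one randomly chosen word, mimicking a missing-word error."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) > 1:
        words.pop(rng.randrange(len(words)))
    return " ".join(words)


if __name__ == "__main__":
    question = "What causes precipitation to fall?"
    print(inject_typo(question))  # question with random character typos
    print(drop_word(question))    # question with one word removed

    # With a SQuAD-fine-tuned ELECTRA checkpoint (the identifier below is a
    # placeholder; substitute a real one), clean vs. perturbed answers can be
    # compared:
    # from transformers import pipeline
    # qa = pipeline("question-answering", model="<electra-small-squad-checkpoint>")
    # print(qa(question=question, context="..."))
    # print(qa(question=inject_typo(question), context="..."))
```

Running the model on both the clean and the perturbed questions and comparing exact-match or F1 scores quantifies the performance drop that such artifacts can cause.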
