NLU_S22_Project

Repository for the NLU Spring 22 Final Project by Kanika Agarwal (ka2522), Maitreya Sonawane (mss9240), Nishanth Sanjeev (ns5287), and Sumit Mamtani (sm9669).

The repository contains baseline results for 2 datasets:

Twitter Sarcasm Dataset - found in folder Reddit_Twitter_Dataset/twitter. It contains .jsonl files for training and testing.
News Headline Sarcasm Dataset - found in folder News_dataset. It contains two files, Sarcasm_Headlines_Dataset.json and Sarcasm_Headlines_Dataset_v2.json. The latter contains more sarcastic headlines and is balanced, which suits our use case.
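As a minimal sketch of how these files can be inspected, the snippet below reads a file that stores one JSON object per line and counts the label values to check class balance. The helper names `read_jsonl` and `label_counts` are hypothetical, and the field name `is_sarcastic` is an assumption about the news headline files; change it to whatever key the file actually uses.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def label_counts(records, label_field="is_sarcastic"):
    """Count how often each label value occurs.

    label_field is an assumed key name, not confirmed by this README.
    """
    return Counter(record[label_field] for record in records)


# Example usage (path follows the folder layout described above):
# data = read_jsonl("News_dataset/Sarcasm_Headlines_Dataset_v2.json")
# print(label_counts(data))
```

Roughly equal counts for the two label values would indicate a balanced file.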

The baseline model we chose to evaluate is T5-base. A version finetuned on the Twitter Sarcasm Dataset is available at https://huggingface.co/mrm8488/t5-base-finetuned-sarcasm-twitter, so we used it directly for testing on the Twitter Sarcasm Dataset. For the News Dataset, we finetuned the T5-base model ourselves and then evaluated it on the test set. The model weights can be found here: https://drive.google.com/file/d/1gnp1zv2t4xkYcRqmCPbaDjfgmyg8cw6Q/view?usp=sharing
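The following is a rough sketch of how the finetuned Twitter checkpoint could be loaded and queried with the Hugging Face transformers library. It is not taken from the project notebooks: the helper names `load_baseline` and `predict_label` and the `max_length` value are assumptions, and the input formatting and label strings the checkpoint expects and produces should be checked against its model card.

```python
MODEL_NAME = "mrm8488/t5-base-finetuned-sarcasm-twitter"


def load_baseline(model_name=MODEL_NAME):
    """Load tokenizer and model; the checkpoint is downloaded on first use."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def predict_label(text, tokenizer, model, max_length=8):
    """Return the label string that the text-to-text model generates for `text`.

    max_length is an assumed cap; labels are expected to be only a few tokens.
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()


# Example usage (requires transformers, sentencepiece, torch, and network access):
# tokenizer, model = load_baseline()
# print(predict_label("Oh great, another Monday.", tokenizer, model))
```

Because T5 is a text-to-text model, the prediction comes back as a short generated string rather than a class index, so accuracy is computed by comparing that string with the expected label text.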

For comparison, we have so far experimented with the following token-free models:

CANINE - finetuned and tested on the Twitter Sarcasm Dataset. The model weights can be found here: https://drive.google.com/file/d/1N0yQ2do5OqbgzHxP68Ikpyi0mDd2kNYA/view?usp=sharing
CANINE - finetuned and tested on the News Headline Sarcasm Dataset. The model weights can be found in the following zip file: https://drive.google.com/file/d/11DREflBk89GdhoG5fQAYoStxOR2RNUhH/view?usp=sharing
ByT5-small - finetuned and tested on the Twitter Sarcasm Dataset. The model weights can be found here: https://drive.google.com/file/d/117QtlzOmz3QpZAliWyC8KeUkV5MolVeM/view?usp=sharing
ByT5-small - finetuned and tested on the News Headline Sarcasm Dataset. The model weights can be found here: https://drive.google.com/file/d/1riiKnmi8XFcSSokJvrKwmFgRDDlEr4vY/view?usp=sharing
ByT5-base - finetuned and tested on the News Headline Sarcasm Dataset. The model weights can be found here: https://drive.google.com/file/d/1-vAHkwUX0LNiZ-ymKtk9gFRsdfJNWWDk/view?usp=sharing
ByT5-base - finetuned and tested on the Twitter Sarcasm Dataset. The Jupyter Notebook and model weights can be found here: https://drive.google.com/file/d/1MxXZZmM_yG0D5fVfxgsZxtRoEp0Aebvc/view?usp=sharing
Charformer - trained and tested on the Twitter Sarcasm Dataset. The model weights can be found here: https://drive.google.com/file/d/17tssC1wV_bxlVhz6v5TBT4MYPRRkNppb/view?usp=sharing
Charformer - trained and tested on the News Headline Sarcasm Dataset. The model weights can be found here: https://drive.google.com/file/d/1vhDF3kKtSbcYGQLO9XH4iMAMy5R8fv6F/view?usp=sharing
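To illustrate what "token-free" means for the models above, here is a small sketch of how raw text can be turned into model inputs without a subword vocabulary. ByT5 works on UTF-8 bytes and CANINE works on Unicode code points. The function names are hypothetical, and the id offset of 3 and the end-of-sequence id of 1 mirror our understanding of the Hugging Face ByT5 tokenizer, so treat them as assumptions. Special characters that CANINE adds around the sequence are left out.

```python
BYT5_ID_OFFSET = 3  # assumption: ids 0-2 are reserved for pad, eos, and unk
BYT5_EOS_ID = 1     # assumption: end-of-sequence id appended to every input


def byte_level_ids(text):
    """ByT5-style ids: each UTF-8 byte shifted past the reserved ids, then eos."""
    return [byte + BYT5_ID_OFFSET for byte in text.encode("utf-8")] + [BYT5_EOS_ID]


def codepoint_ids(text):
    """CANINE-style ids: one Unicode code point per character."""
    return [ord(char) for char in text]


# Example usage:
# byte_level_ids("hi")   # one id per byte plus a trailing eos id
# codepoint_ids("hi")    # one id per character
```

A non-ASCII character such as "é" yields two byte-level ids but a single code point id, which is one reason sequence lengths differ between the two families on noisy Twitter text.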

Folders and files:

News_dataset
Reddit_Twitter_Dataset
Byt5_base_News_Headline.ipynb
Byt5_small_News_Headline.ipynb
Byt5_small_Twitter_sarcasm.ipynb
CANINE_Twitter_finetune_test.ipynb
CANINE_news_finetune_test.ipynb
Charformer_Twitter_sarcasm.ipynb
README.md
T5_news_finetune_test.ipynb
T5_twitter_sarcasm_testing.ipynb
byt5base_TwitterSarcasm.ipynb
charformer_News_Headline.ipynb
