Transformer-based language models pre-trained on a large amount of politics-related Twitter data (83M tweets). This repo is the official resource for the following paper.
The data sets for the evaluation tasks presented in our paper are available below.
All models are uploaded to my Hugging Face 🤗 account, so you can load a model with just three lines of code!
- PoliBERTweet (83M tweets) - Feel free to fine-tune this for any downstream task 🎯
- PoliBERTweet-small (5M tweets)
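For example, loading the main model really does take just three lines (using the model name as listed on the Hugging Face Hub):

```python
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("kornosk/polibertweet-mlm")
model = AutoModel.from_pretrained("kornosk/polibertweet-mlm")
```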
We tested our code with PyTorch v1.10.2 and transformers v4.18.0.
- To fine-tune our models for a specific task (e.g., stance detection), see the Hugging Face documentation; a minimal sketch is also included after the sample use case below.
- Please see the specific model pages above for more usage details. Below is a sample use case.
```python
from transformers import AutoModel, AutoTokenizer, pipeline
import torch

# Choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path).to(device)

# Fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline("fill-mask", model=pretrained_LM_path, tokenizer=tokenizer,
                     device=0 if torch.cuda.is_available() else -1)
outputs = fill_mask(example)
print(outputs)

# See embeddings
inputs = tokenizer(example, return_tensors="pt").to(device)
outputs = model(**inputs)
print(outputs)

# OR you can use this model to train on your downstream task!
# Please consider citing our paper if you feel this is useful :)
```
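The embedding `outputs` above are per-token hidden states. If you need a single vector per tweet, one common approach (a suggestion here, not an official recipe from this repo) is mean pooling over the last hidden state using the attention mask:

```python
# Mean-pool token embeddings into one tweet vector (a common heuristic;
# this pooling choice is an assumption, not part of the official pipeline)
with torch.no_grad():
    outputs = model(**inputs)
hidden = outputs.last_hidden_state                # (batch, seq_len, hidden_dim)
mask = inputs["attention_mask"].unsqueeze(-1)     # (batch, seq_len, 1)
tweet_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(tweet_embedding.shape)                      # e.g. torch.Size([1, 768])
```

The resulting vectors can be used for clustering, retrieval, or as features for a lightweight classifier.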
For training on downstream tasks, see details in the Hugging Face documentation.
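As a starting point, here is a minimal, hypothetical sketch of fine-tuning PoliBERTweet for stance detection framed as 3-way sequence classification. The label set, toy data, and training hyperparameters below are placeholders for illustration, not values from the paper:

```python
# Hypothetical fine-tuning sketch: stance detection as sequence classification.
# Labels, data, and hyperparameters are placeholders - replace with your own.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
import torch

pretrained_LM_path = "kornosk/polibertweet-mlm"
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_LM_path, num_labels=3)  # e.g. against / favor / none

# Toy examples for illustration only
texts = ["I support this policy", "This bill is a disaster"]
labels = [1, 0]

class StanceDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets and stance labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=StanceDataset(texts, labels),
)
trainer.train()
```

From there, swap in your real stance-labeled data and tune the hyperparameters for your task.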
If you feel our paper and resources are useful, please consider citing our work! 🙏
```bibtex
@inproceedings{kawintiranon2022polibertweet,
    title = {{P}oli{BERT}weet: A Pre-trained Language Model for Analyzing Political Content on {T}witter},
    author = {Kawintiranon, Kornraphop and Singh, Lisa},
    booktitle = {Proceedings of the Language Resources and Evaluation Conference (LREC)},
    year = {2022},
    pages = {7360--7367},
    publisher = {European Language Resources Association},
    url = {https://aclanthology.org/2022.lrec-1.801}
}
```
Create an issue here if you have any problems loading the models or data sets.