Python implementation of multiple text as data methods and benchmarking on publicly available datasets.

yabramuvdi/text-algorithms-benchmarking

text-algorithms-benchmarking

Python implementation of multiple text-as-data methods and benchmarking on publicly available datasets. Below is a brief description of the data and the main methods demonstrated.

Data

Economic Policy Uncertainty

Methods

1. Dictionary methods

For very large corpora, an interesting alternative to the implementation in this repository is flashtext.
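As a rough illustration of a dictionary method on Economic Policy Uncertainty style data, the sketch below flags a text when it contains at least one term from each of three categories. The term lists and the function names (`count_matches`, `dictionary_label`) are assumptions made for this example and are not taken from the repository.

```python
import re

# Illustrative term lists (assumptions for this sketch, not the repository's dictionaries).
CATEGORIES = {
    "economy": ["economy", "economic"],
    "policy": ["regulation", "legislation", "congress", "deficit",
               "federal reserve", "white house"],
    "uncertainty": ["uncertain", "uncertainty"],
}


def compile_category(terms):
    """Build one case-insensitive regex that matches any term as a whole word or phrase."""
    # Longer terms first, so multi-word phrases are tried before shorter alternatives.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {name: compile_category(terms) for name, terms in CATEGORIES.items()}


def count_matches(text):
    """Count dictionary hits per category in a single text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


def dictionary_label(text):
    """Return 1 if every category has at least one hit, otherwise 0."""
    counts = count_matches(text)
    return int(all(count > 0 for count in counts.values()))
```

Compiling one pattern per category keeps the scan to a single pass per category. For very large dictionaries or corpora, a trie-based keyword matcher is a common alternative to regular expressions.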

2. Logistic regression on a bag-of-words
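To make the idea concrete, here is a minimal dependency-free sketch of logistic regression on bag-of-words counts, trained with batch gradient descent. The toy texts, labels, and hyperparameters are invented for illustration. In practice a library such as scikit-learn would normally be used, and nothing here is claimed to match the repository's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Map each distinct training token to a column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocab):
    """Turn a text into a vector of token counts; unseen tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=200, l2=0.01):
    """Fit weights and bias by batch gradient descent on the L2-penalised log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Probability that the text belongs to the positive class."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy training data (invented for this sketch).
texts = [
    "uncertainty about economic policy rises",
    "congress debates uncertain economic regulation",
    "local team wins football match",
    "new restaurant opens downtown",
]
labels = [1, 1, 0, 0]

vocab = build_vocabulary(texts)
X = [vectorize(text, vocab) for text in texts]
w, b = train_logistic_regression(X, labels)
```

After training, `predict_proba("economic policy uncertainty", vocab, w, b)` returns the estimated probability of the positive class for a new text.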

3. Small language models (e.g. BERT and friends)

4. Large language models

Zero-shot classification, with no fine-tuning.
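A zero-shot setup can be sketched as a prompt template plus a parser for the model's reply. The prompt wording, the yes/no label scheme, and the `generate` callable (a stand-in for whichever model client is used) are all hypothetical choices for this example, not the repository's code.

```python
from typing import Callable, Optional

# Hypothetical prompt wording, chosen only for this sketch.
PROMPT_TEMPLATE = (
    "You are classifying newspaper articles.\n"
    "Question: does the article below discuss economic policy uncertainty?\n"
    "Answer with a single word: yes or no.\n\n"
    "Article:\n{article}\n\n"
    "Answer:"
)


def build_prompt(article: str) -> str:
    """Insert the article into the fixed instruction template."""
    return PROMPT_TEMPLATE.format(article=article.strip())


def parse_answer(raw: str) -> Optional[int]:
    """Map a raw completion to 1 (yes), 0 (no), or None if it cannot be parsed."""
    words = raw.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!:;\"'")
    if first == "yes":
        return 1
    if first == "no":
        return 0
    return None


def zero_shot_classify(article: str, generate: Callable[[str], str]) -> Optional[int]:
    """Classify one article with a model that was not fine-tuned on the task.

    `generate` is any function that takes a prompt and returns the model's text.
    """
    return parse_answer(generate(build_prompt(article)))
```

Passing the model call in as a function keeps the sketch independent of any particular provider and makes it easy to check with a stub, for example `zero_shot_classify(text, lambda prompt: "yes")`.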

5. Fine-tune open-source large language models

Llama 3
