
Ganeshpadmanaban/language-models

 
 


Language Models

Repository of pre-trained Language Models.

WARNING: a bidirectional LM using the MultiFiT configuration is a good model for text classification, but with only 46 million parameters it is far from being an LM that can compete with GPT-2 or BERT on NLP tasks such as text generation. That is my next step ;-)

Note: the training times shown in the tables on this page are the sum of the time needed to create the fastai DataBunch (forward and backward) and the time needed to train the bidirectional model for 10 epochs. The time to download and prepare the Wikipedia corpus is not included.
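The time column uses an `XhYY` notation (`8h40` is 8 hours 40 minutes, `11h` is 11 hours). Assuming each row reports the time for its own direction, the total cost of one bidirectional model is the sum of its forward and backward rows. A small sketch, with hypothetical helper names, that does this addition for the French MultiFiT model:

```python
import re

def to_minutes(duration: str) -> int:
    """Convert a table entry such as '8h40' or '11h' into minutes."""
    match = re.fullmatch(r"(\d+)h(\d{2})?", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes = match.groups()
    return int(hours) * 60 + int(minutes or 0)

def format_minutes(total: int) -> str:
    """Format a number of minutes back into the table's 'XhYY' style."""
    hours, minutes = divmod(total, 60)
    return f"{hours}h{minutes:02d}"

# Forward + backward times of the French MultiFiT model, as listed in the tables.
total = to_minutes("8h40") + to_minutes("8h10")
print(format_minutes(total))  # 16h50
```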

Portuguese

I trained one Portuguese Bidirectional Language Model (PBLM) with the MultiFiT configuration on one NVIDIA V100 GPU on GCP.

MultiFiT configuration: 4 QRNN layers with 1,550 hidden units per layer; SentencePiece tokenizer (15,000 tokens).
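A QRNN layer replaces the LSTM's step-by-step matrix multiplications with convolutions that compute, for all time steps at once, a candidate vector z_t, a forget gate f_t and an output gate o_t. Only a cheap element-wise "fo-pooling" recurrence stays sequential: c_t = f_t ⊙ c_{t-1} + (1 − f_t) ⊙ z_t and h_t = o_t ⊙ c_t. The sketch below implements just that pooling step in pure Python, assuming the gate values have already been computed; it illustrates the recurrence and is not the fastai or CUDA QRNN implementation used for these models.

```python
from typing import List, Sequence

Vector = List[float]

def fo_pool(z: Sequence[Vector], f: Sequence[Vector], o: Sequence[Vector]) -> List[Vector]:
    """Apply QRNN fo-pooling over time; each argument has shape [time][hidden]."""
    hidden = len(z[0])
    c = [0.0] * hidden  # cell state starts at zero
    outputs = []
    for z_t, f_t, o_t in zip(z, f, o):
        # c_t = f_t * c_{t-1} + (1 - f_t) * z_t, element-wise
        c = [f_i * c_i + (1.0 - f_i) * z_i for f_i, c_i, z_i in zip(f_t, c, z_t)]
        # h_t = o_t * c_t, element-wise
        outputs.append([o_i * c_i for o_i, c_i in zip(o_t, c)])
    return outputs

# Toy example: 3 time steps, hidden size 2 (the MultiFiT layers use 1,550).
z = [[1.0, 2.0]] * 3
f = [[0.5, 0.0]] * 3
o = [[1.0, 1.0]] * 3
print(fo_pool(z, f, o))  # [[0.5, 2.0], [0.75, 2.0], [0.875, 2.0]]
```

A forget gate close to 1 keeps the previous cell state, while a forget gate of 0 copies the new candidate straight through, as the second hidden unit shows.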

| PBLM | Accuracy | Perplexity | Training time |
|---|---|---|---|
| forward | 39.68% | 21.76 | 8h |
| backward | 43.67% | 22.16 | 8h |
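Under the standard definitions, perplexity is the exponential of the mean per-token cross-entropy (natural log), and accuracy is the share of positions where the most probable next token is the correct one. A stdlib sketch of both metrics on made-up toy values (not data from these models):

```python
import math
from typing import Sequence

def perplexity(target_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood of the correct next tokens."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))

def next_token_accuracy(predictions: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of positions where the argmax prediction equals the target token id."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# Toy values: probability the model assigned to the true token at 4 positions.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
print(next_token_accuracy([3, 7, 7, 1], [3, 7, 2, 1]))  # 0.75

# Reading the table backwards: a perplexity of 21.76 is a mean loss of about ln(21.76).
print(round(math.log(21.76), 2))  # 3.08
```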

[ WARNING ] The notebook lm3-portuguese-classifier-olist.ipynb must be updated to use the SentencePiece model and vocabulary already trained for the Portuguese language model in lm3-portuguese.ipynb, as was done in lm3-portuguese-classifier-TCU-jurisprudencia.ipynb (see the explanation at the top of that notebook).

Here's an example of using the classifier to predict the category of a TCU legal text:

[Image: Using the classifier to predict the category of TCU legal texts]
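A bidirectional setup yields two classifiers, one fine-tuned from the forward LM and one from the backward LM. The ULMFiT and MultiFiT papers combine them by averaging their predicted class probabilities. Assuming the notebooks follow the same approach, the final prediction step looks like the sketch below; the category labels and probabilities are hypothetical.

```python
from typing import Dict

def ensemble(forward: Dict[str, float], backward: Dict[str, float]) -> Dict[str, float]:
    """Average the class probabilities of the forward and backward classifiers."""
    return {label: (forward[label] + backward[label]) / 2 for label in forward}

def predict(forward: Dict[str, float], backward: Dict[str, float]) -> str:
    """Return the label with the highest averaged probability."""
    probs = ensemble(forward, backward)
    return max(probs, key=probs.get)

# Hypothetical outputs of the two classifiers for one legal text.
fwd = {"category_a": 0.60, "category_b": 0.40}
bwd = {"category_a": 0.30, "category_b": 0.70}
print(predict(fwd, bwd))  # category_b
```

Averaging lets the more confident direction tip the decision: here the forward classifier prefers `category_a`, but the averaged probabilities (0.45 vs 0.55) favor `category_b`.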

French

I trained three French Bidirectional Language Models (FBLM) on one NVIDIA V100 GPU on GCP; the best is the one trained with the MultiFiT configuration.

| French Bidirectional Language Models (FBLM) | Direction | Accuracy | Perplexity | Training time |
|---|---|---|---|---|
| MultiFiT with 4 QRNN + SentencePiece (15,000 tokens) | forward | 43.77% | 16.09 | 8h40 |
| | backward | 49.29% | 16.58 | 8h10 |
| ULMFiT with 3 QRNN + SentencePiece (15,000 tokens) | forward | 40.99% | 19.96 | 5h30 |
| | backward | 47.19% | 19.47 | 5h30 |
| ULMFiT with 3 AWD-LSTM + spaCy (60,000 tokens) | forward | 36.44% | 25.62 | 11h |
| | backward | 42.65% | 27.09 | 11h |
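One simple way to rank the three configurations is by the mean of their forward and backward perplexities. The sketch below uses the values from the table above; the `Result` class is a hypothetical helper. Keep in mind that perplexities computed over different vocabularies (15,000 SentencePiece subwords vs 60,000 spaCy words) are not strictly comparable.

```python
from dataclasses import dataclass

@dataclass
class Result:
    name: str
    fwd_perplexity: float
    bwd_perplexity: float

    @property
    def mean_perplexity(self) -> float:
        """Mean of the forward and backward perplexities."""
        return (self.fwd_perplexity + self.bwd_perplexity) / 2

# Values copied from the French table above.
results = [
    Result("MultiFiT, 4 QRNN + SentencePiece (15,000 tokens)", 16.09, 16.58),
    Result("ULMFiT, 3 QRNN + SentencePiece (15,000 tokens)", 19.96, 19.47),
    Result("ULMFiT, 3 AWD-LSTM + spaCy (60,000 tokens)", 25.62, 27.09),
]

best = min(results, key=lambda r: r.mean_perplexity)
print(best.name)  # MultiFiT, 4 QRNN + SentencePiece (15,000 tokens)
```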

1. MultiFiT configuration: 4 QRNN layers with 1,550 hidden units per layer; SentencePiece tokenizer (15,000 tokens)

| FBLM | Accuracy | Perplexity | Training time |
|---|---|---|---|
| forward | 43.77% | 16.09 | 8h40 |
| backward | 49.29% | 16.58 | 8h10 |

Here's an example of using the classifier to predict the sentiment of comments on an Amazon product:

[Image: Using the classifier to predict the sentiment of comments on an Amazon product]

2. ULMFiT configuration: 3 QRNN layers; SentencePiece tokenizer (15,000 tokens)

| FBLM | Accuracy | Perplexity | Training time |
|---|---|---|---|
| forward | 40.99% | 19.96 | 5h30 |
| backward | 47.19% | 19.47 | 5h30 |

3. ULMFiT configuration: 3 AWD-LSTM layers; spaCy tokenizer (60,000 tokens)

| FBLM | Accuracy | Perplexity | Training time |
|---|---|---|---|
| forward | 36.44% | 25.62 | 11h |
| backward | 42.65% | 27.09 | 11h |


Languages

  • Jupyter Notebook 99.3%
  • Python 0.7%