This repository contains state-of-the-art language models and a classifier for Kannada, a language spoken predominantly by the Kannada people in India, mainly in the state of Karnataka.
The models trained here are used in the Natural Language Toolkit for Indic Languages (iNLTK).

Architecture/Dataset | Kannada Wikipedia Articles |
---|---|
ULMFiT | 70.10 |
TransformerXL | 61.97 |

Dataset | Accuracy | MCC | Notebook to reproduce results |
---|---|---|---|
IndicNLP News Article Classification Dataset - Kannada | 98.87 | 98.30 | Link |
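
The accuracy and MCC columns above can be computed from a list of true labels and a list of predicted labels. The following is a minimal sketch in plain Python, assuming that MCC here means the multiclass Matthews correlation coefficient and that the table reports both values multiplied by 100. The function names are illustrative and are not taken from the linked notebook.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient, in the range [-1, 1]."""
    s = len(y_true)                                        # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))        # correct predictions
    t_counts = Counter(y_true)                             # true count per class
    p_counts = Counter(y_pred)                             # predicted count per class
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    spread_true = s * s - sum(v * v for v in t_counts.values())
    spread_pred = s * s - sum(v * v for v in p_counts.values())
    denominator = sqrt(spread_true * spread_pred)

    # By convention, return 0.0 when all true or all predicted labels
    # fall in a single class and the coefficient is undefined.
    return numerator / denominator if denominator else 0.0
```

Unlike accuracy, MCC takes the per-class distribution of predictions into account, so a classifier that always predicts one class scores 0.0 no matter how common that class is.
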

Architecture | Visualization |
---|---|
ULMFiT | Embeddings projection |
TransformerXL | Embeddings projection |

Download the pretrained language models from here.

The tokenizer was trained using Google's SentencePiece.
Download the trained model and vocabulary from here.
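
Once downloaded, the tokenizer model can be loaded with the `sentencepiece` Python package. This is a minimal sketch, assuming the package is installed (`pip install sentencepiece`) and that the model has been saved locally. The helper names and any file name passed to them are hypothetical, since the actual file name depends on the download.

```python
from pathlib import Path


def load_tokenizer(model_path):
    """Load a SentencePiece model from a local .model file."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"SentencePiece model not found: {path}")
    # Imported lazily so a missing file is reported before the import is attempted.
    import sentencepiece as spm
    return spm.SentencePieceProcessor(model_file=str(path))


def tokenize(processor, text):
    """Split text into subword pieces, returned as a list of strings."""
    return processor.encode(text, out_type=str)
```

For example, `tokenize(load_tokenizer("path/to/tokenizer.model"), "ಕನ್ನಡ")` would return the subword pieces for the given Kannada text, where the path is a placeholder for the downloaded file.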