forked from Andrews2017/kkltk

takenolab/kkltk

 
 


KKLTK: Kinyarwanda and Kirundi Languages ToolKit

KKLTK is a Python package for processing the Kinyarwanda and Kirundi languages. It currently provides stopword sets for both languages; other preprocessing tools, such as Kinyarwanda and Kirundi tokenizers, will be added soon. KKLTK requires Python 3.0, 3.5, 3.6, 3.7, or 3.8.

For more detailed information on how these stopwords were obtained, please refer to the paper "KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and Kirundi" by Rubungo Andre Niyongabo, Hong Qu, Julia Kreutzer, and Li Huang, to appear in COLING 2020.

Installation

pip install kkltk==1.0

Usage

Stopwords

from kkltk.kin_kir_stopwords import stopwords

# Kinyarwanda
stopset_kin = stopwords.words('kinyarwanda')

# Kirundi
stopset_kir = stopwords.words('kirundi')
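
A typical use of a stopword set is to drop those words from text before further processing. The sketch below is one possible way to do that and is not part of KKLTK: the helper `remove_stopwords`, the regex-based tokenization, and the three-word `demo_stopset` are all assumptions made for illustration (the demo words are hand-picked placeholders, not taken from the KKLTK lists). It assumes only that the stopwords are available as an iterable of strings.

```python
import re

def remove_stopwords(text, stopset):
    """Return the lowercase word tokens of `text` that are not in `stopset`.

    Hypothetical helper, not part of KKLTK. `stopset` may be any iterable
    of strings.
    """
    stopset = {word.lower() for word in stopset}
    # Simple tokenizer: runs of letters, optionally joined by apostrophes
    # (so a form like "n'umwana" stays one token).
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return [token for token in tokens if token not in stopset]

# Placeholder stopwords for illustration only; they are NOT read from KKLTK.
demo_stopset = {"na", "mu", "ku"}

filtered = remove_stopwords("Abana bari mu ishuri na mwarimu", demo_stopset)
print(filtered)  # ['abana', 'bari', 'ishuri', 'mwarimu']
```

With KKLTK installed, `stopset_kin` or `stopset_kir` from the snippet above could be passed in place of `demo_stopset`. Converting the stopwords to a `set` inside the helper keeps each membership check constant-time, which matters when filtering large corpora.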

Contributing

KKLTK is a first step toward putting under-represented languages on the NLP map. The stopword lists provided for both languages are still growing. Please reach out to me if you wish to contribute.
