Language detection module based on the models from GiellaLT

magbb/gielladetect

 
 


Makes the language classification script from GiellaLT's corpus tools available as a Python module (see GiellaLT's website and the original repo).

The source code as well as the language model files are released under the GPL-3.0 license.

Installation

pip install gielladetect

Usage

import gielladetect

text = "Lurer du på hva som rører seg innenfor veggene til Nasjonalbiblioteket på Solli plass i Oslo?"

gielladetect.detect(text)
# Result: 'nob'

# To restrict detection to a subset of languages:
gielladetect.detect(text, ['nob', 'nno', 'eng'])
# Result: 'nob'
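
When many texts need classifying, the call above can be wrapped in a small helper. The following is only a sketch: `count_languages` is a hypothetical name that is not part of gielladetect, and it assumes nothing about the library beyond the two call forms shown above, `detect(text)` and `detect(text, langs)`. The detector is passed in as a parameter so the helper does not depend on how the module is imported.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Sequence


def count_languages(
    texts: Iterable[str],
    detect: Callable[..., str],
    langs: Optional[Sequence[str]] = None,
) -> Counter:
    """Tally the language codes returned by `detect` over `texts`.

    `detect` is expected to behave like gielladetect.detect: it takes a
    text and, optionally, a list of candidate language codes as a second
    positional argument. This helper itself is hypothetical.
    """
    counts: Counter = Counter()
    for text in texts:
        if not text.strip():
            continue  # skip empty or whitespace-only input
        # Only pass the candidate list when the caller supplied one.
        code = detect(text, langs) if langs is not None else detect(text)
        counts[code] += 1
    return counts
```

With the package installed, it could be called as `count_languages(lines, gielladetect.detect, ['nob', 'nno', 'eng'])`, where `lines` is any iterable of strings.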

Supported languages

Languages are identified by their ISO 639-3 codes.

Code Name
ara Arabic
bxr Russia Buriat
ckb Central Kurdish
dan Danish
deu German
eng English
est Estonian
fao Faroese
fas Persian
fin Finnish
fit Tornedalen Finnish
fkv Kven Finnish
fra French
hbs Serbo-Croatian
isl Icelandic
ita Italian
kal Kalaallisut
kmr Northern Kurdish
koi Komi-Permyak
kpv Komi-Zyrian
krl Karelian
mdf Moksha
mhr Eastern Mari
mns Mansi
mrj Western Mari
myv Erzya
nno Norwegian Nynorsk
nob Norwegian Bokmål
olo Livvi
pol Polish
rmf Kalo Finnish Romani
rmn Balkan Romani
rmu Tavringer Romani
rmy Vlax Romani
ron Romanian
rus Russian
sma Southern Sami
sme Northern Sami
smj Lule Sami
smn Inari Sami
sms Skolt Sami
som Somali
spa Spanish
swe Swedish
tur Turkish
udm Udmurt
urd Urdu
vep Veps
vie Vietnamese
yid Yiddish
yrk Nenets
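
Because detection can be restricted to a subset of languages, it may help to check a candidate list against the table above before passing it on. This is a minimal sketch: `SUPPORTED_CODES` is copied by hand from the table and `unsupported_codes` is a hypothetical helper, neither of which is provided by gielladetect. How the library itself reacts to an unknown code is not documented here, so the check makes no assumption about it.

```python
# Codes copied from the "Supported languages" table; not exported by the library.
SUPPORTED_CODES = frozenset(
    """
    ara bxr ckb dan deu eng est fao fas fin fit fkv fra hbs isl ita kal
    kmr koi kpv krl mdf mhr mns mrj myv nno nob olo pol rmf rmn rmu rmy
    ron rus sma sme smj smn sms som spa swe tur udm urd vep vie yid yrk
    """.split()
)


def unsupported_codes(codes):
    """Return, in sorted order, the codes that are not in the table above."""
    return sorted(set(codes) - SUPPORTED_CODES)
```

For example, `unsupported_codes(['nob', 'no', 'en'])` flags the two-letter codes `'en'` and `'no'`, which are ISO 639-1 rather than ISO 639-3.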
