Python wrapper for Majka

What is Majka

Majka is a linguistics tool for morphology analysis.

Depending on the dictionaries supplied, a word form is translated to a lemma (base form) plus tags describing its linguistic properties, or the other way around.

What can this module be used for?

Lemmatization of documents is heavily used in text processing, as it simplifies the handling of text in inflected languages.

An example from Czech: 'dělala' (she did) is transformed to 'dělat' (to do), and tags indicating the past tense, feminine gender and other properties are added.

Tags returned by the analyzer that comply with the new tagset reference (for example for cs and sk) are transcribed into a native Python dictionary, so results can be used in a Pythonic way without studying the tagset documentation. Other or incorrectly recognized tags are stored in the entry 'other'.

Install / Build instructions

The module is available on PyPI; use pip install majka to install it.

For a local build / install use:

./setup.py build
./setup.py install

No dependencies beyond standard Python and a C++ build environment (gcc, python-dev, etc.) should be needed.

Usage

Majka requires a morphological database (automaton) to work. See https://nlp.fi.muni.cz/ma/ for a list of available databases.

import majka
morph = majka.Majka('path/to/database')

morph.flags |= majka.ADD_DIACRITICS  # find word forms with diacritics
morph.flags |= majka.DISALLOW_LOWERCASE  # do not look for lowercase variants
morph.flags |= majka.IGNORE_CASE  # ignore letter case entirely
morph.flags = 0  # unset all flags

morph.tags = False  # return just the lemma, do not process the tags
morph.tags = True  # turn tag processing back on (default)

morph.compact_tag = True  # return tag in compact form (as returned by Majka)
morph.compact_tag = False  # do not return compact tag (default)

morph.first_only = True  # return only the first entry
morph.first_only = False  # return all entries (default)

morph.find('nejnevhodnější')

Returns

[{'lemma': 'vhodný',
  'tags': {'case': 1,
           'gender': 'feminine',
           'negation': True,
           'plural': True,
           'pos': 'adjective',
           'degree': 3}
 },
...
]
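The result is a plain list of dictionaries, so it can be post-processed with ordinary Python. The following is a minimal sketch, not part of the module: unique_lemmas is a hypothetical helper, and the sample data is copied from the output above so that the snippet runs without a morphological database.

```python
# Sample data copied from the README output above; in real use this list
# would come from morph.find(...).
analyses = [
    {
        "lemma": "vhodný",
        "tags": {
            "case": 1,
            "gender": "feminine",
            "negation": True,
            "plural": True,
            "pos": "adjective",
            "degree": 3,
        },
    },
]


def unique_lemmas(entries, pos=None):
    """Return lemmas in first-seen order, optionally filtered by part of speech.

    Hypothetical helper for illustration; it is not provided by the majka module.
    """
    seen = []
    for entry in entries:
        # 'tags' is absent when morph.tags is False, so fall back to an empty dict.
        tags = entry.get("tags", {})
        if pos is not None and tags.get("pos") != pos:
            continue
        if entry["lemma"] not in seen:
            seen.append(entry["lemma"])
    return seen


print(unique_lemmas(analyses))
print(unique_lemmas(analyses, pos="noun"))
```

Filtering on the 'pos' entry only makes sense while tag processing is switched on.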

Note on tag translation

Currently, tag translation to a Python dictionary works only for databases that follow the Czech and Slovak tag reference. Other languages may return untranslated tags in the 'other' entry.
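When working with a database in another language, it may help to separate entries whose tags were fully translated from those that carry leftovers. The sketch below rests on an assumption this README does not spell out: that the leftovers appear under the key 'other' inside the 'tags' dictionary. The helper name and the sample entries are hypothetical and are not real analyzer output.

```python
def split_by_translation(entries):
    """Separate fully translated entries from those with untranslated tags.

    Hypothetical helper; assumes leftovers live under tags['other'].
    """
    translated, untranslated = [], []
    for entry in entries:
        tags = entry.get("tags", {})
        # A missing or empty 'other' value is treated as "fully translated".
        if tags.get("other"):
            untranslated.append(entry)
        else:
            translated.append(entry)
    return translated, untranslated


# Made-up entries for illustration only, not output of a real database.
sample = [
    {"lemma": "vhodný", "tags": {"pos": "adjective", "degree": 3}},
    {"lemma": "example", "tags": {"other": "xyz"}},
]

ok, leftover = split_by_translation(sample)
print(len(ok), len(leftover))
```

If the real structure differs for your database, inspect one result of morph.find first and adjust the key lookup accordingly.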

Usage with negations

Setting .tags = False folds the negation into the lemma itself. By default, a "-" sign is prepended, but the prefix can be changed by setting the .negative value.

morph.tags = False
morph.first_only = True
morph.negative = "ne"

morph.find('nejnevhodnější')

Returns

[{'lemma': 'nevhodný'}]
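With tags switched off and first_only enabled, each lookup yields at most one lemma, which is convenient for lemmatizing a tokenized text. The following is a sketch under stated assumptions: lemmatize and fake_find are hypothetical names, fake_find stands in for morph.find so that the snippet runs without a database, and an unknown word is assumed to produce an empty list.

```python
def lemmatize(tokens, find):
    """Replace each token by its first lemma, keeping unknown tokens unchanged.

    'find' is any callable with the same shape of result as morph.find.
    """
    result = []
    for token in tokens:
        entries = find(token)
        # Assumption: a word missing from the database gives an empty result.
        result.append(entries[0]["lemma"] if entries else token)
    return result


def fake_find(word):
    """Stand-in for morph.find, returning the README example for one word."""
    known = {"nejnevhodnější": [{"lemma": "nevhodný"}]}
    return known.get(word, [])


print(lemmatize(["nejnevhodnější", "xyz"], fake_find))
```

With a real database, pass morph.find in place of fake_find after configuring morph as shown above.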

Attributions

The module is based on code by Pavel Smerk and Pavel Rychly, NLP group at MUNI, Czech Republic.

The source of the original majka binary is in majka/majka_bin.cc; see majka/Makefile for how to build it.

Thanks

Tomáš Karabela (@tkarabela) for discovering the memory leaks
