Stemex is a NIF wrapper around the Snowball language (http://snowball.tartarus.org/index.php).
Add any Snowball algorithm to the algorithms directory, for instance
algorithms/ALGONAME.sbl
containing an external procedure named stem, and the Stemex compiler will:
- Compile the NIF shared library priv/Elixir.Stemex_nif.so
- Dynamically add the Elixir function Stemex.ALGONAME/1, which takes a UTF-8 binary string and returns the result of the ALGONAME.sbl algorithm
It is MEANT to be used in your own project, with custom Snowball stemming algorithms, from a clone of this project. By default, however, the Porter stemmer implementations are included and published in the Hex package.
They are the ones present in the Snowball distribution. The available stemmers are:
Stemex.danish/1
Stemex.dutch/1
Stemex.english/1
Stemex.finnish/1
Stemex.french/1
Stemex.german/1
Stemex.german2/1
Stemex.hungarian/1
Stemex.italian/1
Stemex.kraaij_pohlmann/1
Stemex.lovins/1
Stemex.norwegian/1
Stemex.portuguese/1
Stemex.romanian/1
Stemex.russian/1
Stemex.spanish/1
Stemex.swedish/1
Stemex.turkish/1
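As a usage illustration for the stemmers listed above, here is a minimal sketch. It assumes that :stemex is already a compiled dependency of your project; StemDemo is a hypothetical helper module introduced only for this example and is not part of Stemex.

```elixir
# Hypothetical helper, not part of Stemex.
defmodule StemDemo do
  # Apply a one-argument stemmer function to each word,
  # keeping the input word next to its stem.
  def stem_all(words, stemmer) when is_function(stemmer, 1) do
    Enum.map(words, fn word -> {word, stemmer.(word)} end)
  end
end

# Assumption: Stemex and its NIF are loadable in this environment.
if Code.ensure_loaded?(Stemex) do
  ["running", "connections"]
  |> StemDemo.stem_all(&Stemex.english/1)
  |> Enum.each(fn {word, stem} -> IO.puts("#{word} -> #{stem}") end)
else
  IO.puts("Stemex is not loaded; add it to your deps to run this sketch")
end
```

Any other stemmer from the list can be swapped in by capturing it the same way, for example &Stemex.french/1.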
Two compilers are included in mix.exs:
- :stemex_snowball compiles c_src/gen/ALGONAME.(c|h) from your algorithms/ALGONAME.sbl
  - it needs the snowball executable in your PATH, and explains how to get it otherwise
  - so this compiler is only used in the :dev Mix env, to help you develop your Snowball algorithms
- :stemex_nif compiles priv/Elixir.Stemex_nif.so, used as the NIF library.
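The compiler setup described above could be wired roughly as follows. This is only a sketch of the idea, not the project's actual mix.exs: the module name, the placeholder version, and the compilers/1 helper are assumptions, and it presumes that the Mix tasks backing :stemex_snowball and :stemex_nif are defined elsewhere in the project.

```elixir
# Hypothetical mix.exs sketch; the real Stemex mix.exs may be organised differently.
defmodule Stemex.Mixfile do
  use Mix.Project

  def project do
    [
      app: :stemex,
      # Placeholder version, not the published one.
      version: "0.0.0",
      compilers: compilers(Mix.env())
    ]
  end

  # In :dev, regenerate the C sources from the .sbl files before building the NIF.
  defp compilers(:dev), do: [:stemex_snowball, :stemex_nif] ++ Mix.compilers()

  # In other environments, only build the NIF from the already generated C sources.
  defp compilers(_env), do: [:stemex_nif] ++ Mix.compilers()
end
```

Keeping the Snowball step out of non-dev environments means that users of the package do not need the snowball executable installed.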
Each file test/diffs/ALGONAME.txt must contain one pair of words per line. Every line is then run as a test case: Stemex.ALGONAME(first_elem) == second_elem.
The format is compatible with the Snowball test files. By default the directory contains all the tests from the Snowball website, but you can easily add tests for your own Snowball algorithm.
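The check described above can be sketched as follows. This is not the project's actual test code: DiffCheck is a hypothetical module, and it assumes that the two words on each line are separated by whitespace. The stemmer is passed in as a function, so the sketch runs without the NIF.

```elixir
# Hypothetical helper, not part of Stemex.
defmodule DiffCheck do
  # Parse "word stem" lines into {word, expected} tuples.
  # Blank or malformed lines are skipped.
  def parse(text) do
    text
    |> String.split("\n", trim: true)
    |> Enum.map(&String.split/1)
    |> Enum.flat_map(fn
      [word, expected] -> [{word, expected}]
      _other -> []
    end)
  end

  # Return the pairs for which stemmer.(word) differs from the expected stem.
  def failures(text, stemmer) when is_function(stemmer, 1) do
    for {word, expected} <- parse(text), stemmer.(word) != expected do
      {word, expected}
    end
  end
end

# Usage with a stand-in stemmer that only strips a trailing "s".
sample = "running run\ncats cat\n"
fake_stemmer = fn word -> String.replace_suffix(word, "s", "") end
IO.inspect(DiffCheck.failures(sample, fake_stemmer))
```

With a real stemmer you would pass, for example, &Stemex.english/1 together with the contents of test/diffs/english.txt, and expect an empty list of failures.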