Croatian language stemmer in C, and a PostgreSQL dictionary extension

This project is based on the work published at http://nlp.ffzg.hr/resources/tools/stemmer-for-croatian/ . If you use this project in your own work, you should mention (for example in a help message or in an "about" window) both projects.

Requirements

pcre, available in Debian/Ubuntu as libpcre3-dev

PostgreSQL stemmer module requirements

PostgreSQL development libraries, e.g. postgresql-server-dev-9.5

See build instructions in the pgdict_hrstemc folder.

Usage example in C

All strings are assumed to be UTF-8.

#include <stdio.h>
#include <string.h>

#include "hrstemc.h"

int main(int argc, char **argv) {
    hrstemc_init();

    printf("koza: %d\n", hrstemc_is_stopword("koza"));
    printf("mora: %d\n", hrstemc_is_stopword("mora"));
    printf("krenuti: %d\n", hrstemc_is_stopword("krenuti"));

    char *s2 = hrstemc_korjenuj("tihim");
    printf("korjenuj: %s\n", s2);
    free(s2);
}

The main function is hrstemc_korjenuj() defined as:

char* hrstemc_korjenuj(char* lowercase_word);

The function always returns a malloc()-ed string which needs to be free()-d by the caller.

Usage example for PostgreSQL

See the README.build file in the pgdict_hrstemc directory for build and usage instructions.

Endnotes

There are several places in the code which could be optimized for better performance, e.g. searching for stopwords could be done as a binary search on a sorted list, transformation processing could be done with a fancy algorithm, etc. Patches welcome!

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
pgdict_hrstemc		pgdict_hrstemc
py		py
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
comprules.py		comprules.py
comptransformations.py		comptransformations.py
hrstemc.c		hrstemc.c
hrstemc.h		hrstemc.h
rules.c		rules.c
rules.txt		rules.txt
test.c		test.c
transformations.c		transformations.c
transformations.txt		transformations.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Croatian language stemmer in C, and a PostgreSQL dictionary extension

Requirements

PostgreSQL stemmer module requirements

Usage example in C

Usage example for PostgreSQL

Endnotes

About

Releases 1

Packages

Languages

License

ivoras/hrstemc

Folders and files

Latest commit

History

Repository files navigation

Croatian language stemmer in C, and a PostgreSQL dictionary extension

Requirements

PostgreSQL stemmer module requirements

Usage example in C

Usage example for PostgreSQL

Endnotes

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages