This repository contains Bulgarian ispell (affix and dict) and stopword dictionaries for full text search in PostgreSQL.
The ispell dictionary files (bulgarian.affix and bulgarian.dict) were created by the bgOffice/БГ Офис project for use in OpenOffice and are licensed under LGPL 3.0.
This repository contains a modified version of those files (minor changes) to make them compatible with the format expected by PostgreSQL. The original ispell files (bulgarian.aff and bulgarian.dic) can be downloaded from http://bgoffice.sourceforge.net/ispell/index.html
The stop word list used in this repository (bulgarian.stop) is a modified version of the list published in Table A.1 of the article "Searching strategies for the Bulgarian language" by Prof. Jacques Savoy.
- Copy the three files `bulgarian.affix`, `bulgarian.dict` and `bulgarian.stop` to your `$SHAREDIR/tsearch_data/` directory (e.g. `C:\Program Files\PostgreSQL\12\share\tsearch_data`). You can determine your `$SHAREDIR` by running `pg_config --sharedir`.
- Execute the following SQL script:

      CREATE TEXT SEARCH CONFIGURATION bulgarian (COPY = simple);

      CREATE TEXT SEARCH DICTIONARY bulgarian_ispell (
          TEMPLATE = ispell,
          DictFile = bulgarian,
          AffFile = bulgarian,
          StopWords = bulgarian
      );

      CREATE TEXT SEARCH DICTIONARY bulgarian_simple (
          TEMPLATE = pg_catalog.simple,
          STOPWORDS = bulgarian
      );

      ALTER TEXT SEARCH CONFIGURATION bulgarian
          ALTER MAPPING FOR asciiword, asciihword, hword, hword_part, word
          WITH bulgarian_ispell, bulgarian_simple;

- Make sure it's working by running a full text search query.
A query like this one: `SELECT to_tsvector('bulgarian', 'текстовете');` should output only the base of the word (`текст`): `"'текст':1"`
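
If the result differs from what you expect, a few more queries may help narrow down the cause. The following is only a sketch: it assumes the `bulgarian` configuration and the `bulgarian_ispell` dictionary were created exactly as in the SQL script above, and it relies on PostgreSQL's built-in `ts_lexize` and `ts_debug` functions. No expected output is shown, because the result depends on the dictionary files you installed.

```sql
-- Sketch only: assumes the "bulgarian" configuration and the
-- "bulgarian_ispell" dictionary from the setup script already exist
-- in the current database.

-- Ask the ispell dictionary alone for the lexemes of a single token.
-- A NULL result means the dictionary does not recognize the word;
-- an empty array means the word was treated as a stop word.
SELECT ts_lexize('bulgarian_ispell', 'текстовете');

-- Show, token by token, which dictionaries were consulted and which
-- one produced the final lexemes.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('bulgarian', 'текстовете');

-- Match an inflected form in the document against the base form
-- in the query, using the same configuration on both sides.
SELECT to_tsvector('bulgarian', 'текстовете')
       @@ to_tsquery('bulgarian', 'текст') AS matches;
```

If `ts_lexize` returns NULL for a word you expect to be known, the ispell files were probably not loaded from the intended `tsearch_data` directory, or the token fell through to `bulgarian_simple`.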