Lexupdater

Lexupdater is a tool to extend and update the NST pronunciation lexicon with new words and dialect variation in the pronunciation transcriptions.

The dialectal variation is updated through string transformation rules (search-and-replace with regex patterns) developed by trained linguists in the Language Bank at the National Library of Norway.
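To make the idea of a search-and-replace rule concrete, here is a minimal sketch in Python. The rule format, the helper name `apply_rules`, and the sample transcriptions are all made up for illustration; they are not taken from the rule files that lexupdater reads, which may be structured differently.

```python
import re

# Hypothetical rules: (regex pattern, replacement) pairs applied to a
# transcription written as space-separated segments. Illustrative only.
EXAMPLE_RULES = [
    # Made-up rule: a word-final "R" segment becomes "RR".
    (r"\bR$", "RR"),
    # Made-up rule: the segment sequence "E G" becomes "AE J".
    (r"\bE G\b", "AE J"),
]


def apply_rules(transcription, rules):
    """Apply each (pattern, replacement) rule in order and return the result."""
    for pattern, replacement in rules:
        transcription = re.sub(pattern, replacement, transcription)
    return transcription


print(apply_rules("J E G", EXAMPLE_RULES))
print(apply_rules("S T O R", EXAMPLE_RULES))
```

Because the rules run in order, a later rule sees the output of an earlier one, so rule order can change the result.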

Because the NST lexicon was first published before 2000, words that came into use after 2000 have been added from two corpora: the Norwegian Newspaper Corpus Bokmål and Målfrid 2021 – Freely Available Documents from Norwegian State Institutions.

Usage

1. Install lexupdater

Ensure you have Python version 3.8 or higher.

Create a virtual environment and activate it, e.g.

python -m venv .venv
source .venv/bin/activate

Install lexupdater:

pip install git+https://github.com/Sprakbanken/lexupdater.git@v0.7.6

2. Download data

The NST pronunciation lexicon is available in an SQLite database, nst_lexicon_bm.db. It has a table with words and a table with pronunciations (base).
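Once the database file has been downloaded (see below), it can be inspected with Python's built-in sqlite3 module. The following is a minimal sketch: the helper names `list_tables` and `count_rows` are made up here, and no table or column names are assumed, since the tables are discovered from the file itself.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Return the number of rows in `table`, which must exist in the file."""
    if table not in list_tables(db_path):
        raise ValueError(f"No table named {table!r} in {db_path}")
    connection = sqlite3.connect(db_path)
    try:
        (count,) = connection.execute(
            f'SELECT COUNT(*) FROM "{table}"'
        ).fetchone()
    finally:
        connection.close()
    return count


# Example calls, left commented out because they need the downloaded file:
# for table in list_tables("nst_lexicon_bm.db"):
#     print(table, count_rows("nst_lexicon_bm.db", table))
```

Note that sqlite3.connect creates an empty file if the path does not exist, so an empty table list usually means the path is wrong.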

Lexupdater uses external Python files with dicts of regex patterns to update the database, and a CSV file to add new words. These files are available from the nb_uttale repository.

Linux

./fetch_data.sh

Other OS (Windows, Mac)

Download the pronunciation database from this link: https://www.nb.no/sbfil/uttaleleksikon/nst_lexicon_bm.db
Then use git commands to fetch the rules, exemptions, and new words from nb_uttale:

git remote add nb_uttale git@github.com:Sprakbanken/nb_uttale.git
git fetch nb_uttale
git show nb_uttale/main:data/input/rules_v1.py > rules.py
git show nb_uttale/main:data/input/exemptions_v1.py > exemptions.py
git show nb_uttale/main:data/input/newwords_2022.csv > newwords.csv
git remote remove nb_uttale
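The database download can also be scripted instead of clicking the link. This is a minimal sketch using only the Python standard library; the helper name `download_file` is made up here, and the URL is the one given above.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen

DATABASE_URL = "https://www.nb.no/sbfil/uttaleleksikon/nst_lexicon_bm.db"


def download_file(url, destination):
    """Stream the resource at `url` to `destination` and return the path."""
    destination = Path(destination)
    with urlopen(url) as response, destination.open("wb") as out_file:
        # Copy in chunks so the whole file is never held in memory.
        shutil.copyfileobj(response, out_file)
    return destination


# Example call, left commented out because it downloads the full database:
# download_file(DATABASE_URL, "nst_lexicon_bm.db")
```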

3. Add new words to the lexicon

Run lexupdater newwords from your command line.
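Before adding the words, it can help to look at what newwords.csv contains. The sketch below reads the first few rows with Python's csv module. The helper name `preview_csv` is made up, and the sketch assumes a UTF-8 encoded, comma-separated file whose first row is a header; none of this is confirmed by the source, so adjust the `delimiter` argument if the file differs.

```python
import csv
from itertools import islice


def preview_csv(path, limit=5, delimiter=","):
    """Return the header row and up to `limit` data rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as csv_file:
        reader = csv.reader(csv_file, delimiter=delimiter)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, limit))
    return header, rows


# Example call, left commented out because it needs the downloaded file:
# header, rows = preview_csv("newwords.csv")
# print(header)
# for row in rows:
#     print(row)
```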

4. Generate dialect variations

Run lexupdater update from the command line.

Running the update command with the default settings is equivalent to the following:

lexupdater -v \
    --database "nst_lexicon_bm.db" \
    --newwords-path "newwords.csv" \
    --dialects e_spoken \
        -d e_written \
        -d sw_spoken \
        -d sw_written \
        -d w_spoken \
        -d w_written \
        -d t_spoken \
        -d t_written \
        -d n_spoken \
        -d n_written \
    update \
        --rules-file "rules.py" \
        --exemptions-file "exemptions.py" \
        --output-dir "data/output"
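When only a subset of dialects is needed, the argument list can be assembled programmatically. The sketch below uses only the flags shown in the command above. The helper name `build_update_command` is made up, and it assumes that `-d` can be repeated for any subset of dialects in the same way as in the full command.

```python
import shlex

# Dialect names as listed in the full command above.
DIALECTS = (
    "e_spoken", "e_written",
    "sw_spoken", "sw_written",
    "w_spoken", "w_written",
    "t_spoken", "t_written",
    "n_spoken", "n_written",
)


def build_update_command(
    database="nst_lexicon_bm.db",
    newwords_path="newwords.csv",
    dialects=DIALECTS,
    rules_file="rules.py",
    exemptions_file="exemptions.py",
    output_dir="data/output",
):
    """Return the lexupdater update command as a list of arguments."""
    command = [
        "lexupdater", "-v",
        "--database", database,
        "--newwords-path", newwords_path,
    ]
    for dialect in dialects:
        command += ["-d", dialect]
    command += [
        "update",
        "--rules-file", rules_file,
        "--exemptions-file", exemptions_file,
        "--output-dir", output_dir,
    ]
    return command


# Print a shell-ready command line for the two northern dialects only.
print(shlex.join(build_update_command(dialects=["n_spoken", "n_written"])))
```

The returned list can be passed directly to subprocess.run(command, check=True) to run the update from a script.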

Configure Lexupdater

The parameters database, output_dir, newwords_path, dialects and the update-parameters rules_file and exemptions_file can be changed in your local config.py.

You can also set the parameters directly from the command line. See the help flag for more info:

lexupdater -h

Developers

Build the `lexupdater` python package yourself

We use pyproject.toml to configure the package. The build command below requires the build package (pip install build).

python -m build .

The Python distribution wheel is located in the dist folder. It can be installed with pip:

pip install dist/lexupdater-*.whl      # OS-independent
