Client library for communicating with LaBB-CAT servers using Python.
e.g.
import labbcat
# Connect to the LaBB-CAT corpus
corpus = labbcat.LabbcatView("https://labbcat.canterbury.ac.nz/demo", "demo", "demo")
# Find all tokens of a word
matches = corpus.getMatches({"orthography":"quake"})
# Get the recording of that utterance
audio = corpus.getSoundFragments(matches)
# Get Praat TextGrids for the utterances
textgrids = corpus.getFragments(
matches, ["utterance", "word","segment"],
"text/praat-textgrid")
LaBB-CAT is a web-based linguistic annotation store that stores audio or video recordings, text transcripts, and other annotations.
Annotations of various types can be automatically generated or manually added.
LaBB-CAT servers are usually password-protected linguistic corpora, and can be accessed manually via a web browser, or programmatically using a client library like this one.
The current version of this library requires LaBB-CAT version 20220307.1126.
API documentation is available at https://nzilbb.github.io/labbcat-py/
nzilbb-labbcat is available in the Python Package Index here
To install the module:
pip install nzilbb-labbcat
The following example shows how to:
- upload a transcript to LaBB-CAT,
- wait for the automatic annotation tasks to finish,
- extract the annotation labels, and
- delete the transcript from LaBB-CAT.
import labbcat
# Connect to the LaBB-CAT corpus
corpus = labbcat.LabbcatEdit("http://localhost:8080/labbcat", "labbcat", "labbcat")
# List the corpora on the server
corpora = corpus.getCorpusIds()
# List the transcript types
transcript_type_layer = corpus.getLayer("transcript_type")
transcript_types = transcript_type_layer["validLabels"]
# Upload a transcript
corpus_id = corpora[0]
transcript_type = next(iter(transcript_types))
taskId = corpus.newTranscript(
"test/labbcat-py.test.txt", None, None, transcript_type, corpus_id, "test")
# wait for the annotation generation to finish
corpus.waitForTask(taskId)
corpus.releaseTask(taskId)
# get the "POS" layer annotations
annotations = corpus.getAnnotations("labbcat-py.test.txt", "pos")
labels = list(map(lambda annotation: annotation["label"], annotations))
# delete tha transcript from the corpus
corpus.deleteTranscript("labbcat-py.test.txt")
For batch uploading and other example code, see the examples subdirectory.
To build, test, release, and document the module, the following prerequisites are required:
pip3 install twine
pip3 install pathlib
pip3 install deprecated
sudo apt install python3-sphinx
python3 -m unittest
...or for specific tests:
python3 -m unittest test.TestLabbcatAdmin
cd docs
make clean
make
rm dist/*
python3 setup.py sdist bdist_wheel
twine check dist/*
twine upload dist/*