In PolyglotDB, there is currently no easy way to define a new unit (other than syllables and utterances) so that a track for that unit can easily be extracted.
Below are some cases where being able to define subunits in the corpus would be useful.
- Instances of any X_Y sequence, within a word or across words (e.g., VCV tokens).
In a later post-processing step, these VCV tokens could be classified as VCV vs. V#CV, for instance by finding rows where onset_start == word_start.
Alternatively, these subunits may already be labeled in the annotation files.
- Importing directly from annotation files that contain constituents larger than words. Just as VOT annotations are contained within phone-level intervals, the annotation files for some corpora may contain pre-labeled prosodic constituent intervals.
- Related to the points above: importing from annotation files that contain both hierarchical tiers (utterances > words > phones) and non-hierarchical intervals (utterances > words and utterances > VCV, where words and VCV are not hierarchically related). Any such subunit must still be contained within an utterance, by definition.
There seems to be a parser function for importing ToBI-labeled data, although I haven't tried it myself.
- I'm assuming I can enrich a syllable with a point-annotated tone. It would then be useful to define a constituent delimited by syllables bearing boundary tones: from the syllable immediately after a boundary-tone-bearing syllable up to and including the next boundary-tone-bearing syllable.
This may be similar to defining utterances based on pauses, but with boundary-tone-bearing syllables as the delimiters. The difference is that a boundary-tone-bearing syllable is the last syllable of the newly defined unit, whereas a pause is not included in the utterance.
- More generally, it might be useful to define new units by stating that all syllables between two encoded syllables (two final syllables, or two initial syllables) belong to a new unit X.
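To make the VCV case from the first bullet concrete, here is a minimal sketch of the post-processing step in plain Python. It does not use PolyglotDB's API: the `Phone` and `VCVToken` classes, their field names, and the ARPAbet-style vowel set are hypothetical stand-ins for whatever a phone-level export would provide. It scans consecutive phones for V-C-V runs and labels a token V#CV when the consonant's start equals the start of its word (the onset_start == word_start rule described above).

```python
from dataclasses import dataclass

# Hypothetical ARPAbet-style vowel inventory (stress digits stripped before lookup).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


@dataclass
class Phone:
    label: str
    begin: float
    end: float
    word_begin: float  # begin time of the word containing this phone (hypothetical field)


@dataclass
class VCVToken:
    labels: tuple
    begin: float
    end: float
    onset_begin: float
    kind: str  # "VCV" (word-internal onset) or "V#CV" (word-initial onset)


def is_vowel(label: str) -> bool:
    # Strip ARPAbet stress digits (0, 1, 2) before checking the inventory.
    return label.rstrip("012") in VOWELS


def find_vcv(phones: list) -> list:
    """Return every V-C-V run of consecutive phones in one utterance."""
    tokens = []
    for v1, c, v2 in zip(phones, phones[1:], phones[2:]):
        if is_vowel(v1.label) and not is_vowel(c.label) and is_vowel(v2.label):
            # If the consonant starts its word, a word boundary precedes it: V#CV.
            kind = "V#CV" if c.begin == c.word_begin else "VCV"
            tokens.append(VCVToken((v1.label, c.label, v2.label),
                                   v1.begin, v2.end, c.begin, kind))
    return tokens
```

Exact float comparison is used on the assumption that phone and word boundaries come from the same annotation file. Note that this rule alone would label a VC#V token (word boundary after the consonant) as "VCV"; a comparable check against the word's end time would be needed to separate that case.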
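For the mixed hierarchical/non-hierarchical tiers described above, one requirement is that every subunit interval (e.g., a VCV tier) falls inside some utterance. The following is a minimal sketch of that containment check in plain Python, not PolyglotDB code. The function name and the `(begin, end)` / `(begin, end, label)` tuple layouts are assumptions, and utterances are assumed to be sorted and non-overlapping.

```python
from bisect import bisect_right


def assign_to_utterances(utterances, intervals):
    """Attach each subunit interval to the utterance that contains it.

    utterances: sorted, non-overlapping list of (begin, end) tuples.
    intervals:  list of (begin, end, label) tuples from a non-hierarchical tier.
    Returns (assigned, rejected): assigned maps utterance index -> intervals;
    rejected holds intervals not fully inside any single utterance.
    """
    starts = [begin for begin, _ in utterances]
    assigned, rejected = {}, []
    for begin, end, label in intervals:
        # Index of the last utterance that starts at or before this interval.
        i = bisect_right(starts, begin) - 1
        if i >= 0 and end <= utterances[i][1]:
            assigned.setdefault(i, []).append((begin, end, label))
        else:
            # Starts before the first utterance, falls in a pause,
            # or crosses an utterance boundary.
            rejected.append((begin, end, label))
    return assigned, rejected
```

Rejecting intervals rather than clipping them keeps the "must be contained within utterances" constraint explicit, so annotation errors surface instead of being silently repaired.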
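The boundary-tone unit described in the last bullets can be sketched as a grouping pass over an utterance's syllables, where a delimiter syllable closes the current unit and is included in it (unlike a pause, which is excluded from an utterance). This is a minimal plain-Python illustration, not a PolyglotDB feature. The function names and the `(label, tone)` tuple layout are hypothetical, and treating any tone label ending in `%` as a boundary tone is an assumption about how the ToBI labels would be encoded.

```python
def is_boundary_tone(syllable) -> bool:
    # Assumption: syllable = (label, tone), with tone None or a ToBI-style
    # string; intonational boundary tones end in "%" (e.g. "L-L%", "H%").
    tone = syllable[1]
    return tone is not None and tone.endswith("%")


def units_between_boundary_tones(syllables, is_delimiter):
    """Group syllables into units, each ending with a delimiter syllable.

    The start of the sequence acts as an implicit boundary. Syllables after
    the last delimiter do not form a closed unit and are returned separately.
    """
    units, current = [], []
    for syl in syllables:
        current.append(syl)
        if is_delimiter(syl):
            # The delimiter is the LAST syllable of its unit.
            units.append(current)
            current = []
    return units, current


syllables = [("ba", None), ("na", None), ("na", "L-L%"),
             ("to", None), ("ma", "H%"), ("do", None)]
units, remainder = units_between_boundary_tones(syllables, is_boundary_tone)
# units     -> [[("ba", None), ("na", None), ("na", "L-L%")], [("to", None), ("ma", "H%")]]
# remainder -> [("do", None)]
```

For the more general case with initial-syllable delimiters, the same loop would instead close the current unit just before appending a delimiter syllable, so that each delimiter opens a new unit.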
Metadata
Metadata
Assignees
Labels
No labels