Defining a sub-unit within an utterance (other than words) #247

@seungsuklee

PolyglotDB currently offers no straightforward way to define a new unit type (other than syllables and utterances) so that a track can be extracted over that unit.
Below are some cases where being able to define subunits in the corpus would be useful.

  • instances of any X_Y sequence, within a word or across words (e.g., VCV tokens)
    (in a later post-processing step, these VCV tokens could be classified, for instance, as VCV vs. V#CV by finding rows where onset_start == word_start)
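
The post-processing step mentioned above could look roughly like the following. This is a minimal sketch in plain Python, not PolyglotDB code: it assumes the VCV tokens have already been exported as rows, that each row carries the onset_start and word_start columns named above, and that word_start refers to the word containing the consonant. The `token` field, the `classify_vcv` helper, and the tolerance value are hypothetical.

```python
# Hypothetical exported rows. Only onset_start and word_start are taken from
# the description above; the "token" field is made up for illustration.
rows = [
    {"token": "apa", "onset_start": 0.42, "word_start": 0.10},
    {"token": "a#pa", "onset_start": 1.30, "word_start": 1.30},
]


def classify_vcv(row, tolerance=1e-6):
    """Return "V#CV" when the consonant onset coincides with a word start."""
    # Compare with a small tolerance instead of == to be safe with floats.
    at_word_start = abs(row["onset_start"] - row["word_start"]) <= tolerance
    return "V#CV" if at_word_start else "VCV"


labels = [classify_vcv(row) for row in rows]
```

A tolerance is used here because timestamps read back from a query or CSV may not compare exactly equal as floats; an exact `==` would also work if the values are guaranteed identical.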

Alternatively, one might already have these subunits labeled in the annotation files.

  • importing directly from annotation files that contain constituents larger than words (just as VOT annotations are contained within phone-level intervals, annotation files for some corpora may contain prelabeled prosodic constituent intervals)
  • relatedly, importing from annotation files that contain both hierarchical intervals (utterances - words - phones) and non-hierarchical intervals (utterances - words - VCV, where words and VCV intervals do not nest inside each other, although any subunit must by definition be contained within an utterance)
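
To make the non-hierarchical case concrete, here is a small sketch of the constraint described above: a VCV interval must fall inside some utterance, but it does not have to fall inside a single word. This is plain Python for illustration only and does not use any PolyglotDB API; the `Interval` type, the `containing_utterance` helper, and all timestamps are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    label: str
    begin: float
    end: float


def contains(outer: Interval, inner: Interval) -> bool:
    """True when inner lies entirely within outer."""
    return outer.begin <= inner.begin and inner.end <= outer.end


def containing_utterance(
    unit: Interval, utterances: Sequence[Interval]
) -> Optional[Interval]:
    """Return the first utterance that fully contains unit, or None."""
    for utterance in utterances:
        if contains(utterance, unit):
            return utterance
    return None


# Hypothetical data: two utterances, two words, one VCV token.
utterances = [Interval("utt1", 0.0, 2.0), Interval("utt2", 2.5, 4.0)]
words = [Interval("the", 0.0, 0.3), Interval("apple", 0.3, 0.9)]
vcv = Interval("VCV", 0.2, 0.5)  # straddles the word boundary at 0.3

parent = containing_utterance(vcv, utterances)
crosses_word = not any(contains(word, vcv) for word in words)
```

In this example the VCV token belongs to `utt1` but is not contained in either word, which is the situation a strictly hierarchical import cannot represent.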

There seems to be a parser function for importing ToBI-labeled data, although I haven't tried it myself.

  • I'm assuming I can enrich a syllable with a point-annotated tone. It would then be useful to define a constituent between two syllables bearing boundary tones (from the syllable immediately after one boundary-tone-bearing syllable up to and including the next boundary-tone-bearing syllable).
    (This may be closer to defining utterances based on pauses, but with "boundary-tone-bearing" syllables as delimiters. The difference is that a boundary-tone-bearing syllable is the last syllable of the newly defined unit, whereas a pause is not included in the utterance.)
  • More generally, it would be useful to be able to define new units by specifying that all syllables between two encoded syllables (two final syllables, or two initial syllables) belong to a new unit X.
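
The grouping logic described above can be sketched as follows. This is only an illustration in plain Python of the requested behavior, not an existing PolyglotDB function: the `group_by_boundary_tone` helper, the `boundary_tone` flag, and the syllable labels are all hypothetical, and the sketch assumes syllables are already in temporal order.

```python
def group_by_boundary_tone(syllables):
    """Split an ordered syllable list into units.

    Each unit ends with (and includes) a boundary-tone-bearing syllable,
    unlike pause-delimited utterances, where the pause is excluded.
    """
    units, current = [], []
    for syllable in syllables:
        current.append(syllable)
        if syllable["boundary_tone"]:
            units.append(current)  # the delimiter closes the current unit
            current = []
    if current:
        # Trailing syllables with no closing boundary tone; keeping them as
        # a final unit is a design choice, they could also be dropped.
        units.append(current)
    return units


# Hypothetical syllables with a flag marking boundary-tone bearers.
syllables = [
    {"label": "s1", "boundary_tone": False},
    {"label": "s2", "boundary_tone": True},
    {"label": "s3", "boundary_tone": False},
    {"label": "s4", "boundary_tone": False},
    {"label": "s5", "boundary_tone": True},
]

units = group_by_boundary_tone(syllables)
unit_labels = [[s["label"] for s in unit] for unit in units]
```

The same loop would cover the more general case by swapping the `boundary_tone` test for any other encoded syllable property. For units delimited by initial rather than final syllables, the delimiter would instead open a new unit before being appended.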
