Commit

Merge pull request #1050 from cikay/kmr
Explain multi-part verbs
dan-zeman authored Aug 30, 2024
2 parents 9872bb6 + 71d33a0 commit c29dbb1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _kmr/index.md
@@ -12,7 +12,7 @@ udver: '2'
* According to typographical rules, many punctuation marks are attached to a neighboring word. We tokenize them as separate tokens (words), except the following cases:
  * The period marking an abbreviation: _Dr._ “doctor” is one token.
  * The apostrophe (or occasionally a hyphen) is not treated as punctuation when it occurs between a number and its morphological suffix, as in _15'ê_, _1932'an_.
* There is a small class of words that may contain spaces in writing.
* There is a type of verb called _lêkerên hevedudanî_ “compound verbs”, which is similar to English phrasal verbs. These verbs typically consist of two or three parts that are separated by spaces in writing. In the passive voice and in causative forms, however, the parts are written together as one word.
* There are several closed classes of contractions that are treated as multi-word tokens and segmented into individual syntactic words.
  The most prominent type is a pronoun fused with the future auxiliary: _ezê = ez + dê_ “I will”.
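The rules above can be approximated by a small tokenizer. What follows is a minimal Python sketch, not the tooling actually used for this treebank. It assumes that the hypothetical tables `ABBREVIATIONS` and `CONTRACTIONS` contain only the examples given in the text (_Dr._ and _ezê_), that `tokenize` and `to_conllu_rows` are illustrative names, and that at most one trailing punctuation mark follows a protected token. It does not handle multi-part verbs or other words written with spaces. Contractions are emitted in the CoNLL-U multi-word token convention, where a range line such as `1-2` precedes one line per syntactic word. Only the ID and FORM columns are filled in.

```python
import re

# Hypothetical, illustrative tables; the real closed classes are larger.
ABBREVIATIONS = {"Dr."}               # the period stays attached: "Dr." "doctor"
CONTRACTIONS = {"ezê": ("ez", "dê")}  # pronoun + future auxiliary, "I will"

# A number, an apostrophe (or hyphen), then a letter suffix: 15'ê, 1932'an.
NUMBER_SUFFIX = re.compile(r"\d+['-]\w+")
# Runs of word characters, or single punctuation marks.
PIECES = re.compile(r"\w+|[^\w\s]")


def is_protected(chunk):
    """True if the chunk must stay one token despite containing punctuation."""
    return chunk in ABBREVIATIONS or NUMBER_SUFFIX.fullmatch(chunk) is not None


def split_chunk(chunk):
    """Split one whitespace-delimited chunk into tokens."""
    if is_protected(chunk):
        return [chunk]
    # Simplification: peel off a single trailing mark, e.g. "15'ê," -> "15'ê", ",".
    if len(chunk) > 1 and not chunk[-1].isalnum() and is_protected(chunk[:-1]):
        return [chunk[:-1], chunk[-1]]
    # Otherwise every punctuation mark becomes a separate token.
    return PIECES.findall(chunk)


def tokenize(text):
    tokens = []
    for chunk in text.split():
        tokens.extend(split_chunk(chunk))
    return tokens


def to_conllu_rows(tokens):
    """Number syntactic words; a contraction yields a range line plus its words."""
    rows = []
    word_id = 1
    for tok in tokens:
        parts = CONTRACTIONS.get(tok)
        if parts:
            last = word_id + len(parts) - 1
            # Multi-word token line: the ID is a range, the other 8 columns are "_".
            rows.append(f"{word_id}-{last}\t{tok}" + "\t_" * 8)
            for part in parts:
                rows.append(f"{word_id}\t{part}" + "\t_" * 8)
                word_id += 1
        else:
            rows.append(f"{word_id}\t{tok}" + "\t_" * 8)
            word_id += 1
    return rows


if __name__ == "__main__":
    tokens = tokenize("Dr. Ahmed ezê biçim.")
    print(tokens)
    print("\n".join(to_conllu_rows(tokens)))
```

With this sketch, `tokenize("Dr. Ahmed ezê biçim.")` keeps _Dr._ whole but detaches the final period. `to_conllu_rows` then writes _ezê_ as the range line `3-4` followed by the words _ez_ (3) and _dê_ (4).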

