layout: base
title: Korean UD
udver: 2

UD for Korean

Tokenization and Word Segmentation

The tokenization of the Korean UD treebanks follows the tokenization of the Korean data distributed by the SPMRL 2013 shared task, which is a straightforward whitespace-based tokenization with conventional separation of punctuation.
There are no words with spaces.
There are currently no multi-word tokens. This may change in the future: some words are written without a space between them, and representing them as multi-word tokens may be preferable to marking them with SpaceAfter=No in the MISC column.
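
The two conventions described above, whitespace splitting with punctuation detached and SpaceAfter=No in MISC for a token that is not followed by a space, can be sketched in Python. This is a minimal illustration, not the SPMRL 2013 tokenizer: treating any character outside Python's `\w` class as punctuation, detaching it only at word edges, and the helper names `tokenize` and `space_after_misc` are all assumptions.

```python
import re

# Assumption: a punctuation mark is any non-word, non-space character.
# Python's \w matches Hangul syllables, so Korean words stay intact.
_EDGES = re.compile(r"(\W*)(.*?)(\W*)", re.DOTALL)


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace, then detach leading/trailing punctuation marks."""
    tokens: list[str] = []
    for chunk in sentence.split():
        lead, core, trail = _EDGES.fullmatch(chunk).groups()
        tokens.extend(lead)        # each leading mark becomes its own token
        if core:
            tokens.append(core)    # word-internal punctuation (e.g. "3.5") is kept
        tokens.extend(trail)       # each trailing mark becomes its own token
    return tokens


def space_after_misc(text: str, tokens: list[str]) -> list[str]:
    """Return a MISC value per token: 'SpaceAfter=No' if the next character is not a space."""
    misc: list[str] = []
    pos = 0
    for tok in tokens:
        while pos < len(text) and text[pos].isspace():
            pos += 1               # skip whitespace between tokens
        if not text.startswith(tok, pos):
            raise ValueError(f"token {tok!r} not found at offset {pos}")
        pos += len(tok)
        glued = pos < len(text) and not text[pos].isspace()
        misc.append("SpaceAfter=No" if glued else "_")
    return misc


if __name__ == "__main__":
    text = "나는 학교에 갔다."
    toks = tokenize(text)
    print(toks)                          # ['나는', '학교에', '갔다', '.']
    print(space_after_misc(text, toks))  # ['_', '_', 'SpaceAfter=No', '_']
```

Here the final period is attached to 갔다 in the raw text, so 갔다 receives SpaceAfter=No; a multi-word token would instead group such material under a single surface form.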

Morphology

Lemmas

At present, the lemma column in the GSD and Kaist treebanks violates the UD guidelines: instead of giving a selected surface form as the citation form of the lexeme, it lists the morphemes delimited by the plus (+) character. This should be fixed in a future version so that a real lemma is provided.
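
Until real lemmas are available, a consumer of these treebanks has to read the LEMMA field as a morpheme sequence. The sketch below shows one way to do that; the helper names are hypothetical, values such as "학교+에" are illustrative rather than quoted from GSD or Kaist, and the rule that a lone "+" is a literal plus-sign token is an assumption.

```python
def lemma_morphemes(lemma: str) -> list[str]:
    """Split a '+'-delimited LEMMA value into its morphemes (hypothetical helper).

    Assumes '+' is only a delimiter, except when the whole field is '+'.
    """
    if lemma == "+":
        return ["+"]               # a literal plus-sign token
    return [m for m in lemma.split("+") if m]


def is_morpheme_lemma(lemma: str) -> bool:
    """True if the field looks morpheme-delimited rather than a citation form."""
    return lemma != "+" and "+" in lemma


print(lemma_morphemes("학교+에"))  # ['학교', '에']
print(is_morpheme_lemma("학교"))   # False
```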

Features

Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.

Syntax

Core Arguments, Oblique Arguments and Adjuncts

Korean has nominative-accusative alignment. Direct objects are marked by the accusative morpheme 을 eul (written 를 reul after a vowel).
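
The accusative morpheme has two written forms: 을 eul after a syllable ending in a consonant and 를 reul after a syllable ending in a vowel. A minimal Python sketch of that choice follows. It relies on the arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3, with 28 final-consonant slots per syllable, slot 0 meaning no final). The helper names are hypothetical, and nouns ending in non-Hangul characters such as digits or Latin letters are out of scope.

```python
HANGUL_BASE = 0xAC00   # first precomposed syllable, 가
HANGUL_LAST = 0xD7A3   # last precomposed syllable, 힣
FINALS_PER_BLOCK = 28  # final-consonant slots per syllable (slot 0 = none)


def has_final_consonant(syllable: str) -> bool:
    """True if a precomposed Hangul syllable ends in a consonant."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    return (code - HANGUL_BASE) % FINALS_PER_BLOCK != 0


def accusative(noun: str) -> str:
    """Attach the accusative particle: 을 after a consonant, 를 after a vowel."""
    return noun + ("을" if has_final_consonant(noun[-1]) else "를")


print(accusative("책"))    # 책을
print(accusative("사과"))  # 사과를
```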

Relations Overview

The following relation subtypes are used in Korean:
- nsubj:pass for nominal subjects of passive verbs
- csubj:pass for clausal subjects of passive verbs
- nmod:poss for possessive (genitive) modifiers
- det:poss for possessive determiners
- acl:relcl for relative clauses
- obl:tmod for temporal adjuncts
- compound:lvc for light verb constructions
- flat:name for connection of parts of a flat multi-word named entity

The following relation types are not used in Korean at all: expl, list, parataxis, reparandum.
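
The lists above can be turned into a simple consistency check on the DEPREL column. The following is a hypothetical sketch that encodes only what this page states. It is not the official UD validator, and the function name `check_deprel` is an assumption.

```python
from typing import Optional

# Relation subtypes listed for Korean on this page.
KOREAN_SUBTYPES = {
    "nsubj:pass", "csubj:pass", "nmod:poss", "det:poss",
    "acl:relcl", "obl:tmod", "compound:lvc", "flat:name",
}
# Universal relation types that Korean does not use.
UNUSED_IN_KOREAN = {"expl", "list", "parataxis", "reparandum"}


def check_deprel(deprel: str) -> Optional[str]:
    """Return a problem description for a DEPREL value, or None if it looks acceptable."""
    base, _, subtype = deprel.partition(":")
    if base in UNUSED_IN_KOREAN:
        return f"'{base}' is not used in Korean"
    if subtype and deprel not in KOREAN_SUBTYPES:
        return f"'{deprel}' is not among the Korean subtypes"
    return None


for rel in ["nsubj", "nsubj:pass", "parataxis", "obl:arg"]:
    print(rel, check_deprel(rel))
```

Plain universal types such as nsubj pass unchanged; only the subtype inventory and the four unused types are checked.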

Treebanks

There are 3 Korean UD treebanks:

Korean-GSD
Korean-Kaist
Korean-PUD
