---
layout: base
title: Tokenization
---

Tokenization

Tokenization in the Hungarian UD treebank follows the principles of the Szeged Dependency Treebank (Vincze et al. 2010). The treebank contains no multiword tokens.
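The absence of multiword tokens can be checked mechanically. In the CoNLL-U format, a multiword token is written as a line whose ID column is a range such as `1-2`, so a file without multiword tokens has no such lines. The sketch below is a minimal check under that assumption; the function name `find_multiword_tokens`, the helper `row`, and both sample sentences are hypothetical illustrations and are not taken from the treebank.

```python
def row(token_id, form):
    """Build a 10-column CoNLL-U line with all other fields left as '_'."""
    return "\t".join([token_id, form] + ["_"] * 8)


def find_multiword_tokens(conllu_text):
    """Return (line_number, id_range, form) for each multiword-token line."""
    hits = []
    for line_number, line in enumerate(conllu_text.splitlines(), start=1):
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        token_id = columns[0]
        # Multiword tokens use an ID range such as "1-2";
        # empty nodes use a decimal ID such as "1.1" and are not matched.
        if "-" in token_id:
            form = columns[1] if len(columns) > 1 else ""
            hits.append((line_number, token_id, form))
    return hits


# Hypothetical Hungarian sentence: one syntactic word per token line.
HU_SAMPLE = "\n".join([
    "# text = Ez egy mondat.",
    row("1", "Ez"),
    row("2", "egy"),
    row("3", "mondat"),
    row("4", "."),
])

# Hypothetical contrast case: Spanish "del" split into "de" + "el".
MWT_SAMPLE = "\n".join([
    row("1-2", "del"),
    row("1", "de"),
    row("2", "el"),
])

print(find_multiword_tokens(HU_SAMPLE))
print(find_multiword_tokens(MWT_SAMPLE))
```

Running the same function over a full treebank file (read as one string) should yield an empty list if the statement above holds for that release.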

References

Vincze, Veronika; Szauter, Dóra; Almási, Attila; Móra, György; Alexin, Zoltán; Csirik, János 2010: Hungarian Dependency Treebank. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta.