---
layout: base
title: Tokenization
---

# Tokenization

The UD Basque treebank follows the tokenization of the Basque Dependency Treebank (BDT): tokens are split on whitespace, and punctuation is separated from adjacent words in the conventional way. The UD Basque treebank contains no multiword tokens, so every token corresponds to exactly one syntactic word.
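
As a rough illustration of whitespace-based tokenization with punctuation separation, the following Python sketch splits a sentence on whitespace and then detaches punctuation from the edges of each chunk. It is not the BDT tokenizer: the function name `tokenize`, the pattern `EDGE_PUNCT`, and the rule that every edge punctuation mark becomes its own token are assumptions made for this example.

```python
import re

# Hypothetical pattern: leading punctuation, a core, trailing punctuation.
# "Punctuation" here means any character that is neither a word character
# nor whitespace; this is an assumption, not the BDT definition.
EDGE_PUNCT = re.compile(r"^([^\w\s]*)(.*?)([^\w\s]*)$")


def tokenize(text):
    """Split on whitespace, then peel punctuation off each chunk's edges."""
    tokens = []
    for chunk in text.split():
        leading, core, trailing = EDGE_PUNCT.match(chunk).groups()
        tokens.extend(leading)   # each leading mark becomes its own token
        if core:
            tokens.append(core)  # word-internal characters are left untouched
        tokens.extend(trailing)  # each trailing mark becomes its own token
    return tokens


print(tokenize("Kaixo, mundua!"))
# ['Kaixo', ',', 'mundua', '!']
```

The sketch never merges or splits word-internal material, so each whitespace-delimited word yields a single token. It also detaches the final period of an abbreviation, which a real tokenizer may handle differently.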