Comparable Wikipedia Corpus (aligned documents)
The corpus was extracted from the 20-01-2017 Wikipedia dumps.
The documents were aligned with WikiDocsAligner.
Language pairs currently included:
- Arabic-Egyptian

Other language pairs will be added in the future.
| | Arabic Wikipedia | Egyptian Wikipedia |
|---|---|---|
| documents | 10,197 | 10,197 |
| words | 8,397,154 | 1,543,516 |
| vocabulary | 740,055 | 215,659 |
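
Counts of this kind (documents, words, vocabulary) can be reproduced with a short script. The sketch below is only an illustration: it assumes whitespace tokenization and that each document is available as a plain string, neither of which is stated for this corpus, and the function name `corpus_stats` is hypothetical. Different tokenization or normalization will give different numbers.

```python
from collections import Counter


def corpus_stats(documents):
    """Return (document count, word count, vocabulary size).

    `documents` is any iterable of document strings. Tokens are split on
    whitespace, which is an assumption and may differ from how the figures
    in the table were produced.
    """
    vocab = Counter()
    n_docs = 0
    for text in documents:
        n_docs += 1
        vocab.update(text.split())
    n_words = sum(vocab.values())  # total number of tokens
    n_vocab = len(vocab)           # number of distinct tokens
    return n_docs, n_words, n_vocab


if __name__ == "__main__":
    toy_corpus = ["a b a", "b c"]  # toy data, not taken from the corpus
    print(corpus_stats(toy_corpus))
```

Running the function once per side of the corpus (Arabic, Egyptian) would give one column of the table each.
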
Motaz Saad and Basem Alijla (2017). WikiDocsAligner: An Off-the-Shelf Wikipedia Documents Alignment Tool. In The Second Palestinian International Conference on Information and Communication Technology (PICICT 2017).