The use of the "mwe" relation differs a lot between the Scandinavian treebanks. Just frequency says a lot:
UD_Swedish has 16.6 "mwe" relations per 1000 words.
UD_Danish has 4.9 "mwe" relations per 1000 words.
UD_Norwegian has 0 "mwe" relations per 1000 words.
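For anyone who wants to reproduce figures of this kind, here is a minimal sketch of the calculation. It is not the script used for the numbers above. It assumes the standard 10-column CoNLL-U layout (DEPREL in the eighth column), counts every syntactic word including punctuation, skips multiword-token ranges and empty nodes, and matches the relation label exactly, so subtyped labels are not counted. The function name `relation_per_1000` is hypothetical.

```python
def relation_per_1000(conllu_text, relation="mwe"):
    """Return the number of `relation` dependencies per 1000 syntactic words."""
    words = 0
    hits = 0
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        token_id = cols[0]
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        words += 1
        if cols[7] == relation:  # DEPREL column, exact match (assumption)
            hits += 1
    return 1000.0 * hits / words if words else 0.0
```

To use it, read a treebank file into a string and pass it in, for example `relation_per_1000(open(path, encoding="utf-8").read())`, where `path` points at a local copy of the treebank.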
The zero frequency in Norwegian is certainly by design, but the difference between Danish and Swedish is too large to be compatible with a consistent treatment of (fixed) MWEs. I suspect that they are overused in Swedish (and perhaps underused in Danish).
If we can come up with a core set of expressions that should be treated as "mwe" across the three languages, the same principles can be used also for other languages.
For Danish we only inserted mwe relations for words that:
a) had been underscored_together in the original Copenhagen Dependency Treebank, and
b) worked as function words.
The only analysis we performed was to identify the form, lemma and UPOS of the formants of the mwe so we could split them into syntactic-word tokens.
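As a rough illustration of that splitting step (a sketch only, not the actual conversion code), the function below splits an underscored token into its parts and attaches every non-initial part to the first one with `mwe`, following the head-initial convention for fixed expressions. The function name, the dictionary keys and the example analyses are all hypothetical; in the real conversion the lemma and UPOS of each part came from the file mentioned in this comment.

```python
def split_underscored(form, analyses):
    """Split an underscored token into syntactic words.

    `analyses` is a list of (lemma, upos) pairs, one per part (hypothetical
    input format). The first part stays the head and keeps whatever relation
    the original token had; later parts attach to it as "mwe".
    """
    parts = form.split("_")
    if len(parts) != len(analyses):
        raise ValueError("need exactly one (lemma, upos) pair per part")
    words = []
    for offset, (part, (lemma, upos)) in enumerate(zip(parts, analyses)):
        words.append({
            "offset": offset,  # position relative to the original token
            "form": part,
            "lemma": lemma,
            "upos": upos,
            # None means: inherit head and relation from the original token.
            "head_offset": None if offset == 0 else 0,
            "deprel": None if offset == 0 else "mwe",
        })
    return words


# Hypothetical example: "i_forhold_til" as a three-part function-word MWE.
example = split_underscored(
    "i_forhold_til",
    [("i", "ADP"), ("forhold", "NOUN"), ("til", "ADP")],
)
```

The offsets are relative on purpose: renumbering the rest of the sentence after the split is a separate step that this sketch leaves out.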
Here is a Dropbox link to the file we created for the conversion, maybe it is a useful reference:
@hectormartinez: Thanks! This is very useful. At some point we should definitely try to harmonise this across the three languages (to begin with), but there is no way we can do this for 1.3. I will move the milestone to 1.4.