Use of mwe in Scandinavian languages #262

jnivre · 2016-02-16T14:21:01Z

The use of the "mwe" relation differs a lot between the Scandinavian treebanks. The raw frequencies alone say a lot:

UD_Swedish has 16.6 "mwe" relations per 1000 words.
UD_Danish has 4.9 "mwe" relations per 1000 words.
UD_Norwegian has 0 "mwe" relations per 1000 words.
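
For anyone who wants to reproduce figures like these, here is a minimal sketch of how a per-1000-words rate could be computed from a treebank in CoNLL-U format. It is not the script behind the numbers above. The function name `mwe_per_1000_words` is made up for illustration, and the sketch assumes that every syntactic word (including punctuation) counts as a word and that multiword-token range lines are skipped, so the result may differ slightly from the figures quoted here.

```python
def mwe_per_1000_words(conllu_text, relation="mwe"):
    """Return the number of `relation` dependencies per 1000 syntactic words.

    Assumption: every syntactic word, punctuation included, counts as a word.
    """
    words = 0
    hits = 0
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id = cols[0]
        # Skip multiword-token ranges such as "1-2" and empty nodes such as "1.1".
        if "-" in tok_id or "." in tok_id:
            continue
        words += 1
        # DEPREL is the 8th column; ignore any language-specific subtype.
        if cols[7].split(":")[0] == relation:
            hits += 1
    return 1000.0 * hits / words if words else 0.0
```

To use it on a treebank file, read the file as UTF-8 text and pass the contents to the function, e.g. `mwe_per_1000_words(open(path, encoding="utf-8").read())`, where `path` points to a local `.conllu` file.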

The zero frequency in Norwegian is certainly by design, but the difference between Danish and Swedish is too large to be compatible with a consistent treatment of (fixed) MWEs. I suspect that they are overused in Swedish (and perhaps underused in Danish).

If we can come up with a core set of expressions that should be treated as "mwe" across the three languages, the same principles can also be used for other languages.

jnivre · 2016-04-11T16:23:21Z

Any hope to make progress on this before v1.3?

liljao · 2016-04-11T17:31:17Z

Unfortunately, I think that is not realistic for the Norwegian data, since it will require a large manual effort.

jnivre · 2016-04-11T19:40:16Z

I agree. We should probably change the milestone then.

hectormartinez · 2016-04-13T13:03:00Z

For Danish we only inserted mwe relations for words that:
a) had been underscored_together in the original Copenhagen Dependency Treebank.
b) worked as function words

The only analysis we performed was to identify the form, lemma and UPOS of the formants of the mwe so we could split them into syntactic-word tokens.

Here is a Dropbox link to the file we created for the conversion, maybe it is a useful reference:

https://www.dropbox.com/s/e2pv7gr7i8mrc3j/danishCDT_2_UD_mwe-info.tsv?dl=0
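
The splitting step described above can be sketched roughly as follows. This is only an illustration, not the actual conversion code: the function `split_underscored`, the in-memory table layout, and the example entry are all hypothetical and do not reflect the real column layout of the TSV file. The sketch also assumes the usual head-initial treatment, where the first word is the head and each later word attaches to it with "mwe".

```python
# Hypothetical lookup table: underscored form -> [(form, lemma, UPOS), ...].
# The single entry below is an illustrative example, not taken from the TSV file.
MWE_TABLE = {
    "i_stedet_for": [
        ("i", "i", "ADP"),
        ("stedet", "sted", "NOUN"),
        ("for", "for", "ADP"),
    ],
}


def split_underscored(form, table=MWE_TABLE):
    """Split an underscored token into syntactic words.

    Returns None when the form is not in the table, so that the caller
    can leave the token untouched.
    """
    parts = table.get(form)
    if parts is None:
        return None
    words = []
    for i, (part_form, lemma, upos) in enumerate(parts):
        words.append({
            "form": part_form,
            "lemma": lemma,
            "upos": upos,
            # Assumption: the first word is the head of the expression;
            # every later word attaches to it with the "mwe" relation.
            "deprel": None if i == 0 else "mwe",
            "head_index": None if i == 0 else 0,
        })
    return words
```

The first word's own relation and head are left as `None` here because they depend on the role the whole expression plays in the sentence.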


jnivre · 2016-04-13T13:08:04Z

@hectormartinez: Thanks! This is very useful. At some point, we should definitely try to harmonise this across the three languages (to begin with), but there is no way we can do this for 1.3. I will postpone the milestone until 1.4.

jnivre added standard needed dependencies universal Germanic labels Feb 16, 2016

jnivre self-assigned this Feb 16, 2016

jnivre added this to the lg-specific v1.3 milestone Feb 16, 2016

jnivre modified the milestones: later, lg-specific v1.3 Apr 13, 2016
