English nominal subtypes: merge :npmod and :tmod as :unmarked #1028
Comments
As this is a trivial change to implement, but one that multiple treebanks may want to make in concert, is it better to update EWT/GUM before May 1 or wait until the next release? |
I'm not the right @ for LinES, but I can do it in the CoreNLP converter, PUD, and Pronouns. @LarsAhrenberg I can do it for LinES too if you want me to. |
Is this just literally a string replace over everything? The only : relations marked in Pronouns are |
PUD has plenty. Please confirm if there's any intelligence required to do this, or just ESC-shift-5 |
Simple replacement. Since EWT lacks any entity annotation whatsoever, for the |
Not sure, time is a bit tight. And it's not just English, where I can update the GUM, Reddit and GENTLE repos - I know of at least UD Coptic and Hebrew IAHLTwiki, which I maintain and which use these labels, so I could change those, but I haven't coordinated with the annotators about this. Do you know if there are other datasets using these subtypes? I wouldn't want to create differences between datasets on short notice just for a renaming. |
|
OK let's not rush it then. Let's implement it in the 2.15 release. |
For Ancient Hebrew the usage of |
@mr-martian I think |
I started to draft a new issue about this, forgetting that this one existed. :D One bit of information not included above is the alternatives that were discussed, which I'll put for posterity:
|
…); add more examples; mark as deprecated (#1028)
Implemented for EWT, and created some initial docs:
Still need to update more docs pages and mark old subtypes as deprecated. What are implementation plans for other treebanks? |
So far UD_English-LinES has used neither :npmod nor :tmod, but it seems quite straightforward to implement :unmarked so I put it up for version 2.15. |
I made a PR for PUD. I don't think it's relevant for Pronouns |
Reviewing the outputs of my script adding :unmarked to obl and nmod tokens, I've come across a number of cases where I think the subrelation is reasonable but which are not covered in the initial docs (oblique, nmod). I would be grateful to hear the views of other people.
* Multipart references to locations
  by way of Northfield, Minnesota
* Apposition-like but without identity of reference:
  Subject: The cost of enlargement
  Your amendments uphold two important principles: the right of rightholders to fair remuneration and the ...
* Personal pronoun + noun
  Go back to Stromboli, you dumb bastard
* Multi-word proper noun made adjective
* Pre-head modifier like 'a couple'
* Fronted or extraposed subject predicative
  These grew spontaneously one out of the other,
* Sound imitations |
"Pop" can't be omitted so it looks like
Interesting...haven't thought about this one:
Because you can say "the man is Puerto Rican", I would lean toward treating the whole expression as an ADJ (ExtPos=ADJ). Thus: flat(Puerto/PROPN,ExtPos=ADJ Rican/ADJ) and amod(man, Puerto) The rest have been discussed but not decided yet. See this paper for a synopsis and some proposals. If you want to contribute to the discussion: #455, UniversalDependencies/UD_English-EWT/issues/436, #751, #762, #933, #1024 |
* Replaces nmod/obl:npmod/tmod * Uses of tmod can be emulated using lemma list in label_trees.py (e.g. for generating PTB NP-TMP) * See UniversalDependencies/docs#1028
* Replaces nmod/obl:npmod/tmod * Used TemporalNPAdjunct=Yes in misc to preserve tmod info * See UniversalDependencies/docs#1028
* Replaces :npmod subtype * See UniversalDependencies/docs#1028
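The commit notes above mention keeping the old temporal distinction as TemporalNPAdjunct=Yes in MISC. A minimal sketch of that idea for a single token follows. It assumes the token is already split into the ten CoNLL-U columns; the function name `demote_tmod_to_misc` is hypothetical and not taken from any of the referenced repositories, and the DEPS column is deliberately left untouched here.

```python
from typing import List

TEMPORAL_FLAG = "TemporalNPAdjunct=Yes"  # MISC attribute named in the commit notes


def demote_tmod_to_misc(cols: List[str]) -> List[str]:
    """Rename nmod/obl :npmod/:tmod to :unmarked on one 10-column token.

    If the old subtype was :tmod, record that in MISC (column 10) so the
    temporal information is not lost. Hypothetical helper, for illustration.
    """
    cols = list(cols)  # work on a copy; do not mutate the caller's list
    base, _, subtype = cols[7].partition(":")
    if base in ("nmod", "obl") and subtype in ("npmod", "tmod"):
        cols[7] = base + ":unmarked"
        if subtype == "tmod":
            misc = [] if cols[9] == "_" else cols[9].split("|")
            if TEMPORAL_FLAG not in misc:
                misc.append(TEMPORAL_FLAG)
            cols[9] = "|".join(misc)
    return cols


token = "3\tFriday\tFriday\tPROPN\tNNP\tNumber=Sing\t2\tnmod:tmod\t_\tSpaceAfter=No".split("\t")
converted = demote_tmod_to_misc(token)
print(converted[7], converted[9])
```

Tokens with other relations (for example nmod:poss or plain obl) pass through unchanged, and :npmod tokens are renamed without gaining the MISC attribute.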
OK, this change should now be done and documented for:
|
Excellent! Any updates regarding English-Atis (@aslikuzgun), English-ESLSpok (@kristopherkyle), English-ParTUT (@msang)? All of these use at least a subset of the |
I believe the English docs are now up to date, with mentions of I have not heard any objections to incorporating |
It depends. If I know that a treebank is actively maintained (or was in the not-so-distant past), like EWT, I would hesitate to touch it without the current maintainer's consent. If I know that the data provider / last maintainer has been silent for a long time, I would just go and fix it. Ideally the validator should flag it as a new error and the treebanks should get their four-year grace period. But we currently have this mechanism only for the main guidelines, not for the language-specific relation subtypes. |
Is there a reason to keep this issue open or has everything been resolved? |
I think it's still open for Atis, ESLSpok, ParTUT. |
Because prepositions are so important in English, we have a well-established practice of distinguishing ordinary prepositional nmod and obl from other kinds via subtyping (nmod:poss, etc.).
In particular, nmod:tmod/obl:tmod have been used for non-prepositional temporal adjunct nominals like
… (obl:tmod)
The party Friday was widely attended. (nmod:tmod)
in contrast to
… (obl)
The party on Friday was widely attended. (nmod)
tmod is part of the legacy of Stanford Dependencies. In light of current UD theory, it is an anomaly where the subtype reflects a semantic but not syntactic distinction (#893). Moreover, it is potentially confusing that only some temporal obliques (the prepositionless ones) receive the subtype.
Meanwhile, nmod:npmod/obl:npmod are used for OTHER non-prepositional adjunct nominals (in special constructions like "5 dollars a share" and "Shares eased a fraction"). The term "npmod" (derived from the npadvmod relation in Stanford Dependencies) has been a source of confusion and invokes a concept of NP that is not part of UD theory.
A discussion amongst the core group concluded that a subtype named :unmarked would be a less confusing way to implement the adpositional vs. non-adpositional distinction, for languages that choose to do so.
@amir-zeldes and I plan to implement this for our English corpora, by simply renaming both :tmod and :npmod to :unmarked. Perhaps English-Atis (@aslikuzgun), English-ESLSpok (@kristopherkyle), English-{LinES, Pronouns, PUD} (@AngledLuffa), English-ParTUT (@msang) would like to do so as well for consistency.
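For maintainers wondering what the "simple renaming" amounts to in practice, here is a minimal sketch, not the script actually used for any treebank. It assumes standard 10-column, tab-separated CoNLL-U input and rewrites only the DEPREL and DEPS columns, so that sentence text and comments containing the same strings are not touched. The function name `rename_subtypes` is hypothetical.

```python
import re

# nmod/obl carrying either legacy subtype; other subtypes (e.g. nmod:poss) do not match.
OLD_SUBTYPE = re.compile(r"\b(nmod|obl):(?:npmod|tmod)\b")


def rename_subtypes(line: str) -> str:
    """Rename :npmod/:tmod to :unmarked in DEPREL (col 8) and DEPS (col 9)."""
    if not line.strip() or line.startswith("#"):
        return line  # sentence breaks and comment lines pass through unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return line  # not a well-formed token line; leave it alone
    for i in (7, 8):  # zero-based indices of DEPREL and DEPS
        cols[i] = OLD_SUBTYPE.sub(r"\1:unmarked", cols[i])
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")


example = "3\tFriday\tFriday\tPROPN\tNNP\tNumber=Sing\t2\tnmod:tmod\t2:nmod:tmod\t_\n"
print(rename_subtypes(example), end="")
```

Restricting the substitution to the two relation columns is a design choice: a blind string replace over the whole file would also work for most treebanks, but could alter comment lines or MISC values that happen to mention the old labels.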