Missing PronType? #230
I'm willing to add them in GUM if you have a clear idea of what should get what!
Some poking around EWT—we have:
P.S. There are a handful of articles and demonstrative DETs erroneously missing PronType.
The udapi checker expects all PRON and DET tokens to have a PronType:
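For readers without udapi at hand, the expectation above can be approximated with a few lines of plain Python. This is only a rough sketch, not the udapi checker itself: the function names `parse_feats` and `missing_prontype` are made up for illustration, and the input is assumed to be standard 10-column CoNLL-U.

```python
def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Definite=Def|PronType=Art' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def missing_prontype(conllu_text):
    """Yield (id, form, upos) for PRON/DET tokens that have no PronType feature."""
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tok_id, form, upos, feats = cols[0], cols[1], cols[3], cols[5]
        if upos in ("PRON", "DET") and "PronType" not in parse_feats(feats):
            yield tok_id, form, upos
```

Calling `list(missing_prontype(open(path, encoding="utf-8").read()))` on a treebank file would then list the tokens that still need attention.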
Thanks for raising this. I had a look at the corresponding cases in GUM, and I think this DepEdit script would take care of those cases in a reasonable way (there are a couple of GUM typos this covers, but the rules are cascaded to spare cases that already have PronTypes, so I think this should work for most of the EWT cases too):
Does this look reasonable?
"a", not "an", for the lemma?
I don't think this is necessary because the lemma should be corrected to "a" or "the". And EWT would also need a rule for a few demonstratives with typos.
Whoops, thanks for catching! Yes, "a", and in the other rule that should be morph and the word form, not lemma. I kept it in because the current GUM morphology was missing those cases even though the lemma was correct: it was cribbed from the corresponding CoreNLP code, which relied on word forms. So then we have:
I can add this to the GUM build bot (it will not overwrite any manually specified morph annotations).
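The "do not overwrite existing annotations" behaviour can be illustrated in plain Python. This is not the DepEdit script from the thread, just a hedged sketch of the same idea: the function name `add_default_prontype` and the two word lists are assumptions for illustration, and the rule here keys on the lemma only.

```python
ARTICLES = {"a", "the"}
DEMONSTRATIVES = {"this", "that", "these", "those"}


def add_default_prontype(upos, lemma, feats):
    """Return FEATS with a default PronType added; existing PronType values are kept."""
    pairs = {} if feats == "_" else dict(p.split("=", 1) for p in feats.split("|"))
    if upos != "DET" or "PronType" in pairs:
        return feats  # not a determiner, or already annotated: leave untouched
    if lemma.lower() in ARTICLES:
        pairs["PronType"] = "Art"
    elif lemma.lower() in DEMONSTRATIVES:
        pairs["PronType"] = "Dem"
    else:
        return feats  # no default known for this lemma
    # CoNLL-U expects features sorted alphabetically (case-insensitively)
    ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)
```

Because the check for an existing `PronType` comes first, manually specified values survive a rerun of the rule.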
Added Rcp type, resulting in:
…; expect PronType=Ind for all indefinite pronouns (#230)
Since we are listing DepEdit rules here:
EDIT: Updated the PronTypes
OK, but I think if "no-one" were spelled with a hyphen, we should tokenize it apart and analyze it the same as "no one", plus I see you're doing
Yes, PronType=Neg for "no one", "nobody", etc. In terms of the hyphenated one, in EWT there are just two tokens of "noone", which is a nonstandard spelling, so the lemma is "no-one". "No-one" might be tokenized in other corpora as 3 tokens, in which case the hyphen is irrelevant to the analysis.
I would probably tokenize those with SpaceAfter=No, CorrectSpaceAfter=Yes, but it's not crucial
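To make the tokenization suggestion concrete, here is a small sketch of what splitting a wrongly merged form like "noone" would produce in the FORM and MISC columns. The helper name `split_merged_token` is hypothetical, and the sketch deliberately leaves out heads, relations, and enhanced dependencies, which would still need manual attention.

```python
def split_merged_token(form, split_at):
    """Split a wrongly merged form into two (FORM, MISC) pairs.

    The first part is marked as having no following space in the source text
    (SpaceAfter=No), although the corrected text would have one
    (CorrectSpaceAfter=Yes).
    """
    if not 0 < split_at < len(form):
        raise ValueError("split_at must fall strictly inside the form")
    first, second = form[:split_at], form[split_at:]
    return [(first, "CorrectSpaceAfter=Yes|SpaceAfter=No"), (second, "_")]


# "noone" -> "no" + "one"
parts = split_merged_token("noone", 2)
```

The second token keeps whatever MISC value the original token had; `_` is used here only as a placeholder.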
I see the logic in that but I don't want to manually retokenize this very long sentence if I can help it :)
OK. Actually I have a fork of the arborator gui that can do it - if you want to paste the conllu here I can easily retokenize it.
For edeps as well?
Hm, no, not for edeps... Maybe worth doing at some point, we just generate edeps on the fly for 99% of cases so it hasn't come up :)
Laura Michaelis (pc) mentioned that the -ever series of pro-forms (whoever, whatever, etc.) are indefinites. I think they should be given Details:
Currently in GUM these are Rel if dominated by a
The free relative kind could reasonably be Rel IMO. As for Ind, I guess if you have something like "eat a sandwich or whatever", that would be Ind. I don't think they should carry dual types, if that's what you mean by Ind,Int. I think it's either-or (I mean, a regular "what" can be answered by an indefinite or definite, and I wouldn't call it either, just Int). Finally, for the DM "however", I agree it should not have a PronType at all.
I think the point is that "whatever", as opposed to "what", is specifically indefinite, whether it functions as interrogative or relative.
We should implement the PRON tag and I assume with no |
From the UD overview article:
However, some of these mentioned types do not consistently bear a PronType. Other indefinite and interrogative pronouns should be examined as well.