Question particles #458

Open
heacu opened this issue May 23, 2017 · 14 comments
Labels
dependencies standard needed universal UPOS Universal part-of-speech tags: definitions and examples

Comments

@heacu

heacu commented May 23, 2017

It seems that no documentation on question particles reached v2, contrary to the final comment closing #178. I'd like to reopen the question of annotating question particles. See also #454 on the documentation of questions.

I am working on Tibetan, which has dedicated particles that occur in both content and Y/N questions. In our existing scheme we tag them as cv.ques which means question converb. In UDv2 we can tag them as PART but with what feature? PronType=Int seems wrong since these are particles and not pronouns.

One possibility at least for polar questions would be to add a new value under the Polarity feature. For example, Polarity=Xor for "exclusive disjunction" which presents a pair of alternatives only one of which is correct. This same approach might not work for content question particles, which may differ - or may be the same as - the y/n particle in a given language.

What are others here doing?

@heacu
Author

heacu commented May 23, 2017

Incidentally, WALS has a chapter on polar questions: http://wals.info/chapter/116

@jnivre
Contributor

jnivre commented May 23, 2017

Thanks for reopening this discussion. I agree we need to do something, perhaps not only about questions but about sentence mood in general.

@amir-zeldes
Contributor

I agree with @jnivre; I also think it's more of a sentence property. For example, English marks polar questions with auxiliary inversion, so you wouldn't want to put the polarity annotation on any particular word in that case. For that reason I've been using a sentence annotation for sentence mood (roughly following the SPAAC scheme) in an English corpus here (using Stanford Dependencies, not UD, but the principle is the same):

https://github.com/amir-zeldes/gum/blob/master/dep/GUM_interview_brotherhood.conll10

If you're looking at an isolated particle, and not something like a morphological feature on predicates, then maybe it can be seen as syntactically just a particle, and the resulting semantics for the whole sentence are polar.

@heacu
Author

heacu commented May 23, 2017 via email

@jnivre
Contributor

jnivre commented May 23, 2017

I didn't mean to imply that this was only a sentence level issue. For languages that use question particles, we definitely need appropriate features. But we should also think about how to capture sentence mood for languages that rely on other strategies such as subject-verb inversion.

@ermanh
Member

ermanh commented May 24, 2017

I'm part of a team working on Mandarin and Cantonese, but we haven't moved on to features yet, so perhaps another team(?) working on Chinese might be able to chime in. Unless I'm mistaken though, Japanese has question particles, too, and there seems to be an active Japanese team(?).

On the subject of sentence-level particles and POS/relation, we have "sentence(/utterance)-final particles" in Mandarin and Cantonese, and use PART and discourse:sp (sp for sentence particle) [discourse:sp in Mandarin and Cantonese]. In these two languages at least, they are fixed in final position unlike adverbs or auxiliaries, and their functions range from interrogative mood to epistemic modality, speech act, and pragmatic deixis. We put them under discourse under a broader interpretation of the word -- it's stretching it for sure, and not a full overlap, but we're not sure there's a better alternative at the moment.

Japanese v1 appears to use PART as well for their final particles, but they use the aux relation.
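
As an illustration of the PART plus discourse:sp analysis described above, here is a minimal Python sketch. The sentence 你好吗 and its annotation are hand-written assumptions for illustration, not taken from a released treebank, and `find_sentence_particles` is a hypothetical helper name.

```python
# Hand-written CoNLL-U fragment (assumed annotation, not from a released treebank).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = "\n".join([
    "# text = 你好吗",
    "1\t你\t你\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\t好\t好\tADJ\t_\t_\t0\troot\t_\t_",
    "3\t吗\t吗\tPART\t_\t_\t2\tdiscourse:sp\t_\t_",
])


def find_sentence_particles(conllu):
    """Return (ID, FORM) for tokens tagged PART and attached as discourse:sp."""
    hits = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if cols[3] == "PART" and cols[7] == "discourse:sp":
            hits.append((cols[0], cols[1]))
    return hits


print(find_sentence_particles(SAMPLE))  # [('3', '吗')]
```

Under this analysis the particle is identified by its tag and relation alone; nothing in FEATS yet says that it marks a question, which is the gap this issue is about.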

@amir-zeldes
Contributor

@torma I definitely don't mind having more morphological features. I think for question particles in many languages this could be done mostly automatically (based on word forms and POS tags, or for Mandarin just the relevant character), but if it's done for sentences in some languages, it may make sense to standardize the existence of a sentence level annotation across languages for comparability. But this may be part of a larger conversation regarding where to annotate what in multiple languages.
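
A rough sketch of what such automatic tagging could look like, assuming a closed list of particle forms per language. The feature string `Polarity=Xor` is only the value proposed earlier in this thread, not an accepted UD feature value, and `add_feature` and `tag_question_particles` are hypothetical helper names.

```python
def add_feature(feats, new_feat):
    """Merge one Name=Value pair into a FEATS string, keeping pairs sorted."""
    items = [] if feats == "_" else feats.split("|")
    if new_feat not in items:
        items.append(new_feat)
    return "|".join(sorted(items, key=str.lower))


def tag_question_particles(conllu, forms, feature):
    """Add `feature` to every PART token whose FORM is in `forms`."""
    out = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Comments and blank lines do not have 10 columns and pass through unchanged.
        if len(cols) == 10 and cols[3] == "PART" and cols[1] in forms:
            cols[5] = add_feature(cols[5], feature)
        out.append("\t".join(cols))
    return "\n".join(out)


# Hand-written example sentence (assumed annotation, for illustration only).
sample = "\n".join([
    "# text = 你好吗",
    "1\t你\t你\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\t好\t好\tADJ\t_\t_\t0\troot\t_\t_",
    "3\t吗\t吗\tPART\t_\t_\t2\tdiscourse:sp\t_\t_",
])

tagged = tag_question_particles(sample, {"吗"}, "Polarity=Xor")
print(tagged.splitlines()[3])
```

Only the FEATS column of the matching token changes; a form list like this would not help for languages that mark questions by word order alone.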

@martinpopel
Member

My two cents:

  • I guess what is needed in most cases is not a sentence-level annotation, but a clause-level annotation (obviously: stored in the head word). For example, in the sentence 'He asked "How are you?" and didn't wait for a reply.' we cannot mark the whole sentence as interrogative.
    (Even if a sentence-level annotation is needed for some phenomena, we would need a better CoNLL-U specification for storing arbitrary/predefined key-value pairs in comments. This is a technical issue, so I don't want to go into the details here.)
  • The FEATS column was originally intended for a) "inflectional" features present in the morphology of a given word form (possibly disambiguated), b) "lexical" features inherent in a given lemma (I guess this is the case of the question particles discussed above). There is a third possibility c) grammatical features which are not present in the morphology of a given word, but can be inferred from the context, for example the periphrastic tense of complex verb forms, or the interrogative "mood" of verbs in polar questions in languages which mark it only with word order. I am not against this c, but I think it fits a deeper layer of language description and it should be discussed first whether to allow such features in current UD.
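
To make the clause-level idea concrete, here is a small Python sketch that stores a clause type on the head word of each clause and recovers the words it covers. Everything specific in it is an assumption: `ClauseType` is a hypothetical MISC-style key that is not part of UD, and the dependency tree for the example sentence is hand-built for illustration.

```python
# (ID, FORM, HEAD, DEPREL, MISC); hand-built tree, hypothetical ClauseType key.
TOKENS = [
    (1, "He", 2, "nsubj", "_"),
    (2, "asked", 0, "root", "ClauseType=Decl"),
    (3, '"', 4, "punct", "_"),
    (4, "How", 2, "ccomp", "ClauseType=Int"),
    (5, "are", 4, "cop", "_"),
    (6, "you", 4, "nsubj", "_"),
    (7, "?", 4, "punct", "_"),
    (8, '"', 4, "punct", "_"),
    (9, "and", 12, "cc", "_"),
    (10, "did", 12, "aux", "_"),
    (11, "n't", 12, "advmod", "_"),
    (12, "wait", 2, "conj", "_"),
    (13, "for", 15, "case", "_"),
    (14, "a", 15, "det", "_"),
    (15, "reply", 12, "obl", "_"),
    (16, ".", 2, "punct", "_"),
]


def subtree_forms(tokens, head_id):
    """Return the forms of head_id and all its descendants, in sentence order."""
    ids = {head_id}
    changed = True
    while changed:  # keep adding dependents until no new token is reached
        changed = False
        for tid, _form, head, _rel, _misc in tokens:
            if head in ids and tid not in ids:
                ids.add(tid)
                changed = True
    return [form for tid, form, *_rest in tokens if tid in ids]


def clause_heads(tokens, key="ClauseType"):
    """Return (FORM, value) for every token carrying the clause-type key."""
    found = []
    for _tid, form, _head, _rel, misc in tokens:
        for pair in misc.split("|"):
            name, _, value = pair.partition("=")
            if name == key:
                found.append((form, value))
    return found


print(clause_heads(TOKENS))      # [('asked', 'Decl'), ('How', 'Int')]
print(subtree_forms(TOKENS, 4))  # ['"', 'How', 'are', 'you', '?', '"']
```

The interrogative value applies only to the subtree of the quoted clause, while the matrix clause stays declarative, which a single sentence-level label could not express.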

@jnivre
Contributor

jnivre commented May 25, 2017

Yes, we considered adding a notion of "syntactic" or "phrase-level" features for v2, but in the end we decided against it because (a) it would have added more complexity to CoNLL-U and (b) we didn't have enough convincing use cases for it. We may have to revisit this if we get more convincing use cases. The other option would be to use subtyping on dependency relations, like "root:decl", "root:interrog", ..., "ccomp:decl", "ccomp:interrog", but I am not sure we want to go down this route.
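
For concreteness, a minimal Python sketch of how such subtyped labels could be consumed downstream. The subtypes `decl` and `interrog` are just the examples named above and are not defined in UD; `split_deprel` and `clause_type_counts` are hypothetical helper names.

```python
from collections import Counter


def split_deprel(deprel):
    """Split 'ccomp:interrog' into ('ccomp', 'interrog'); subtype is None if absent."""
    base, _, subtype = deprel.partition(":")
    return base, (subtype or None)


def clause_type_counts(deprels, clause_rels=("root", "ccomp")):
    """Count subtypes on clause-level relations only, ignoring other subtypes."""
    counts = Counter()
    for rel in deprels:
        base, subtype = split_deprel(rel)
        if base in clause_rels and subtype:
            counts[subtype] += 1
    return counts


# Made-up relation labels for one sentence, for illustration only.
deprels = ["nsubj", "root:decl", "ccomp:interrog", "nsubj", "cop", "discourse:sp"]
print(clause_type_counts(deprels))
```

Tools that only look at the universal part of the label (the text before the colon) would keep working unchanged, which is the usual argument for subtypes; the cost is a larger label inventory on every clause-level relation.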

@amir-zeldes
Contributor

@martinpopel that argument makes sense, I agree. This may be off topic, but while we're on the issue of subtyping root/clause labels, this might be another argument for allowing it:

I very often have utterances that are just a vocative NP (e.g. an utterance such as "Mike!"). In these cases UD validation tells me the root label has to be root, but I always feel uneasy that this isn't vocative. Should there be root:vocative? I'm happy to open another issue if this sounds relevant.

@nschneid
Contributor

nschneid commented May 25, 2017

Another situation where I was tempted to subtype root: sentences truncated due to length limits of the medium, often indicated with an ellipsis: "I was asking him whether...". This situation could justify root:incomplete or similar, under the logic that no sentence type has been established because there is not a complete sentence.

It would also be worth considering what to do with fragmentary utterances in dialogue that are perfectly natural linguistically. ("Where were you?" "The store." / "At the store.")

@dan-zeman
Member

Hi all,

I think this is actually about ellipsis in UD and whether we want to annotate it in other cases than the gapping currently supported by the guidelines.

I also think this thread has moved far from the original topic of Question particles so if you wish to discuss it further, please create a new issue and copy the relevant points there.

@amir-zeldes
Contributor

OK, thanks, I've opened an issue for the vocative question here: #459

If there's interest in discussing sentence types, we can open another issue or talk about it here (it seems to have come up regarding question particles already in #178)

@dan-zeman dan-zeman added standard needed UPOS Universal part-of-speech tags: definitions and examples dependencies universal labels Apr 24, 2018
@dan-zeman dan-zeman added this to the v2.2 milestone Apr 24, 2018
@dan-zeman dan-zeman modified the milestones: v2.2, v2.4 Nov 13, 2018
@dan-zeman dan-zeman modified the milestones: v2.4, v2.5 Oct 6, 2019
@dan-zeman dan-zeman modified the milestones: v2.5, v2.6 Nov 9, 2019
@dan-zeman dan-zeman modified the milestones: v2.6, v2.7 May 14, 2020
@dan-zeman dan-zeman modified the milestones: v2.7, v2.8 Nov 14, 2020
@dan-zeman dan-zeman modified the milestones: v2.8, v2.9 Jun 17, 2021
@dan-zeman dan-zeman modified the milestones: v2.9, v2.11 Jun 13, 2022
@dan-zeman dan-zeman modified the milestones: v2.11, v2.13 May 31, 2023
@dan-zeman dan-zeman modified the milestones: v2.13, v2.14 Nov 15, 2023
@dan-zeman dan-zeman modified the milestones: v2.14, v2.15 May 15, 2024
@dan-zeman dan-zeman modified the milestones: v2.15, v2.16 Nov 16, 2024
7 participants