Verbnouns #620
Comments
I am not sure whether the validation script requires cop to be AUX, which would be too strict, because copulas are not (auxiliary) verbs in all languages, or whether it just objects to seeing NOUN in this position. It seems that this is a case where we should allow language-specific exceptions (because the test is useful for finding annotation errors in other languages). @dan-zeman Do you have an opinion on this? |
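The idea of a strict default with language-specific exceptions can be sketched as follows. This is only an illustration of the kind of check under discussion, not the actual validation script: the function name, the exception table, the Welsh entry, and the tag and relation given to yn are all assumptions.

```python
# Hypothetical table of UPOS tags tolerated on a `cop` dependent.
# The default and the Welsh ("cy") exception are assumptions for illustration.
DEFAULT_COP_UPOS = {"AUX"}
COP_UPOS_EXCEPTIONS = {"cy": {"AUX", "NOUN"}}


def cop_upos_errors(tokens, lang):
    """Return IDs of tokens attached as `cop` whose UPOS is not allowed for `lang`.

    `tokens` is a list of (id, form, upos, head, deprel) tuples.
    """
    allowed = COP_UPOS_EXCEPTIONS.get(lang, DEFAULT_COP_UPOS)
    return [tid for tid, _form, upos, _head, deprel in tokens
            if deprel == "cop" and upos not in allowed]


# "bod yn goch" (being red): the verbnoun bod tagged NOUN and attached as cop.
# The UPOS and relation of "yn" are placeholders, not a claim about the treebank.
tokens = [
    (1, "bod", "NOUN", 3, "cop"),
    (2, "yn", "PART", 3, "dep"),
    (3, "goch", "ADJ", 0, "root"),
]
print(cop_upos_errors(tokens, "ga"))  # strict default: token 1 is reported
print(cop_upos_errors(tokens, "cy"))  # with the exception: nothing is reported
```

Keeping the default strict means the test still finds NOUN-as-cop annotation errors in languages that have not asked for an exception.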
In the Irish treebank, is is AUX, while bí is VERB. There appear to be no examples of bí used in its citation form. @tlynn747 any thoughts? |
Back to Welsh: if the copula is in its verbnoun form, it can have possessives, which would be the subject if bod were inflected. Since aux should not have children, the following is flagged by the validation script, even though I think it is the best way to annotate:
If we attach fy and i to the predicate (adjectives), we lose the relation between this circumfix possessive and the verbnoun. fy also triggers the mutation on bod, which becomes mod.
|
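The "function words should not have children" test mentioned above can be sketched like this. It is a minimal illustration, assuming tokens are given as (id, form, head, deprel) tuples; the function name is hypothetical, the phrase is constructed for the example, and the relations on fy, i and yn are `dep` placeholders because the labels actually used in the thread are not reproduced here.

```python
def children_of_function_words(tokens, leaf_deprels=("aux", "cop")):
    """Return (child_id, parent_id) pairs whose parent is attached via a leaf relation.

    Subtypes such as `aux:pass` are reduced to their base relation first.
    """
    base_deprel = {tid: deprel.split(":")[0]
                   for tid, _form, _head, deprel in tokens}
    return [(tid, head) for tid, _form, head, _deprel in tokens
            if base_deprel.get(head) in leaf_deprels]


# Constructed example "fy mod i yn goch" (my being red), with the possessive
# circumfix fy ... i attached to the verbnoun, which is itself attached as cop.
tokens = [
    (1, "fy", 2, "dep"),
    (2, "mod", 5, "cop"),
    (3, "i", 2, "dep"),
    (4, "yn", 5, "dep"),
    (5, "goch", 0, "root"),
]
print(children_of_function_words(tokens))  # both fy and i are reported
```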
On the other hand, if you don't attach "fy" to the predicate, you lose the subject relation, which is worse given UD's emphasis on core arguments. I think this is parallel to languages like Turkish, where complementation also involves nominalisation, and I think "fy" should simply be annotated as "nsubj", not as "nmod". |
I agree. But there is one thing which makes me hesitate (maybe this is not a strong argument). From a purely syntactic point of view, verbnouns are nouns, so fy mod in the example above is "possessive + noun", as is fy aval (my apple). The only difference is the xpos (verbnoun vs. noun). In the latter case I attached |
The fact that you get a different attachment for the possessive is due to the special treatment of copula constructions in UD. For ordinary verbs, "fy" would remain attached to the verbnoun, but I still think it should have the label nsubj (not nmod:subj) to indicate that it is a clause-level argument. If it helps, try to think of "fy" as attaching to the combination of the verbnoun and the predicate, rather than the predicate alone. |
Indeed, for ordinary verbnouns (not inflected verbs) it is attached to the verbnoun (as object, since the possessive on verbnouns corresponds to direct objects of inflected verbs). I misunderstood the guidelines in that |
In Turkic languages, as @jnivre said, this would be similar: the verb is a nominal form (it takes accusative case to indicate that it is a complement, and it takes the embedded subject in the genitive/possessive).
We would annotate this as:
A secondary question is why you have the complement as |
it's an error ... it should be a |
Sorry to arrive late into this conversation!
I can't really advise on how to label the Welsh data as I don't speak
Welsh, and haven't fully followed the explanation given in order to get a
clear insight. I can only explain what we've done for Irish!
As Fran points out, the copula 'is' is tagged as AUX (and will be attached
to the predicate with the cop label)
# sent_id = 7
# text = Cailín is ea í (She is a girl)
1 Cailín Cailín PROPN Noun Case=NomAcc|Gender=Masc|Number=Sing 0 root _ _
2 is is AUX Cop PronType=Rel|Tense=Pres|VerbForm=Cop 1 cop _ _
3 ea ea PRON Pers Number=Sing|Person=3 4 nmod _ _
4 í í PRON Pers Gender=Fem|Number=Sing|Person=3 1 nsubj _ SpaceAfter=No
5 . . PUNCT . _ 1 punct _ _
and
the substantive form 'bí' - along with its inflected forms - are tagged as
VERB, and are usually the root of the clause/ sentence.
# sent_id = 901
# text = Bhí an lá an-te (the day was very hot)
1 Bhí bí VERB PastInd Form=Len|Mood=Ind|Tense=Past 0 root _ _
2 an an DET Art Definite=Def|Number=Sing|PronType=Art 3 det _ _
3 lá lá NOUN Noun Case=NomAcc|Definite=Def|Gender=Masc|Number=Sing 1 nsubj _ _
4 an-te te ADJ Adj Degree=Pos 1 xcomp:pred _
They both translate into English as what is recognised as a copula but from
a syntactic perspective in Irish, it's important that both are treated
differently. The copula is more of a linking element, while the substantive
verb functions as just that: a verb. Word order is also very different for
the two.
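To make the contrast concrete, the token lines of sentence 7 above can be read into named columns. This is a minimal sketch: real CoNLL-U is tab-separated, and splitting on whitespace is an assumption that only works because no field in these lines contains a space. The helper name `parse_token` is hypothetical.

```python
# The ten CoNLL-U columns, in order.
COLUMNS = ("id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc")


def parse_token(line):
    """Split one whitespace-separated token line into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


SENT_7 = """\
1 Cailín Cailín PROPN Noun Case=NomAcc|Gender=Masc|Number=Sing 0 root _ _
2 is is AUX Cop PronType=Rel|Tense=Pres|VerbForm=Cop 1 cop _ _
3 ea ea PRON Pers Number=Sing|Person=3 4 nmod _ _
4 í í PRON Pers Gender=Fem|Number=Sing|Person=3 1 nsubj _ SpaceAfter=No
5 . . PUNCT . _ 1 punct _ _
"""

tokens = [parse_token(line) for line in SENT_7.splitlines()]
by_id = {tok["id"]: tok for tok in tokens}

# The copula is a dependent of the nominal predicate, not the root itself.
cop = next(tok for tok in tokens if tok["deprel"] == "cop")
print(cop["form"], cop["upos"], "->", by_id[cop["head"]]["form"])
```

The same helper applied to sentence 901 would show Bhí as VERB with head 0, i.e. as the root.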
With respect to verbal nouns in Irish, there is *no verbal noun form of the
copula* (such as the Welsh example of "my being") so we haven't encountered
this issue of a NOUN being attached with the 'cop' label. Verbal nouns are
POS-tagged as NOUN, along with morph features VerbForm=Vnoun. They
typically function as infinitives (preceded by infinitive particle 'a') or
gerund forms in progressive aspectual phrases, preceded by 'ag'. While the
substantive verb "to be" has a verbal noun form, it is not used in a gerund
manner, however.
e.g. *ith* - eat
*a ithe* - to eat
*ag ithe* - eating
*bí* - substantive form of the verb "to be"
*a bheith* - to be
**ag bheith* - this is not a valid construction; in Irish the sentence
would be rephrased from "being" to "to be" to make it syntactically
plausible.
E.g. being happy is important to him -> it is important to him *to be happy*
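The tagging convention described above (UPOS NOUN plus the morphological feature VerbForm=Vnoun) can be tested mechanically. A minimal sketch, assuming the FEATS column uses the standard `Name=Value|Name=Value` layout with `_` for "no features"; the helper names are hypothetical.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Case=NomAcc|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def is_verbal_noun(upos, feats):
    """True for tokens tagged NOUN that also carry VerbForm=Vnoun."""
    return upos == "NOUN" and parse_feats(feats).get("VerbForm") == "Vnoun"


print(is_verbal_noun("NOUN", "VerbForm=Vnoun"))
# The ordinary noun from sentence 901 has no VerbForm feature.
print(is_verbal_noun("NOUN", "Case=NomAcc|Definite=Def|Gender=Masc|Number=Sing"))
# The copula from sentence 7 has a VerbForm, but it is neither NOUN nor Vnoun.
print(is_verbal_noun("AUX", "PronType=Rel|Tense=Pres|VerbForm=Cop"))
```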
Teresa
On Tue 9 Apr 2019 at 17:37, Francis Tyers <
notifications@github.com> wrote:
… In the Irish treebank, *is* is AUX, while *bí* is VERB. There appear to
be no examples of *bí* used in its citation form.
@tlynn747 <https://github.com/tlynn747> any thoughts?
--
*Slán agus Beannacht*
|
@dan-zeman, so would it be possible to have |
So what is the criterion to rule that a dependent of a deverbal noun is |
@jheinecke : Why don't you tag bod |
Because syntactically bod is a noun and functions as such (i.e. it can have possessives, genitives, ...) |
But if you want to attach it via a |
(BTW using the |
I agree, ... |
Agreed. Note that the AUX tag in UD does not imply that the word is a verb, only that it is used in support of a predicate and carries features associated with verbal inflection (in this case only person, if I understand correctly). |
@dan-zeman The criteria for distinguishing clausal from nominal structures are, as in many other cases, language-specific. In the case of Turkish, the main argument is that the nominalisation is grammaticalised and obligatory in some constructions (notably complement clauses). In other languages, we have a cline such as in the standard English examples:
Everyone agrees (I think) that we have a clause in 1 and a noun phrase in 6. But what about 4, where the "subject" takes the form of a possessive, while the "object" is still treated as a direct object of the verb. Bill Croft (drawing on typological studies) has proposed a criterion, which unfortunately only works in one direction. If the predicate is marked for case, then treat it as a noun (and its dependents as nmod). However, this does not tell us what to do when the predicate is NOT marked for case. A rule of thumb is to say that lexical derivations (like "destruction") should be treated as nouns, while forms that are clearly recognised as verbal (like the infinitive "destroy") should be treated as verbs. However, this still leaves a gray zone of gerunds, participles, etc. In addition, we need to consider the larger grammatical systems, like the obligatory nominalisation in complement clauses in Turkish. Exactly where Welsh verbnouns fit into this picture is difficult for me to say, but these are the general considerations, I think. |
In Welsh these kinds of subordinate clauses are always nominal (with at least the "direct object" in a genitive construction). There is rarely a conjugated verb form; simultaneity, anteriority or posteriority with respect to the main clause is expressed by TAM markers. Closer to 1 would be (possibly not totally grammatical)
Closer to 6 is the (more natural)
In any case, the nominal structure of verbnouns is the main feature of Welsh. But in the case of the verbnoun as copula, the tag |
@jnivre +1 and great examples. @jheinecke I agree, for now this seems like the best option. |
Sorry for commenting only now. Wouldn't it be possible to introduce a double POS (where order matters), in the sense of an "external" vs. "internal" part of speech? For example, the case you mention of destroying: we might assign it the double POS NOUN/VERB, meaning that:
So your sentence no. 4 could be analyzed as:
Don't we have a "bidirectional" behaviour that can be expressed like this in these cases? |
Agreed that some words sometimes behave like one category in relation to their parents and like another category in relation to their children, but no, it is not possible to express this within the UD framework. Here it is always assumed that each word in context belongs to exactly one of the 17 coarse-grained categories. If there is a secondary candidate category, it can be sometimes expressed using the features (like the |
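The constraint described here, one coarse-grained category per word, can be stated as a membership test against the 17 universal POS tags. A minimal sketch; the helper name is hypothetical.

```python
# The 17 universal part-of-speech tags of UD.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def is_valid_upos(tag):
    """A UPOS value must be exactly one of the 17 tags; combinations are not allowed."""
    return tag in UPOS_TAGS


print(len(UPOS_TAGS))
print(is_valid_upos("NOUN"))
print(is_valid_upos("NOUN/VERB"))  # a double POS is not a valid UPOS value
```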
Does the framework consequently also not allow e.g. a NOUN token with the Unfortunately, this brings forth lots of very weird analyses, like case-governing prepositions attached as marks to participial forms in an attributive function... Sorry if I insist, but is this really something that cannot be introduced in UD, a future milestone? It seems to be a very debated topic indeed. |
This remains somewhat controversial. Since UD makes such a strict distinction between annotation of clauses vs. nominals, I believe that if you want verbal relations, you should tag the word On the other hand, the current guidelines are less exact about this, leaving some room for diverging interpretations (which is unfortunate). |
The inflection can change the distribution of a lexeme towards its governor. This has already been described by many authors (rank theory of Jespersen 1924, transfer of Tesnière 1959). This is exactly what happens in these examples. In 1, the transfer is done by the complementizer that, in 2 by the infinitive inflexion and in 3 and 4 by the -ing inflexion. In 6, it is a derivation and the lexeme must be classified as a NOUN. While in 2, 3 and 4, the lexeme is still a VERB but the inflected form is transferred and behaves as a noun towards its governor. (This includes the determiner, which is likely to be the surface-syntactic governor of the noun, as it is argued by many linguists, at least since Hudson 1984.) |
Yes, verbnouns behave like nouns, inflected forms behave like verbs. Some dictionaries (like the Geiriadur Prifysgol Cymru, http://www.geiriadur.ac.uk/, the lexical authority for Welsh) list verbs under the first person singular, present/future tense, instead of the verbnoun. But I would not go so far as to say verbnouns are a derivation (as destruction is with respect to destroy). "her house"
"to see her"
but "I (will) see her"
If we change the UPOS of verbnouns to |
An inflection morpheme can change the realization of the arguments of the lexeme it combines with. For instance, a passive morpheme will change an initial object into a subject. I think I would have a similar treatment here and consider that the verbnoun form changes the initial object into an |
I have once again run into the problematic annotation of verbal nouns, this time more with regard to the dependency relation of the verbal item, and would like to bring forth the following example from the Latin corpora:
tradentis is the genitive singular (m/f/n, all three are formally identical) of the present participle of trado 'to betray', so something like "betrayer/betraying". The fact that it still works as a verb is given by the presence of the core argument me 'me', the first person singular pronoun in the accusative case. If it were completely nominalized, I think it would rather look like mei tradentis, with a possessive determiner meus 'my/mine' concording in case and number. So, given
Indeed, in Latin the attributive construction with the participle would require concordance with the noun, here:
So manus tradens me and manus tradentis me are really structurally different. I was considering the analysis as the ellipsis of a nominal, namely (as a side note, notice that this nominal is re-integrated in many translations in modern European languages) something like homo 'man', or also a determiner:
And everything is perfectly fine here. Under this light, then, I might conceive to use the orphan relation:
It is to be noted that such constructions with a participle in a markedly nominal environment and taking adnominal modifiers seem to appear quite late in the literature, as far as I can tell, so perhaps we cannot rule out that we are in the presence of a reanalysis, and then the relabeling of such forms as Finally, just a comment about a similar case: I find similar problems with some usages of tradens manum suam mihi tetendit as 'the betrayer stretched his/her/its hand to me', then, if we were to keep the verbal form, this isn't really a "subject that is itself a clause". To summarize, these are the issues/proposals/questions I have come up with:
To me, it seems that the relation roster of UD really needs some readjustments to treat verbal nouns in all their occurrences. What are your thoughts about it? |
I think it sounds like a nominalized participle, which acts like a noun meaning "betrayer", is in the genitive (since it's possessed), and takes an accusative |
manus tradentis me is equivalent to manus eius qui me tradit, and therefore the participle stands for a relative clause + its antecedent: if there were no accusative, the participle could easily be analyzed as a substantivized participle (therefore nmod). In this case, the presence of the dependent accusative reveals the ambivalent nature of the participle in all its strength (it is both adjective/substantivized adjective and verb). As a consequence, unless one introduces an elliptical node, it is clear that no solution would do justice to it, in one way or another (the current UD annotation system is binary in this respect: you can choose either a noun label or a verb one). This phenomenon is not rare in Latin and Ancient Greek. All in all, I would vote for acl. |
@gcelano, I would like to ask you, since you are clearly in a better position than me in that regard, if you have noticed different trends in the treatment of such verbal nouns in similar constructions in Classical Latin with respect to later, Patristic and Medieval Latin. This might help in assessing the best annotation strategy. I wonder if such difficulties simply come from trying to analyse two different varieties based only on the earlier one, when in fact new phenomena might have arisen. For example, in the corpora I have found a sentence by Palladio (XVI c.) who clearly uses currens as a modern Italian would, i.e. as a full-fledged noun, and as maybe an Ancient Roman would never have. By the way: maybe the use of |
What was originally a participle can at a certain point become an adjective or a noun: in this case, treat it as an adjective or a noun, respectively (so the morphological annotation would not be participle anymore). Of course, there may be borderline cases: in that case, choose an interpretation and, ceteris paribus, always stick to it. |
If you tag tradentis as a If you tag tradentis as a If you interpret the example as an ellipsis (manus [hominis] tradentis me), then you should use the default UD approach to ellipsis, which is promotion of one of the orphaned dependents. Here, hominis has only one dependent, tradentis, which will be promoted, i.e., it will be attached to manus as Out of these three options, I would probably lean towards the first or the second, but I'm not strictly against the third one either. Which one should be used is the question of the broader situation in Latin (as well as the diachronic consideration you mentioned), and I'm afraid I cannot advise on that. |
Thanks, Dan. This looks like an excellent survey of the options. I agree that objects of adjectives may be borderline acceptable, especially for participles in languages where these are generally tagged ADJ, but I think a VERB+obj or NOUN+nmod analysis would be preferable if possible. |
It’s always a matter of lemmatization. If you lemmatize tradentis under trado (VERB), the only option in UD is acl. At least in the IT-TB, apart from some exceptions, participles are lemmatized under the corresponding verb, following a quite strict (inflectional) morphological criterion.
In this specific case, I do not think that there are enough good reasons in favor of not lemmatizing tradentis under trado, also considering that, as correctly pointed out, here the participle behaves ‘verbally’, as demonstrated by the direct object in the accusative case (me).
I understand that it is not a perfect solution (but, anyway, a good one), but it is the best (and clearest) we have so far in UD.
Best,
Marco
|
In Welsh, the base forms of verbs are verbnouns, which function as nouns. There are no infinitives. Whereas languages with infinitives (like English or German) mark direct objects in the same way for finite and non-finite verb forms, Welsh verbnouns do not, cf. German
and Welsh
If the auxiliary bod (be) is used in its verbnoun form, we get a noun under a cop-deprel:
bod yn goch "being red" (yn is a predication particle, needed if an adjective or noun is the predicate)
As the issue 3, this is rejected by the validation script, and indeed, it is odd to have a noun as a copula. But how could it be annotated in a better way? I could not find similar cases in the Irish treebank (Irish has verbnouns, as Welsh does). Breton is typologically different in having real infinitives.