Auxiliaries in Japanese and Chinese #986

Open
rafael75012 opened this issue Nov 2, 2023 · 7 comments
Labels
Chinese, dependencies, Japanese, UPOS (Universal part-of-speech tags: definitions and examples)
Milestone

Comments

@rafael75012

rafael75012 commented Nov 2, 2023

The definition of AUX in UD is "An auxiliary is a function word that accompanies the lexical verb of a verb phrase and expresses grammatical distinctions not carried by the lexical verb, such as person, number, tense, mood, aspect, voice or evidentiality."

However, in the Japanese and Chinese guidelines, suffixes expressing TAM are tagged as AUX (implying that they occupy nodes independent of the main verb).

Examples include the past suffix -た ta, or the negative suffix ない nai in Japanese, and the perfective 了 le, durative 着 zhe, experiential 过 guo in Chinese.

Is this normal? Or is it an instance where a category from Indo-European languages (here "auxiliary") is 'forcefully' applied to non-Indo-European languages? (And in that case, wouldn't it have been possible to simply not segment these suffixes as independent nodes, but to treat them as belonging to the main verb stem instead?)
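For concreteness, the segmentation in question can be sketched in CoNLL-U. The fragment below is hand-written for illustration only: the sentence, lemmas, and helper name `parse_tokens` are assumptions, and nothing is copied from an actual treebank. The small Python helper simply lists which tokens carry the AUX tag.

```python
# Hand-written CoNLL-U fragment for 食べた "ate" (illustrative, not treebank data).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
CONLLU = "\n".join([
    "# text = 食べた",
    "1\t食べ\t食べる\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "2\tた\tた\tAUX\t_\t_\t1\taux\t_\t_",
])

FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

def parse_tokens(conllu):
    """Return one dict per token line of a single CoNLL-U sentence."""
    tokens = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        tokens.append(dict(zip(FIELDS, line.split("\t"))))
    return tokens

tokens = parse_tokens(CONLLU)
# Each AUX node with the ID of its head and its dependency relation.
aux_nodes = [(t["form"], t["head"], t["deprel"]) for t in tokens if t["upos"] == "AUX"]
print(aux_nodes)
```

Under this segmentation the past marker is its own node attached to the verb, which is exactly what the question is about.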

@mehmetoguzderin
Member

Word segmentation and its definition have been recurrent topics in UD. There are UD Workshop papers that specifically mention this for language pairs like Korean and Japanese. I believe it's best to allow language-specific guidelines to coexist, especially if treebanks compatible with each other's guidelines can identify as such, where downstream tasks can specify chains across languages as appropriate sources for their studies.

TLDR: I think specific guidelines are justifiable and don't superimpose or force Indo-European categories. Labeling something as "normal" seems a bit strong, especially considering the nuanced situation for both linguistic and computational analysis of these languages.

Lack of a "Normal" Across and Within Languages: It's vital to note that what's considered "normal" in language categories can shift significantly even within the same language or its family. Hence, flexibility is crucial to accommodate these standpoints in a broader, universal framework. Oftentimes, simplifying generalizations criticizing differentiating factors has very little substance, and I don't think it will ever benefit UD to favor one over another on more contentious topics.

Morphosyntactic Boundary Flexibility: Tagging TAM (Tense-Aspect-Mood) suffixes as AUX in languages like Japanese and Chinese showcases the flexibility UD offers in denoting morphosyntactic boundaries. This adaptability is pivotal for generating syntactic trees that are more universally comparable. This comparability makes even straightforward parameters like head directionality highly effective for inference and reasoning. Languages with postpositional inflections or derivations sometimes blur the boundaries between words and affixes, unlike languages with defined boundaries. Addressing these suffixes as standalone nodes in syntax trees offers a clearer and more consistent grammatical representation across varying language typologies.

Convenience: Finer-grained splitting reduces the need to switch between many frameworks when addressing primitive "subword" components (see projects matching the RegEx Uni.*). This improves comparability and makes fine-grained structures easier to access across languages.

Multiword Tokens: Some treebanks within these language families may have distinct segmentation, particularly when aligning with boundaries commonly employed in K-12 education, where the splits exemplified in OP can break. Utilizing multiword tokens can be a strategy to represent these coarser units.
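CoNLL-U expresses such coarser units with a range line (for example `1-2`) that carries the surface form, followed by the syntactic words it spans. A minimal sketch, assuming a hypothetical treebank that treats 食べた as one surface token; the fragment and the helper name `group_multiword_tokens` are illustrative, not taken from real data:

```python
# Hypothetical CoNLL-U fragment: the coarse unit 食べた is a multiword token
# (range line "1-2") spanning two syntactic words. Illustrative only.
LINES = [
    "1-2\t食べた\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\t食べ\t食べる\tVERB\t_\t_\t0\troot\t_\t_",
    "2\tた\tた\tAUX\t_\t_\t1\taux\t_\t_",
]

def group_multiword_tokens(lines):
    """Map each multiword token's surface form to the forms of the words it spans."""
    words = {}   # word ID -> form
    ranges = []  # (surface form, first ID, last ID)
    for line in lines:
        cols = line.split("\t")
        if "-" in cols[0]:
            start, end = (int(x) for x in cols[0].split("-"))
            ranges.append((cols[1], start, end))
        else:
            words[int(cols[0])] = cols[1]
    return {surface: [words[i] for i in range(start, end + 1)]
            for surface, start, end in ranges}

print(group_multiword_tokens(LINES))
```

The syntactic analysis stays the same as in the split version; only the surface tokenization is coarser.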

In summary, I feel those guidelines' classification of TAM suffixes in Japanese and Chinese as AUX does not "force" Indo-European grammar categories. Instead, it thoughtfully adjusts to the linguistic characteristics of these languages, reflecting UD's capability to embrace varied linguistic forms while ensuring computational simplicity.

Note: I'm not an authoritative member, and these views are only my personal opinions. Although I've previously contributed to the UD Workshop introducing a treebank, that data in itself is under significant revision on my end, with current data being only placeholders that are also different from the paper a bit.

@rafael75012
Author

rafael75012 commented Nov 3, 2023

@mehmetoguzderin Why wouldn't these TAM markers simply be analyzed as how they are commonly considered in Japanese and Chinese grammars, that is, suffixes, and accordingly be segmented as such?

To take two examples:

@mehmetoguzderin
Member

@rafael75012 The analogy drawn is a bit superficial. I want to highlight a phrase I wrote above: "... simplifying generalizations criticizing differentiating factors..."

Some treatises or learner's materials can frame these grammatical units as "suffixes" for convenience. In substance, it is essential to remember that these are, first of all, "adpositional" or "affixing" units. Why they come to be suffixes has little to do with direct comparison to French word segmentation in question: French is head-initial, and Japanese is head-final. What you find after the free word in Japanese you would, by grammatical correspondence, look for before the word in French. To take an example from case markers, consider simple possessives: "parent"の"child" versus "child" de "parent". The surface-level similarity in the position of these conjugative elements hides fundamental differences, especially considering the historical development of the individual languages.

Are these commonly "just" suffixes?

  • For Japanese studies done in English, they go by different names: particles (usually more strictly for nominal particles), bound auxiliary, dependent word, auxiliary verb, etc.
  • For reading Japanese grammar in the Japanese language, verbal markers and nominal markers have the same classification as adjectives (notice 詞): 助動詞 "jodoushi" for verbal markers, 格助詞 "kakujoshi" for nominal markers and 形容詞 "keiyoushi" for adjectives.

Note: I'm not familiar with grammar approaches to Chinese; I only can speak colloquially. This language pair is a bit difficult to illustrate as their governing grammar has significant differences, starting from word order.

@rafael75012
Author

rafael75012 commented Nov 3, 2023

@mehmetoguzderin

"Why they come to be suffixes has little to do with direct comparison to French word segmentation in question"

It was not about comparing French and Japanese/Chinese, but about comparing the UD annotation of French and Japanese/Chinese.

"Why they come to be suffixes has little to do with direct comparison to French word segmentation in question: French is head-initial, and Japanese is head-final."

The French compound past expressing TAM is in part rendered by an auxiliary verb placed in front of the main verb. But it is an auxiliary verb, not a prefix.

"For Japanese studies done in English, they go by different names: particles (usually more strictly for nominal particles), bound auxiliary, dependent word, auxiliary verb, etc."

Thank you for this information.
Nonetheless, I don't think it proves that these conjugation endings are not suffixes.

"For reading Japanese grammar in the Japanese language, verbal markers and nominal markers have the same classification as adjectives (notice 詞): 助動詞 "jodoushi" for verbal markers, 格助詞 "kakujoshi" for nominal markers and 形容詞 "keiyoushi" for adjectives."

詞 means "word", but also "part of speech", and seems to be a character used in a wide variety of metalinguistic terms, so its meaning is very loose. https://en.wikipedia.org/wiki/Japanese_particles

@dan-zeman added the labels UPOS (Universal part-of-speech tags: definitions and examples), dependencies, Japanese, Chinese on Nov 3, 2023
@dan-zeman added this to the v2.14 milestone on Nov 3, 2023
@amir-zeldes
Contributor

@rafael75012 I think this would be up to the UD Japanese and Chinese maintainers, but there are significant differences between French tense inflection and た and 了. For starters in terms of word order, it would be unusual for a morphological marker to be external to semantic operations like negation, but in Japanese negation appears between the verb and the past tense marker. In some analyses, we might say that negated past is a marker on the negator item, and doesn't even belong to the verb at all. It's also conspicuous that the same past tense marker appears to accompany Japanese predicative adjectives, which are verb-like in some ways, but are still a distinct part of speech. Making た into an AUX helps explain this duality and says 'there is really just this one auxiliary', instead of two identical parallel paradigms. In any case, these are just some of many arguments why た is not a morphological suffix of the verb itself (similar questions apply to use of honorifics and benefactive verbs, which carry this tense marker, rather than the lexical verb).
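The two distributional points about た can be made concrete with a small check. The segmentations below are hand-written assumptions for illustration (食べなかった "did not eat", 高かった "was expensive"), not treebank output, and the helper names are hypothetical:

```python
# Hand-segmented examples (illustrative assumptions, not treebank data).
# Each item is (form, UPOS) in surface order; the first word is the lexical predicate.
EXAMPLES = {
    "食べなかった": [("食べ", "VERB"), ("なかっ", "AUX"), ("た", "AUX")],  # "did not eat"
    "高かった": [("高かっ", "ADJ"), ("た", "AUX")],  # "was expensive"
}

def hosts_of(marker, examples):
    """Collect the UPOS of the lexical predicate in every example containing `marker`."""
    hosts = set()
    for words in examples.values():
        lexical_upos = words[0][1]
        if any(form == marker for form, _ in words[1:]):
            hosts.add(lexical_upos)
    return hosts

def intervening(marker, words):
    """Forms standing between the lexical predicate and the first `marker`."""
    forms = [form for form, _ in words]
    return forms[1:forms.index(marker)]

# The same marker follows both verbal and adjectival predicates.
print(hosts_of("た", EXAMPLES))
# The negator stands between the verb and the past marker.
print(intervening("た", EXAMPLES["食べなかった"]))
```

With た as a single AUX item, one entry covers both host categories, and the negator can sit between it and the verb without any special treatment.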

For Chinese 了 notice that it can follow the object in many Chinese compound verbs and even productive V-N pairs. All sorts of material can appear before 了. With Chinese rarely showing something like articles and having no productive morphological reduction for VERB+obj, it's maybe possible to argue this is some kind of incorporation, and 了 is an affix like you are suggesting, which combines with the whole V+N 'lexeme', but AFAIK this analysis is not popular for Chinese, and it seems much more straightforward to say that 了 can appear after objects simply because it is a separate 'word'. This also helps the analysis in clauses with two 了, which again would be unusual for inflection - what is being inflected by the second 了 in 你 睡 了 一 天 了 "you've been sleeping the whole day"? It's easier to think of it as a separate word meaning something like "already", but so grammaticalized that it just means 'aspect' (and in fact it's easy to combine it with a true adverb meaning 'already').
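As a rough sketch of the double-了 point, the snippet below takes the tokens exactly as segmented in the example above and lists what stands between the verb and each 了. Treating 睡 as the verb follows the gloss; the variable names are illustrative.

```python
# Tokens of the example clause, as segmented in the comment above.
tokens = ["你", "睡", "了", "一", "天", "了"]

verb_index = tokens.index("睡")  # 睡 "sleep" is taken to be the verb, per the gloss
le_positions = [i for i, tok in enumerate(tokens) if tok == "了"]

for i in le_positions:
    # Everything after the verb and before this occurrence of 了.
    between = tokens[verb_index + 1:i]
    print(f"了 at index {i}: material between verb and 了 = {between}")
```

The first 了 is adjacent to the verb, while the second is separated from it by the first 了 and the duration phrase, which is hard to reconcile with a purely inflectional reading.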

None of these would work for French -ais: it cannot be disconnected from its verb, not by a negator and not by an object noun. Even though historically some French synthetic tense markers originate from periphrastic constructions, like j'aimerai from the infinitive plus an auxiliary, synchronically there is no way to separate them, so it seems reasonable to treat them as single inflected word forms at this point. But as @mehmetoguzderin pointed out above, this is not uncontroversial for example in UD Japanese, and ultimately the linguistic tradition of each language has some influence on how borderline cases are analyzed.

@rafael75012
Author

rafael75012 commented Nov 4, 2023

@amir-zeldes On the one hand, I find the arguments in your answer very convincing for analyzing these markers as AUX. On the other hand, I still have the feeling that they are analyzed as AUX under the influence of the existence of this AUX tag in UD. This impression comes from the fact that Japanese is generally seen as an agglutinative language, that is, a language whose words are made of many (clearly delimited) suffixes (including these tense markers), and that in Chinese linguistics, 了 and other TAM markers are considered suffixes.

@rafael75012
Author

rafael75012 commented Nov 6, 2023

The definition of AUX in the Japanese guideline states: "The AUX tag is used for words tagged by auxiliary_verb / 助動詞 in UniDic."
https://universaldependencies.org/ja/pos/AUX_.html
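Read literally, that sentence describes a tag-conversion rule. A minimal sketch of it follows; only the 助動詞 / auxiliary_verb rule comes from the quoted guideline, while the function name, the hyphen-separated tag format, and the `None` fallback are assumptions made for illustration.

```python
def upos_from_unidic(unidic_pos):
    """Return "AUX" for a UniDic auxiliary_verb / 助動詞 tag, else None.

    Only the AUX rule is taken from the quoted guideline; every other tag is
    left undecided here (None) rather than guessed.
    """
    # Assumption: sub-levels of the UniDic tag are hyphen-separated, e.g. "動詞-一般".
    top_level = unidic_pos.split("-")[0]
    if top_level in ("助動詞", "auxiliary_verb"):
        return "AUX"
    return None

print(upos_from_unidic("助動詞"))     # AUX
print(upos_from_unidic("動詞-一般"))  # None
```

Under such a rule, the UD tag follows mechanically from the UniDic category, so whether た counts as AUX is decided by the Japanese lexicon's own classification.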

Maybe these conjugation endings (たい, た, etc.) are considered 助動詞 jodōshi (lit. "auxiliary verbs") in Japanese linguistics.
https://ja.wikipedia.org/wiki/%E5%8A%A9%E5%8B%95%E8%A9%9E_(%E5%9B%BD%E6%96%87%E6%B3%95)

This same metalinguistic term of 助動詞 seems to be used to describe auxiliary verbs in English (can, will, etc.). It is the first thing that pops up when we search it on Google:
https://www.rarejob.com/englishlab/column/20171215/

In my electronic Japanese dictionary, one meaning of 助動詞 is "inflecting dependent word (in Japanese), bound auxiliary", the other is "auxiliary verb (in language other than Japanese)".

I know that in Chinese linguistics, the metalinguistic term 助词 zhùcí is used to refer to French auxiliary verbs (words that are placed in front of the main verb: il a mangé) and aspect suffixes (perfective 了 le, durative 着 zhe, experiential 过 guo).

For instance, the famous Chinese dictionary 现代汉语八百词 defines these aspect markers as 动态助词 dòngtài zhùcí (translated into English as 'aspect particle').

@dan-zeman modified the milestones: v2.14, v2.15 on May 15, 2024

4 participants