English mischievous nominals involving names and numbers #1040
Comments
I have to make up my mind about the concrete questions you ask but a side note: The table from your paper mentions |
To me, Mirror Lake looks very similar to standard nominal compounds in English (mirror case, mirror hall?) and the only difference seems to be that it does not use a determiner. Lake Michigan feels different. I suspect that you only use this order if the second word cannot function as a common noun (or common anything) in English, right? Michigan is just a meaningless label from the English perspective, although I think that it actually means "big lake" in one of the Algonquian languages. In this light, I would say it is closer to President Obama, which also is not a compound (unlike e.g. U.S. president).

I am not super excited about looking at determiner licensing exactly for the reasons you cite. You would need a determiner with the non-name compound a/the mirror lake but you don't need it with Mirror Lake. On the other hand, I would not completely exclude it if we do not have any better clues.

As a side remark, in Czech Lake Michigan would be jezero Michigan, where jezero "lake" is not part of the name, and jezero should be the head because it would inflect for the case required by the surrounding context (nominative as subject, accusative as object etc.) while Michigan would stay in nominative no matter what. This would be different from prezident Obama where both words would inflect. I'm not claiming that it should affect the English solution in any way. Perhaps only if there were no good criteria in English and a bunch of other languages had criteria like this, someone might say that we decide it in English the same way for the sake of parallelism. But it would be the last thing I would consider. |
As a side note to this side remark, an even more common translation into Czech (according to several corpora) is Michiganské jezero. In this case, jezero is part of the name (named entity) because Michiganské is morphologically and syntactically an adjective. Both words inflect (e.g. genitive Michiganského jezera) and jezero is the head. As a side comment regarding the side note:-), it is difficult to define which words are part of the name, even within a single language where capitalization usually helps. One could think that Michiganské as an adjective needs a governing noun, thus jezero must be part of the name as well. However, in náměstí Míru (square of Peace), náměstí is not considered part of the name and thus it is not capitalized, even though it is also the head (and Míru is a genitive noun). However, the subway station of the same name has both words capitalized: stanice metra Náměstí Míru, according to the official prescriptive grammar. Most Czech speakers never learn all the capitalization rules correctly.:-) |
Looking at this list as a sample, that appears to be the main usage (the 2nd part is inherently proper), but "Lake Pleasant" and "Lake Red Rock" do occur. Of the 5 Great Lakes, "Lake Superior" has a transparent second part (the others are derived from native words—Erie, Huron, Michigan, Ontario). In terms of place names mentioning geographical features, "Mount" is also an initial word, and "Mount Pleasant" is a popular example where the 2nd part is transparent. |
Channeling @amir-zeldes (who is on vacation), the Lake X and Mount X patterns are presumably relics of older word orders in English or borrowings from French that are not productive beyond names. |
Let me see if I can articulate some principles based on the sentiments above. A straw man proposal to define
A descriptor phrase denotes a main category to which a referent belongs. The descriptor is a modifier—typically of a name or number, often forming a larger proper name with its head. The head is essential to denoting the correct referent, whereas the descriptor is omissible (at least with strong context to narrow down a set of possible referents). This distinguishes it from nominal compounds, where the main category noun is the head. The descriptor phrase is not a full noun phrase: it does not begin with a preposition or determiner (or possessive taking the place of a determiner). Given a nominal-nominal combination X Y, the following tests can be applied:
Combinations not eliminated by the above constraints are good candidates for
A consequence of the above guidelines is that each of the dependency relations at issue should have a fairly rigid directionality, resulting in the word order generalization:
(* This seems appropriate as these pertain to English constructions where word order is paramount. |
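One constraint in the straw man is stated explicitly enough to sketch in code: the descriptor phrase is not a full noun phrase, so it does not begin with a preposition, a determiner, or a possessive standing in for a determiner. The Python below is only an illustration of that single check. The function name, the token format, and the hand-supplied UPOS tags and possessive flags are assumptions made for this sketch; they do not come from any UD tool, and the other tests in the proposal are not modeled.

```python
# Hypothetical helper for illustration only; not part of any UD tool.
# Each token is (form, upos, is_possessive), with tags supplied by hand.

# UPOS tags that mark the start of a full noun phrase or prepositional phrase.
FULL_PHRASE_STARTERS = {"ADP", "DET"}


def could_be_descriptor(tokens):
    """Return True if the candidate phrase passes the 'not a full noun
    phrase' constraint: its first token is not a preposition, not a
    determiner, and not a possessive taking the place of a determiner."""
    if not tokens:
        return False
    _form, upos, is_possessive = tokens[0]
    return upos not in FULL_PHRASE_STARTERS and not is_possessive


# "President" in "President Obama": a bare noun, so it stays a candidate.
president = [("President", "NOUN", False)]

# "my brother" in "my brother Sam": begins with a possessive, so it is
# excluded (the thread treats this case as a full nominal instead).
my_brother = [("my", "PRON", True), ("brother", "NOUN", False)]

# "the president": begins with a determiner, so it is excluded.
the_president = [("the", "DET", False), ("president", "NOUN", False)]

results = {
    "President": could_be_descriptor(president),
    "my brother": could_be_descriptor(my_brother),
    "the president": could_be_descriptor(the_president),
}
```

Passing this check does not by itself make a phrase a descriptor; it only keeps the phrase in the candidate pool that the remaining tests would narrow down.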
Same in Portuguese, and in many other languages, I guess. I suppose we can’t even talk about rules in general, only about rules for specific editors or conventions for specific contexts. So it may be hard to qualify a given usage as correct or incorrect. |
English has a tangled mess of minor patterns for constructing proper names. Having revised the flat guidelines to clarify the prototypical cases of headless vs. internally structured proper names, it is worth returning to "mischievous" cases that @amir-zeldes and I explored in this paper. Relatedly, @dan-zeman wrote a paper exploring how dates might be treated across several languages. For this thread, let's focus on constructions lacking evidence from agreement. These constructions have been discussed in disparate threads, e.g. #455 and #654. I would like to see if looking at the range of constructions can lead us to some general principles for determining headedness and choosing a deprel.
This table from the paper offers a summary:
Also: dates written like February 23, February 23rd, February the 23rd
Some starting points:
appos requires two full nominals, which would seemingly exclude most of these constructions (except my brother Sam)
compound [...], the first part is plural if the second part is coordinated. This seems to be a distinct type of nominal modification construction, which we call nmod:desc ("descriptor").
[...] compound clearly works there.
Concrete questions I have been stuck on:
[...] flat, and if so, what about other noun+number names?