A fundamental flaw in length_limit=1 #699
Comments
Per issue #42, the phrase "as is" should be treated as a noun. That is, any quoted phrase might actually be a noun, and the internal grammatical structure of the phrase should be ignored: the quotation marks form a wall, preventing links from crossing over them.
From issue #42:
And you agreed:
In option 2, we use both possibilities (2 alternatives): "as is" as an UNKNOWN-WORD, and also like now - separating the quotes (but not deleting them). In any case when reading it loudly, isn't there an "an" even though it is quoted?
Why only as a noun? I proposed UNKNOWN-WORD because it can be e.g. a verb (or another POS).
Sorry, yes, either might work. I'd have to think of examples where the quoted text would be generic unknown word instead of just nouns. But yes; sorry for confusion. All is well :-)
BTW, it is possible to overcome the difficulty that I pointed out, by making a more complex check when applying length_limit:
However, this will add some overhead (but maybe not much if a skip table is prepared in advance).
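To make the skip-table idea concrete, here is a minimal sketch in Python. It is only an illustrative guess, not the check proposed above: the names `SKIPPABLE`, `build_skip_table` and `effective_distance` are hypothetical, and it assumes that the only slots to skip are standalone quotation marks. The table is a prefix count of skippable slots, so the adjusted distance between two words costs one subtraction per check.

```python
from itertools import accumulate

# Assumption: only standalone quotation-mark slots are skipped.
SKIPPABLE = {'"'}

def build_skip_table(slots):
    """table[k] = number of skippable slots among slots[0:k]."""
    return [0] + list(accumulate(1 if s in SKIPPABLE else 0 for s in slots))

def effective_distance(table, lw, rw):
    """Slot distance from lw to rw, not counting skippable slots strictly between them."""
    return (rw - lw) - (table[rw] - table[lw + 1])

slots = ["an", '"', "as", "is", '"', "clause"]
table = build_skip_table(slots)
# The raw distance from "an" (slot 0) to "as" (slot 2) is 2; the adjusted one is 1.
print(effective_distance(table, 0, 2))
```

The table is built once per sentence, so the per-link overhead would be a couple of array lookups; whether that is acceptable in the real parser is exactly the open question raised above.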
I tried to add `ID*` to the length_limit of 1 (after, of course, allowing this usage of ID). I did this in order to check whether it can speed up the parsing (by maybe saving connector comparisons, because the length_limit is anyway checked first).
To my surprise, the following sentences then didn't get parsed:
The problem has to do with alternatives.
In the case of "As yet", here are the 2D-array slots:
and since the distance between `as` and `yet` is 2, the idiom link cannot apply.
In the case of the other 2 sentences, the `'s` is getting separated, which creates the same problem.
In addition to this problem, there is also another, as demonstrated in this sentence (not from the LG corpus; constructed for this post):
He explained why this is an "as is" clause.
Here `"` creates a distance of 2 for the PHc link (defined with length_limit=1).
(BTW, apparently "as is" is not defined as an adjective in the dict, so this sentence is unparsable even without the quotation marks.)
However, for Russian `LL*` links there is no such problem. Maybe also not for `YS` and `YP`, unless we would like to support something like `"ABC"'s`.
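The quotation-mark case can be reproduced with a small Python sketch. It is a toy model, not the library's code: it assumes that each token, including a standalone quotation mark, occupies one slot in the word array, that link length is the difference between slot indices, and that the PHc link connects `an` to the word that follows it. `LENGTH_LIMIT` and `slot_distance` are hypothetical names.

```python
LENGTH_LIMIT = 1  # stands in for length_limit=1 on the PHc connector

def slot_distance(slots, left, right):
    """Distance between the first occurrences of two tokens, counted in slots."""
    return slots.index(right) - slots.index(left)

# The same phrase without and with the quotation marks as separate slots.
plain = ["an", "as", "is", "clause"]
quoted = ["an", '"', "as", "is", '"', "clause"]

for slots in (plain, quoted):
    d = slot_distance(slots, "an", "as")
    print(slots, "distance:", d, "link allowed:", d <= LENGTH_LIMIT)
```

Under these assumptions the opening quotation mark alone pushes the distance from 1 to 2, which is why a strict limit of 1 rejects the link.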