A fundamental flaw in length_limit=1 #699

Open
ampli opened this issue Mar 13, 2018 · 5 comments

@ampli
Member

ampli commented Mar 13, 2018

I tried to add ID* to the connectors with a length_limit of 1 (after, of course, allowing this usage of ID).
I did this in order to check whether it can speed up parsing (perhaps by saving connector comparisons, since the length_limit is checked first anyway).

To my surprise, the following sentences then failed to parse:

As yet, no one has thought of a solution.
What in God's name happened?
What in Lord's name is going on here?

The problem has to do with alternatives.
In the case of "As yet", here are the 2D-array slots:

    0           1      2      3     ...
    LEFT-WALL   As            yet
                as
                A.u    s.u

and since the distance between as and yet is 2, the idiom link cannot apply.
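
The arithmetic behind this failure can be sketched in a few lines of Python. This is a toy model only: the `LENGTH_LIMIT` table, the `alternatives` layout, and the helper names are assumptions made for illustration and do not correspond to link-grammar's internal data structures.

```python
# Toy model only: names and layout are illustrative assumptions,
# not link-grammar internals.
LENGTH_LIMIT = {"ID": 1}  # hypothetical table: connector name -> max link length

# Alternatives for the first input word, each a list of tokens.
# The split "A.u s.u" needs two slots, the other alternatives need one.
alternatives = {
    "As": ["As"],
    "as": ["as"],
    "A.u s.u": ["A.u", "s.u"],
}

def slot_count(alts):
    """Number of slots reserved for a word: the longest alternative wins."""
    return max(len(tokens) for tokens in alts.values())

def link_allowed(connector, left_slot, right_slot):
    """True if the slot distance is within the connector's length limit."""
    limit = LENGTH_LIMIT.get(connector)
    return limit is None or (right_slot - left_slot) <= limit

first_slot = 1                                     # slot 0 is LEFT-WALL
yet_slot = first_slot + slot_count(alternatives)   # "yet" is pushed to slot 3
as_yet_distance = yet_slot - first_slot            # distance between "as" and "yet"
idiom_ok = link_allowed("ID", first_slot, yet_slot)
```

In this model the extra slot reserved for `s.u` is what pushes "yet" away from "as", so a literal slot-distance check rejects the idiom link even though the two words are adjacent in the input.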

In the case of the other 2 sentences, the 's is getting separated, which creates the same problem.

In addition to this problem, there is another one, as demonstrated by this sentence (not from the LG corpus; constructed for this post):
He explained why this is an "as is" clause.
Here the " creates a distance of 2 for the PHc link (defined with length_limit=1).
(BTW, apparently "as is" is not defined as an adjective in the dict, so this sentence is unparsable even without the quotation marks.)

However, for Russian LL* links there is no such problem.
Maybe also not for YS and YP, unless we would like to support something like "ABC"'s

@linas
Member

linas commented Mar 13, 2018

Per issue #42, the phrase "as is" should be treated as a noun. That is, any quoted phrase might actually be a noun, and the internal grammatical structure of the phrase should be ignored. That is, the quotation marks form a wall, preventing links from crossing over them.

@ampli
Member Author

ampli commented Mar 13, 2018

From issue #42:
I said:

Tokenize it as now (separating the quotes) in case the word is used in a grammatical context, and add UNKNOWN-WORD alternative for it (including the quotes).
(I'm for (2), because I think that (1) disregards possible info in "word" that may still be interesting.)

And you agreed:

Option 2.

In option 2, we use both possibilities (2 alternatives): "as is" as an UNKNOWN-WORD, and also like now - separating the quotes (but not deleting them).
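
As a rough illustration of the two alternatives in option 2, here is a minimal Python sketch. The function name `quoted_alternatives` and the `(token, tag)` representation are hypothetical, chosen only for this example; the real tokenizer is not structured this way.

```python
def quoted_alternatives(text):
    """Return two tokenizations of a double-quoted span such as '"as is"'.

    Alternative 1: the whole span, quotes included, as one UNKNOWN-WORD.
    Alternative 2: the quotes separated (not deleted) from the inner words.
    The (token, tag) tuples are an assumption made for this sketch.
    """
    if len(text) < 2 or not (text.startswith('"') and text.endswith('"')):
        raise ValueError("expected text wrapped in double quotes")
    inner_words = text[1:-1].split()
    whole = [(text, "UNKNOWN-WORD")]
    split = [('"', None)] + [(word, None) for word in inner_words] + [('"', None)]
    return [whole, split]

alts = quoted_alternatives('"as is"')
```

Here `alts[0]` holds the single-token alternative and `alts[1]` holds the split one, in which the quote tokens still occupy their own slots.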

In any case, when reading it aloud, isn't there an "an" even though it is quoted?

as a noun

Why only as a noun? I proposed UNKNOWN-WORD because it can be e.g. a verb (or another POS).

@linas
Member

linas commented Mar 13, 2018

Sorry, yes, either might work. I'd have to think of examples where the quoted text would be generic unknown word instead of just nouns. But yes; sorry for confusion. All is well :-)

@ampli
Member Author

ampli commented Mar 13, 2018

BTW, it is possible to overcome the difficulty that I pointed out, by making a more complex check when applying length_limit:

  • First skip optional words (i.e. don't count them toward the length-limit). Need to do that in 3 places:
    -- expression prune.
    -- prune.
    -- fast-matcher.
  • Then in sane-morphism (optional words disappear at that time if they were indeed unneeded, and appear if they were required for the linkage) enforce the length-limit literally.

However, this will add some overhead (but maybe not much if a skip table is prepared in advance).
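
The two-step check and the precomputed skip table could look roughly like the Python sketch below. Everything here is an assumption for illustration: the `optional` and `used` flag lists, the function names, and the reading of the second step as "measure distance after unused optional slots are dropped" are not taken from the link-grammar source.

```python
from itertools import accumulate

def build_skip_table(optional):
    """prefix[i] = number of optional slots among slots 0..i-1."""
    return [0] + list(accumulate(1 if flag else 0 for flag in optional))

def effective_distance(prefix, left, right):
    """Slot distance for left < right, ignoring optional slots strictly between."""
    skipped = prefix[right] - prefix[left + 1]
    return (right - left) - skipped

def may_link(prefix, left, right, length_limit):
    """Step 1: relaxed check, as it might be used while pruning and matching."""
    return effective_distance(prefix, left, right) <= length_limit

def literal_distance(used, left, right):
    """Step 2: distance once slots unused by the chosen linkage are dropped."""
    return sum(1 for slot in range(left + 1, right + 1) if used[slot])

# Slots: 0 LEFT-WALL, 1 as / A.u, 2 s.u (optional), 3 yet
optional = [False, False, True, False]
prefix = build_skip_table(optional)

relaxed_ok = may_link(prefix, 1, 3, length_limit=1)
# Linkage that uses "as": slot 2 is unused and disappears.
dist_with_as = literal_distance([True, True, False, True], 1, 3)
# Linkage that uses "A.u s.u": slot 2 is needed and stays.
dist_with_split = literal_distance([True, True, True, True], 1, 3)
```

With the prefix table, the relaxed check costs two lookups and a subtraction per connector pair, which is the reason a table prepared in advance should keep the overhead small.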

linas added a commit that referenced this issue Apr 27, 2018
Add the "quoted word" idea, per issue #756 and #699
@linas
Member

linas commented Apr 27, 2018

Pull req #764 adds the QUOTED-WORD idea from issue #756
