Apostrophe "s" disambiguation issue with search query style sentences #1074
Comments
In my opinion, However, I think that there is a bug with It's being parsed as "john is neat documents about georgia" which is almost certainly going to be an incorrect tagging in all cases. It also would be unlikely for it to be "john has neat documents about georgia" as (from what I've come to understand as a non-linguist) the compromise/src/2-two/contraction-two/compute/isPossessive.js Lines 79 to 82 in 4ef66b3
This one ultimately comes down to switches and is related to #1070 that I opened a couple of days ago. The tagger initially thinks that "documents" is a plural noun (which is correct), but changes it to a present tense verb (which is incorrect) because the next word is about, which locks in the word An example which may be key in improving this:
Talks should be a verb. This is currently handled because of the "about" lock.
Here, "talks" should be a plural noun. We know this because "John is talks about Georgia" doesn't make sense and "John has talks about Georgia" is an improper use of the contraction, so "John's" must be possessive. Are those all good assumptions? Also, a possessive needs to be followed by a noun chunk (i.e. a noun alone, or adjective + noun, etc.). One more to make it more confusing:
Should be "John is nuts about Georgia." How does this fit in with everything else? What makes "talks" and "nuts" different?
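To make the reasoning in this comment concrete, here is a toy sketch of a three-way decision for 's (possessive, "is", or "has") that looks at the words following the contraction. It is not compromise's implementation: the function name, the input shape, and the tag names used here are all assumptions for illustration, and the rules only cover the examples discussed in this thread.

```javascript
// Toy classifier for an apostrophe-s term. Hypothetical helper, not part of compromise.
// `following` is the list of already-tagged words after the 's term,
// e.g. [{ text: 'neat', tags: ['Adjective'] }, { text: 'documents', tags: ['Noun'] }]
function classifyApostropheS(following) {
  const has = (term, tag) => term.tags.includes(tag)
  // Assumption: with nothing after it, treat 's as possessive ("that book is john's")
  if (following.length === 0) return 'possessive'
  // A plain past-tense word right after 's reads as "has" ("john's walked home")
  const first = following[0]
  if (has(first, 'PastTense') && !has(first, 'Adjective')) return 'has'
  // Skip any adjectives, then check whether a noun completes a noun chunk
  let i = 0
  while (i < following.length && has(following[i], 'Adjective')) i += 1
  if (i < following.length && has(following[i], 'Noun')) return 'possessive'
  // Otherwise fall back to "is" ("john's nuts about georgia")
  return 'is'
}

const neatDocuments = [
  { text: 'neat', tags: ['Adjective'] },
  { text: 'documents', tags: ['Noun', 'Plural'] },
  { text: 'about', tags: ['Preposition'] },
]
const talks = [
  { text: 'talks', tags: ['Noun', 'Plural'] },
  { text: 'about', tags: ['Preposition'] },
]
const nuts = [
  { text: 'nuts', tags: ['Adjective'] },
  { text: 'about', tags: ['Preposition'] },
]

console.log(classifyApostropheS(neatDocuments))
console.log(classifyApostropheS(talks))
console.log(classifyApostropheS(nuts))
```

In this sketch the difference between "talks" and "nuts" comes entirely from the tags they arrive with, which is why the order of tagging and contraction handling matters so much in the real library.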
Yep, Ryan, you've got it dead-on. Well done. I'm open to suggestions about how to improve this, as it produces pretty bad outcomes when it's wrong. I've always wanted to keep things one-pass. Changing the contraction back, after the tagger has made various decisions, seems like a difficult solution. The good news is that many of these problem words like 'talks' are flagged, and we can add careful rules about them to the is/has classifier. We could add some extra look-arounds there to mitigate this. Happy to help plug away at this. Thank you Caleb and Ryan.
added some tests to dev, for the I think the |
Thank y’all for looking into this! @ryancasburn-KAI you mentioned:
From poking around the source code, I can’t see an obvious way to use a plugin to prioritize the possessive tag. Is there some plugin capability I’m missing? Can I write a plugin that pre-emptively tags a
@calebmer - you can try this:
const plugin = {
  compute: {
    custPossessives: doc => doc.match("(#Person &&/'s$/)").tag('Possessive'),
  },
}
nlp.plugin(plugin)
nlp._world.hooks.splice(7, 0, 'custPossessives')
console.log(nlp("john's closed tasks").json()[0].terms)
The only thing this seems to get wrong is that "closed" is still labeled as a verb, even though "John's" is a possessive. This comes down to compromise/src/2-two/preTagger/compute/tagger/3rd-pass/06-switches.js Lines 25 to 30 in 4ef66b3
This puts the sort order as:
An Adj|Past with a ProperNoun before it is classified as a verb ("John documented things."). I don't know if this is fixable via a plugin. @spencermountain thoughts on a smarter sorter? Maybe possessive is handled specially, since it is an add-on tag (i.e. it can go on any noun, but if it applies, its rules should be considered first)? compromise/src/2-two/preTagger/compute/tagger/3rd-pass/06-switches.js Lines 25 to 30 in 4ef66b3
to:
let tags = Array.from(term.tags).sort((a, b) => {
  let numA = tagSet[a] ? tagSet[a].parents.length : 0
  let numB = tagSet[b] ? tagSet[b].parents.length : 0
  if (a == 'Possessive') {
    return -1
  }
  if (b == 'Possessive') {
    return 1
  }
  return numA > numB ? -1 : 1
})
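As a self-contained illustration of the sort order proposed here, the sketch below runs the same idea against a small mock tag tree. The `tagSet` contents and the `sortTags` wrapper are hypothetical stand-ins, not compromise's real data or API. The comparator is also written to return 0 for equal inputs and to compare by subtraction, which is a small deviation from the snippet above.

```javascript
// Hypothetical, trimmed-down tag tree: each tag lists its ancestors.
// The real tagSet in compromise is larger; this shape is an assumption.
const tagSet = {
  Noun: { parents: [] },
  ProperNoun: { parents: ['Noun'] },
  Person: { parents: ['ProperNoun', 'Noun'] },
  Possessive: { parents: ['Noun'] },
}

// Sort the most specific tags (most ancestors) first,
// but always float 'Possessive' to the front.
function sortTags(tags) {
  return Array.from(tags).sort((a, b) => {
    if (a === b) return 0
    if (a === 'Possessive') return -1
    if (b === 'Possessive') return 1
    const numA = tagSet[a] ? tagSet[a].parents.length : 0
    const numB = tagSet[b] ? tagSet[b].parents.length : 0
    return numB - numA
  })
}

const sorted = sortTags(new Set(['Noun', 'Person', 'Possessive', 'ProperNoun']))
console.log(sorted)
```

With this ordering, any rule that walks the sorted tags would see 'Possessive' before the more specific noun tags, which is the behaviour the comment is asking for.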
Thanks Ryan, hope to have a fix for this in the next day or two.
released as |
I’m using compromise to parse search queries. Search queries are interesting in that they’re not complete sentences. I’m having an issue with this query:
Here I want to interpret “john's” as possessive, not as “john has”. However, compromise parses this as “john has”. I’ve traced the code to here:
compromise/src/2-two/contraction-two/compute/isPossessive.js
Lines 45 to 56 in 4ef66b3
“closed” is tagged as a verb so “john’s” is interpreted as “john has”.
A similar query:
…correctly tags the phrase.
While I suspect I could add “closed” to the lexicon as a noun to fix this specific case, I need to support parsing arbitrary words between the user name and entity type.
Is there a workaround on my end I could write? Is this a bug in compromise? (I’d guess not, the phrase is truly ambiguous and you gotta pick somehow.) Could there be a configuration option for this? Is it safe to patch compromise and always return true from the code branch I linked?
Another query that’s treating a 's as not possessive when I want it to be possessive is: …but I’m not sure whether this is the same root cause.