-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Construct Phrases may result in double quotes when original search is quoted #41
Comments
Thanks for raising the issue. I agree it's a problem and will look into applying the patch. If you'd like the process to go faster, though, please submit a formal PR and also a unit test showing that your fix works. The unit tests are all done in Python and should be fairly easy to understand; there are instructions in the readme. |
Thanks for the fast response and for all the hard work you've done with this project. I assume PR refers to Pull Request (I've only ever used github for pulling code, rather than contributing). I'm also a novice when it comes to SOLR internals and configuration functionality (Filters/Tokenizers), though looking through yours and Tiens Multi-term synonym logic this past week has given me a bit of a crash course. One quick question: How do the various Tokenizers and Filters interact when there are query time Tokenizer and Filter specified on the field being queried when using the synonym_edismax parser? Is there a specific order in which they are applied or does one super-cede the other orare they completely independent of the other? Our schema file specifies the following in our schema file for the field we're expanding synonyms on:
|
@rpialum No problem, and yes PR is a pull request. :) I'm not sure how to answer your question, but if you use the debug UI in the solr admin, you should be able to see how the filter factories and tokenizers are being successively applied to the input. Also, as for your schema, you can add it to the sample schema, which is also what's used in the unit test. So that way, you should be able to get the unit tests running. I.e. these files are what's used in the unit tests. Ping me if anything else is unclear; hope that helps! |
Issue healthonnet#41, when original search term is quoted and synonym being expanded is item quoted don't re-quote when doing constructPhraseQueries. Removes issue where double quotes appear in result. Also when doing constructPhraseQueries, only quote phrases not single term synonyms.
Stubbed out test case for issue healthonnet#41. Additional logic needs to be implemented for testing the results, see TODO. I've been unable to get Python(v3.4.1) to run the top level file, but am assuming that debugQuery needs to be set on and the contents of the 'expandedSynonyms' in the response needs to be analyized. Currently I'm assuming a count of the number of quotes would work, though the exact expected string could also be passed in.
Stubbed out test case for issue healthonnet#41. Additional logic needs to be implemented for testing the results, see TODO. I've been unable to get Python(v3.4.1) to run the top level file, but am assuming that debugQuery needs to be set on and the contents of the 'expandedSynonyms' in the response needs to be analyized. Currently I'm assuming a count of the number of quotes would work, though the exact expected string could also be passed in.
Issue #41, when original search term is quoted and synonym being expanded is item quoted don't re-quote when doing constructPhraseQueries. Removes issue where double quotes appear in result. Also when doing constructPhraseQueries, only quote phrases not single term synonyms.
fixed in 242d330 |
There are a few issues that exist with the current constructPhrases logic when expanding synonyms. One of which can result in multiple quotes being applied when the original search is quoted.
ie: Search: "Internal Revenue Service" takes money
Synonyms: (IRS, tax service, internal revenue service)
Current results: "IRS" takes money; ""tax service"" takes money; ""internal revenue service"" takes money.
My proposed solution involves making a few fixes with in generateSynonymQueries(), all when SynonymDismaxParams.SYNONYMS_CONSTRUCT_PHRASES has been set to true.
Changes:
Add to top of generateSynonymQueries():
String origQuery = getQueryStringFromParser();
int queryLen = origQuery.length();
// TODO: make the token stream reusable?
TokenStream tokenStream = synonymAnalyzer.tokenStream(SynonymDismaxConst.IMPOSSIBLE_FIELD_NAME,
new StringReader(origQuery));
Replace current phraseQuery if logic with:
if (constructPhraseQueries && typeAttribute.type().equals("SYNONYM") &&
termToAdd.contains(" "))
{
//Dont' Quote when original is already surrounded by quotes
if( offsetAttribute.startOffset()==0 ||
offsetAttribute.endOffset() == queryLen ||
origQuery.charAt(offsetAttribute.startOffset()-1)!='"' ||
origQuery.charAt(offsetAttribute.endOffset())!='"')
{
// make a phrase out of the synonym
termToAdd = new StringBuilder(termToAdd).insert(0,'"').append('"').toString();
}
}
The text was updated successfully, but these errors were encountered: