QueryStringQuery doesn't properly account for analysers splitting up strings #248
Comments
So, looking at this issue again with fresh eyes, it seems to me like it's functioning as designed. Our whole search works by applying the same analyzer to search terms and documents. The "en" analyzer turns "mother-in-law" into "mother" and "law". It's not clear to me how we would treat this as some sort of obvious exception. I will try it with Elasticsearch and see what it does.
I'd agree that it's a little obscure (although my users have definitely been really confused by this in the past).
There are two levels of string-splitting going on: the first is where the query parser breaks the input using whitespace. The second is in the Analyser, which could potentially break a string up further, into multiple terms.
I think the user intuitively understands that whitespace breaks things up. But I don't think they realise the Analyser might break things up further.
I think the easy and intuitive thing to do is for QueryStringQuery to just use
So the query:
Would be treated as:
thus preserving the ordering of "ill" and "gotten" (using the default 'en' analyser), but not really making any difference to "gains".
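To make the two levels of splitting concrete, here is a minimal Go sketch. It is not bleve's implementation: toyAnalyze, clause, and parseQuery are hypothetical names, the stop list is a tiny stand-in, and the toy analyser does no stemming (the real "en" analyser does). The input "ill-gotten gains" is an assumed example built from the terms mentioned above. The parser splits on whitespace first, then lets the analyser split each chunk further, and treats any chunk that yields more than one term as a phrase.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a tiny illustrative list, not bleve's real "en" stop list.
var stopwords = map[string]bool{"in": true, "the": true, "and": true, "of": true}

// toyAnalyze stands in for an analyser: it lowercases, splits on anything
// that is not a letter or digit, and drops stopwords. Unlike the real "en"
// analyser it does no stemming.
func toyAnalyze(text string) []string {
	parts := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var terms []string
	for _, p := range parts {
		if !stopwords[p] {
			terms = append(terms, p)
		}
	}
	return terms
}

// clause is a hypothetical parsed unit: either a single term or a phrase.
type clause struct {
	phrase bool
	terms  []string
}

// parseQuery does the first split on whitespace, then lets the analyser do
// the second split. A chunk that analyses to several terms becomes a phrase.
func parseQuery(q string) []clause {
	var out []clause
	for _, chunk := range strings.Fields(q) {
		terms := toyAnalyze(chunk)
		if len(terms) == 0 {
			continue // chunk was entirely stopwords or punctuation
		}
		out = append(out, clause{phrase: len(terms) > 1, terms: terms})
	}
	return out
}

func main() {
	for _, c := range parseQuery("ill-gotten gains") {
		fmt.Println(c.phrase, c.terms)
	}
}
```

With this sketch, "ill-gotten" comes out as a phrase clause over [ill gotten], while "gains" stays a single-term clause, so the phrase treatment only changes behaviour for chunks the analyser actually splits.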
Just for background - as mentioned in the original mailing list thread, my motivation is for matching URLs in fields. My users really expect that:
Would match I could try and train them that whitespace isn't the only place where strings are broken up and that they need to quote stuff like this, but it seems better to make the default behaviour "feel" right, if possible. (My alternate query parser already uses MatchPhraseQuery by default. I've not noticed any problems, but then I doubt it's been given the workout that QueryStringQuery has had.)
The QueryStringQuery behaviour doesn't always give the results you'd expect.
(this is a follow-up of: https://groups.google.com/forum/#!topic/bleve/cxVfZ7VQh3o )
Observed behaviour:
Using the default "en" analyser, you'd expect an unquoted query like:
mother-in-law
to be treated as a single 'thing' and to match as a phrase.
Instead, the query is treated as
mother OR law
(the "in" is discarded as a stopword).
Expected behaviour:
mother-in-law
in the above example should be treated as a phrase.
The "en" analyzer splits mother-in-law up into [mother (pos 0), law (pos 2)], and the query should be a MatchPhraseQuery instead of the current MatchQuery.
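The difference between the two behaviours can be sketched in plain Go. This is a toy model, not bleve's code: token, analyzeWithPositions, matchAny, and matchPhrase are hypothetical names, the stop list is a small stand-in, and there is no stemming. It assumes, as in the example above, that a removed stopword still consumes a position, so "mother-in-law" becomes mother at position 0 and law at position 2.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// token is a term plus its position in the analysed stream.
type token struct {
	term string
	pos  int
}

// stops is a tiny illustrative stop list, not the real "en" one.
var stops = map[string]bool{"in": true, "the": true, "and": true, "of": true}

// analyzeWithPositions lowercases, splits on non-alphanumerics and removes
// stopwords; a removed stopword still consumes a position.
func analyzeWithPositions(text string) []token {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	var toks []token
	for i, w := range words {
		if !stops[w] {
			toks = append(toks, token{term: w, pos: i})
		}
	}
	return toks
}

// matchAny mimics an OR of the query terms: one shared term is enough.
func matchAny(query, doc []token) bool {
	for _, q := range query {
		for _, d := range doc {
			if q.term == d.term {
				return true
			}
		}
	}
	return false
}

// matchPhrase requires every query term at the same relative offsets.
func matchPhrase(query, doc []token) bool {
	if len(query) == 0 {
		return false
	}
	at := map[int]string{} // document position -> term
	for _, d := range doc {
		at[d.pos] = d.term
	}
	for _, d := range doc {
		if d.term != query[0].term {
			continue
		}
		ok := true
		for _, q := range query[1:] {
			if at[d.pos+q.pos-query[0].pos] != q.term {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	q := analyzeWithPositions("mother-in-law")
	fmt.Println(q)
	for _, text := range []string{"My mother-in-law visited", "The law protects every mother"} {
		d := analyzeWithPositions(text)
		fmt.Println(matchAny(q, d), matchPhrase(q, d))
	}
}
```

In this sketch both documents satisfy the OR-style match, but only the first satisfies the phrase match, which is the behaviour the issue asks for.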