Closed
Description
LuceneTextIndexCreator currently hardcodes the stop words used by the Lucene text index:
Arrays.asList("a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no",
"not", "of", "on", "or", "such", "that", "the", "their", "then", "than", "there", "these", "they", "this",
"to", "was", "will", "with", "those"),
StandardAnalyzer removes these words both when the text index is generated and when the TEXT_MATCH filter text is analyzed at query time. The problem is that in production we have seen users issue queries like:
SELECT ... FROM ignoreMe WHERE TEXT_MATCH(title, '"IT staff" OR "IT manager"')
which actually return the same results as TEXT_MATCH(title, '"staff" OR "manager"'). StandardAnalyzer lowercases "IT" to "it", which is in the stop-word list and is therefore dropped. This is easy to reproduce in TextSearchQueriesTest.
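To illustrate the effect, here is a simplified, self-contained Java sketch of the lowercase + stop-word steps. It is an approximation under stated assumptions, not Lucene's StandardAnalyzer: it assumes plain ASCII input, tokenizes with a regex split, and ignores token positions. The class and method names (StopWordDemo, analyze) are hypothetical and are not part of Pinot or Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified model of the analysis chain: split on non-alphanumerics,
// lowercase each token, then drop stop words. NOT Lucene's StandardAnalyzer; it only
// approximates the lowercase + stop-filter behavior for ASCII words and ignores positions.
class StopWordDemo {
    // Same words as the hardcoded list in LuceneTextIndexCreator.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no",
        "not", "of", "on", "or", "such", "that", "the", "their", "then", "than", "there", "these", "they", "this",
        "to", "was", "will", "with", "those"));

    // Returns the tokens that survive lowercasing and stop-word removal.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^A-Za-z0-9]+")) {
            if (raw.isEmpty()) {
                continue; // leading delimiter produces an empty first element
            }
            String token = raw.toLowerCase(Locale.ROOT); // "IT" -> "it"
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("IT staff", STOP_WORDS));      // [staff]
        System.out.println(analyze("IT manager", STOP_WORDS));    // [manager]
        // With no stop words, the distinguishing term "it" survives.
        System.out.println(analyze("IT staff", new HashSet<>())); // [it, staff]
    }
}
```

With the hardcoded list, "IT staff" and "staff" reduce to the same token sequence, so the query can no longer distinguish them; only when "it" is absent from the stop-word set does the term survive analysis.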
cc @Jackie-Jiang @walterddr @siddharthteotia @SabrinaZhaozyf