Various search improvement suggestions #210

joepio · 2021-11-12T11:01:09Z

I've just implemented Full-Text Search #40 and it works pretty well! Good enough for now. However, I noticed some things could be improved upon:

Besides indexing triples, consider indexing full resources. That way, a user could combine terms present in various fields. For example, say I'm looking for a red shirt. This shirt would have two relevant properties: its type (shirt) and its color (red). Since we currently only index triples, search would find one triple for `red` and one for `shirt`, but it would not find something that contains both. Indexing the full resource would fix this. The new JSON fields in Tantivy might be a solution, see Consider the new JSON fields for Tantivy full-text search #336. You can add a json_field to a Schema.
Boost titles
Fuzzy searching does not, at the moment, score items at all. In other words, we get more or less random hits for fuzzy matches, which is what we use for all short strings. That's bad. I think there are people working on this, see PR: Use Levenshtein distance to score documents in fuzzy term queries quickwit-oss/tantivy#998. However, in another comment the PR's author said we could think of this PR as discarded.
Search inside collections or in some hierarchy, see Search inside collections & hierarchies #226
Tokenize the search sentence into separate parts (a new query for each token). Inspiration: permalink (thanks @ChillFish8!)
There is no scoring system to make important resources rank higher (think PageRank from Google). There is no user feedback that lets the system learn what is relevant to me. There are no synonyms.
Consider indexing connected resources, too. Say that in the previous example, red was not a literal string but a resource somewhere else, possibly with a very obscure Subject URL. This would mean that we would not even hit the red shirt if we searched for red! We could fix this by indexing connected resources and including these in the initial item. Perhaps we'd add a new field, connected, and serialize all values of all directly connected nodes in there. I think doing this for a depth of 1 is doable, although it would make indexing about 10x slower and make the index about 10x larger, too. But it would open up some cool possibilities, such as searching for a user name + class type (e.g. joep document) and seeing all documents of that user, without having any form of explicit filters. That's pretty cool, right?
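
To make the difference between triple-level and resource-level indexing concrete, here is a rough, library-free sketch in Python. It does not use Tantivy or any atomic-server code. The data, the URLs and the function names (`triple_docs`, `resource_docs`, `search_all_terms`) are all made up for illustration, and the `with_connected` flag only mimics the idea of a depth-1 connected field.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, property, value) triples.
ITEM_1 = "https://example.com/item/1"
ITEM_2 = "https://example.com/item/2"
COLOR_RED = "https://example.com/color/x9"  # obscure Subject URL for "red"

TRIPLES = [
    (ITEM_1, "type", "shirt"),
    (ITEM_1, "color", COLOR_RED),  # the color is a resource, not a literal
    (COLOR_RED, "name", "red"),
    (ITEM_2, "type", "shirt"),
    (ITEM_2, "color", "blue"),     # the color is a literal string
]


def tokenize(text):
    """Lowercase and split on whitespace; a real tokenizer does much more."""
    return set(text.lower().split())


def triple_docs(triples):
    """One document per triple, as in the current approach."""
    return [(subject, tokenize(value)) for subject, _prop, value in triples]


def resource_docs(triples, with_connected=False):
    """One document per subject, merging the values of all its properties.

    With with_connected=True, values of directly linked resources
    (depth 1) are merged in as well.
    """
    values_by_subject = defaultdict(list)
    for subject, _prop, value in triples:
        values_by_subject[subject].append(value)

    docs = []
    for subject, values in values_by_subject.items():
        tokens = set()
        for value in values:
            tokens |= tokenize(value)
            # If the value is itself a known subject, pull in its values.
            if with_connected and value in values_by_subject:
                for connected_value in values_by_subject[value]:
                    tokens |= tokenize(connected_value)
        docs.append((subject, tokens))
    return docs


def search_all_terms(docs, query):
    """Return subjects of documents that contain every query term."""
    terms = tokenize(query)
    return sorted({subject for subject, tokens in docs if terms <= tokens})
```

With this toy data, searching the triple-level documents for "blue shirt" finds nothing, the resource-level documents find item 2, and "red shirt" only hits item 1 once `with_connected=True` is used. The cost is visible too: every resource document grows with the values of everything it links to.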
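
As an illustration of what scoring fuzzy matches could look like, here is a minimal Python sketch that ranks candidate terms by Levenshtein distance. It is not the approach from the Tantivy PR mentioned above and does not use Tantivy. The scoring formula `1 / (1 + distance)`, the `max_distance` default and the function names are assumptions chosen for the example.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        cur = [i]
        for j, char_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # delete char_a
                cur[j - 1] + 1,                    # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def fuzzy_hits(term, candidates, max_distance=2):
    """Return (candidate, score) pairs within max_distance, best first.

    The score 1 / (1 + distance) is an arbitrary choice for this sketch:
    exact matches score 1.0 and the score drops as the distance grows.
    Ties are broken alphabetically so the order is deterministic.
    """
    hits = []
    for candidate in candidates:
        distance = levenshtein(term, candidate)
        if distance <= max_distance:
            hits.append((candidate, 1.0 / (1 + distance)))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits
```

Calling `fuzzy_hits("shirt", ["skirt", "red", "shirts", "shirt"])` puts the exact match `shirt` first, followed by the terms at distance 1, and drops `red` entirely. Even a crude score like this gives an ordering instead of more or less random hits.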

joepio mentioned this issue Nov 13, 2021

Full-text search #40

Closed

joepio added the help wanted (Extra attention is needed) and server (atomic-server) labels Nov 13, 2021

joepio mentioned this issue Mar 23, 2022

Consider the new JSON fields for Tantivy full-text search #336

Closed

joepio mentioned this issue Feb 21, 2023

Improve search indexing #595

Merged

6 tasks
