I've just implemented Full-Text Search #40 and it works pretty well! Good enough for now. However, I noticed some things could be improved upon:
Besides indexing only triples, consider indexing full resources. That way, a user could combine terms present in various fields. For example, say I'm looking for a red shirt. This shirt would have two relevant properties: its type (`shirt`) and its color (`red`). Because we currently only index triples, the search would find one triple for `red` and one for `shirt`, but nothing that contains both. Indexing full resources would fix this. The new JSON fields for Tantivy full-text search #336 might be a solution: you can add a `json_field` to a `Schema`.
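To make the difference concrete, here is a minimal, std-only Rust sketch that contrasts one document per triple with one document per resource. It does not use Tantivy and is not our actual indexer. The `Triple` alias, the whitespace tokenizer and the example subject URL are hypothetical. A query matches a document only if every query term occurs in it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical triple: (subject, property, value).
type Triple = (&'static str, &'static str, &'static str);

/// Lowercase whitespace tokenizer, standing in for a real analyzer.
fn tokenize(text: &str) -> BTreeSet<String> {
    text.split_whitespace().map(str::to_lowercase).collect()
}

/// Returns the ids of all documents that contain every query term.
fn search(docs: &[(String, String)], query: &str) -> Vec<String> {
    let terms = tokenize(query);
    docs.iter()
        .filter(|(_, text)| terms.is_subset(&tokenize(text)))
        .map(|(id, _)| id.clone())
        .collect()
}

/// Current approach: one document per triple, keyed by "subject property".
fn triple_docs(triples: &[Triple]) -> Vec<(String, String)> {
    triples
        .iter()
        .map(|&(s, p, v)| (format!("{s} {p}"), v.to_string()))
        .collect()
}

/// Proposed approach: one document per resource, all values concatenated.
fn resource_docs(triples: &[Triple]) -> Vec<(String, String)> {
    let mut by_subject: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(s, _, v) in triples {
        by_subject.entry(s).or_default().push(v);
    }
    by_subject
        .into_iter()
        .map(|(s, values)| (s.to_string(), values.join(" ")))
        .collect()
}

fn main() {
    let triples: Vec<Triple> = vec![
        ("https://example.com/shirt1", "type", "shirt"),
        ("https://example.com/shirt1", "color", "red"),
    ];
    // No single triple contains both terms...
    println!("per triple:   {:?}", search(&triple_docs(&triples), "red shirt")); // prints []
    // ...but the combined resource document does.
    println!("per resource: {:?}", search(&resource_docs(&triples), "red shirt")); // prints ["https://example.com/shirt1"]
}
```

With Tantivy, the per-resource document would presumably be built the same way, with the values either concatenated into one text field or stored in a JSON field.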
There is no scoring system that makes important resources rank higher (think PageRank from Google). There is no user feedback that lets the system learn what is relevant to me. There are no synonyms.
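A PageRank-style score could be computed from the links between resources and used to boost full-text hits. Below is a minimal power-iteration sketch. It assumes resources are numbered `0..n` and that `links[i]` lists the resources that resource `i` links to. The damping factor of 0.85 is the common textbook default, not a tuned value.

```rust
/// Computes PageRank by power iteration.
/// `links[i]` holds the indices of the resources that resource `i` links to.
fn pagerank(links: &[Vec<usize>], damping: f64, iterations: usize) -> Vec<f64> {
    let n = links.len();
    let mut rank = vec![1.0 / n as f64; n];
    for _ in 0..iterations {
        // Every resource gets the same baseline ("random jump") share.
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (i, targets) in links.iter().enumerate() {
            if targets.is_empty() {
                // Dangling resource: spread its rank evenly over all resources.
                for r in next.iter_mut() {
                    *r += damping * rank[i] / n as f64;
                }
            } else {
                // Split this resource's rank evenly over its outgoing links.
                let share = damping * rank[i] / targets.len() as f64;
                for &t in targets {
                    next[t] += share;
                }
            }
        }
        rank = next;
    }
    rank
}

fn main() {
    // Resources 0 and 1 both link to 2; resource 2 links back to 0.
    let links = vec![vec![2], vec![2], vec![0]];
    let rank = pagerank(&links, 0.85, 100);
    for (i, r) in rank.iter().enumerate() {
        println!("resource {i}: {r:.3}");
    }
}
```

How to combine this with the text score (for example, multiplying the two) is an open design choice. The ranks would also need recomputing as resources and links change.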
Consider indexing connected resources, too. Say that in the previous example, `red` was not a literal string but a resource somewhere else, possibly with a very obscure subject URL. We would then not even hit the red shirt if we searched for `red`! We could fix this by indexing connected resources and including them in the initial item. Perhaps we'd add a new field, `connected`, and serialize all values of all directly connected nodes into it. I think doing this for a depth of 1 is doable, although it would make indexing about 10x slower and the index about that much larger, too. But it would open up some cool possibilities, such as searching for a user name plus a class type (e.g. `joep document`) and seeing all documents of that user, without any explicit filters. That's pretty cool, right?
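Here is a minimal sketch of the depth-1 `connected` field. It assumes a hypothetical in-memory `Store` that maps each subject URL to its `(property, value)` pairs, where a value is either a literal or a link to another resource. The color URL below is made up to play the role of the "obscure subject URL". Only direct links are followed, so the extra indexing work grows with the number of outgoing links per resource.

```rust
use std::collections::HashMap;

/// A property value: either a literal string or a link to another resource.
enum Value {
    Literal(String),
    /// Subject URL of the linked resource.
    Resource(String),
}

/// Hypothetical store: subject URL -> list of (property, value) pairs.
type Store = HashMap<String, Vec<(String, Value)>>;

/// All literal values of a single resource.
fn literals(store: &Store, subject: &str) -> Vec<String> {
    store
        .get(subject)
        .into_iter()
        .flatten()
        .filter_map(|(_, value)| match value {
            Value::Literal(text) => Some(text.clone()),
            Value::Resource(_) => None,
        })
        .collect()
}

/// Text for the hypothetical `connected` field: the literal values of every
/// resource this one links to directly (depth 1, no further recursion).
fn connected_field(store: &Store, subject: &str) -> String {
    store
        .get(subject)
        .into_iter()
        .flatten()
        .filter_map(|(_, value)| match value {
            Value::Resource(url) => Some(literals(store, url)),
            Value::Literal(_) => None,
        })
        .flatten()
        .collect::<Vec<String>>()
        .join(" ")
}

/// True if every query term occurs in the resource's own literals,
/// optionally extended with its `connected` field.
fn matches(store: &Store, subject: &str, query: &str, with_connected: bool) -> bool {
    let mut text = literals(store, subject).join(" ");
    if with_connected {
        text.push(' ');
        text.push_str(&connected_field(store, subject));
    }
    let tokens: Vec<String> = text.split_whitespace().map(str::to_lowercase).collect();
    query
        .split_whitespace()
        .map(str::to_lowercase)
        .all(|term| tokens.contains(&term))
}

/// The red shirt, with its color stored as a separate resource.
fn example_store() -> Store {
    let mut store = Store::new();
    store.insert(
        "https://example.com/c/x7f3".to_string(),
        vec![("name".to_string(), Value::Literal("red".to_string()))],
    );
    store.insert(
        "https://example.com/shirt1".to_string(),
        vec![
            ("type".to_string(), Value::Literal("shirt".to_string())),
            ("color".to_string(), Value::Resource("https://example.com/c/x7f3".to_string())),
        ],
    );
    store
}

fn main() {
    let store = example_store();
    let shirt = "https://example.com/shirt1";
    println!("{}", matches(&store, shirt, "red shirt", false)); // prints "false"
    println!("{}", matches(&store, shirt, "red shirt", true)); // prints "true"
}
```

The same mechanism would cover the `joep document` case if a document links to its author resource and that resource has the name `joep` as a literal.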