Description
I have a multi_match query that searches across all fields. Without highlighting, it takes 2 seconds.
I'd like to use highlight to tell me which fields matched, but adding highlight makes the query take several minutes. I gave up after 2 minutes, so I'm not sure the query ever finishes. I tried all three highlighter types (unified, plain, fvh).
So instead of using highlight, I iterate through the returned source documents myself and find the matching fields. This takes only about 0.2 seconds.
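Roughly, the manual approach looks like the sketch below. It is a simplification, not the exact code: it assumes each hit's _source is a flat dict with string values, and it approximates phrase_prefix with a case-insensitive word-prefix check, which will not always agree with Elasticsearch's analyzers. The function name matching_fields is made up for illustration.

```python
import re


def matching_fields(source, query):
    """Return the names of fields in a hit's _source whose text contains
    a word (or phrase) starting with `query`, ignoring case."""
    # \b anchors the match at a word boundary, so "pre" matches
    # "pregnant" but not "spread".
    pattern = re.compile(r"\b" + re.escape(query), re.IGNORECASE)
    return [
        field
        for field, value in source.items()
        # Skip non-string values (numbers, nested objects) in this sketch.
        if isinstance(value, str) and pattern.search(value)
    ]


# Hypothetical _source of one search hit.
hit_source = {"diagnosis": "Pre-eclampsia", "notes": "widespread rash", "age": 54}
print(matching_fields(hit_source, "pre"))  # → ['diagnosis']
```

Because this only runs over the 10 hits returned by the search, the cost depends on the page size and the handful of fields each document has, not on the 7k fields in the index.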
(I'm curious how an Elasticsearch query works: as part of the query process, does Elasticsearch know which field matches? Say document D contains field F that matches query Q. When Elasticsearch determines that D is a result for Q, does Elasticsearch know, as part of that process, that F contains Q?)
Would it be possible to create a highlight type that is fast? It would either return only the field that matched, or the field name and entire contents of the field.
Here are some timing stats. My index has 121k documents. Across all documents there are 7k fields; a particular document will have a small subset of the 7k fields.
~/data-explorer (master): curl "localhost:9200/_cat/indices?v&s=index"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open nurse_s_health_study kMyBnxPVTLC7W5ufetg3UQ 5 1 121701 0 5.6gb 5.6gb
yellow open nurse_s_health_study_fvh snoFPeGQSWqXZL8rGG-1qg 5 1 121701 0 8.9gb 8.9gb
no highlighter
GET /nurse_s_health_study/_search
{
"query": {
"multi_match": {
"query": "pre",
"type": "phrase_prefix"
}
},
"size": 10
}
2.2s
unified highlighter
GET /nurse_s_health_study/_search
{
"query": {
"multi_match": {
"query": "pre",
"type": "phrase_prefix"
}
},
"highlight": {
"fields": {
"*": {
"type": "unified"
}
}
},
"size": 10
}
Gave up after 2 mins.
plain highlighter
GET /nurse_s_health_study/_search
{
"query": {
"multi_match": {
"query": "pre",
"type": "phrase_prefix"
}
},
"highlight": {
"fields": {
"*": {
"type": "plain"
}
}
},
"size": 10
}
Gave up after 4 minutes.
fvh highlighter
I reindexed with term_vector = with_positions_offsets.
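For reference, one way to apply that setting to every text field is a dynamic template in the index mappings. The sketch below builds such a request body; it is not necessarily the exact mapping used for this index. It assumes typeless mappings (Elasticsearch 7.x and later; older versions nest the template under a mapping type name), and the template name strings_with_term_vectors and the function name are made up.

```python
import json


def fvh_index_body():
    """Build an index-creation body that stores term vectors with
    positions and offsets for every dynamically mapped string field."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    # Hypothetical template name; any name works.
                    "strings_with_term_vectors": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            # Required by the fvh highlighter.
                            "term_vector": "with_positions_offsets",
                        },
                    }
                }
            ]
        }
    }


# Print the body that would be sent when creating the index.
print(json.dumps(fvh_index_body(), indent=2))
```

Storing term vectors is what grew the index from 5.6gb to 8.9gb in the listing above.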
GET /nurse_s_health_study_fvh/_search
{
"query": {
"multi_match": {
"query": "pre",
"type": "phrase_prefix"
}
},
"highlight": {
"fields": {
"*": {
"type": "fvh"
}
}
},
"size": 10
}
Gave up after 2 minutes.