Description
For the casual searcher it is not particularly helpful that Elasticsearch returns an error when they are unlucky enough to match a very large document.
The user gets a 400 error with a message of this form:

```
The length of [xxx] field ... has exceeded [1000000] - maximum allowed to be analyzed
for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset]
index level setting. For large texts, indexing with offsets or term vectors is recommended!
```
At this point the only workarounds available to the user are:
a) User rewrites the query with a NOT clause to exclude the IDs of the rogue docs (not ideal)
b) User reindexes content with offsets (a pain)
c) User reindexes content and truncates long strings, e.g. with an ingest processor (not ideal)
d) User increases the `index.highlight.max_analyzed_offset` setting (not ideal)
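As a hedged sketch of workaround (d), the settings body for raising the limit might be built like this in Python. The setting name comes from the error message above; the index name, limit value, and client call are illustrative assumptions, not a recommendation:

```python
def max_analyzed_offset_settings(new_limit: int) -> dict:
    """Build a settings body for index.highlight.max_analyzed_offset.

    new_limit is an assumed example value; raising this limit trades
    memory/CPU during highlighting for fewer errors, which is why (d)
    is listed as "not ideal".
    """
    return {"index": {"highlight": {"max_analyzed_offset": new_limit}}}


body = max_analyzed_offset_settings(2_000_000)
# With the official Python client this could then be applied, e.g.:
#   es.indices.put_settings(index="my-index", body=body)
print(body)
```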
None of these are great options, so the proposal is that highlighters could be prevented from throwing an error and could instead fall back to a cheaper approach to highlighting, e.g. returning the first N characters of a large string field. The open questions are:
- Do we need additional properties in the highlight request to define the fallback approach?
- How do we warn the user that a fallback approach was applied for a particular result?
- Will some users want the old behaviour of errors rather than fallbacks?
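To make the proposal concrete, here is a minimal pure-Python sketch of what such a fallback might look like, assuming the "first N characters" strategy described above. The function name, snippet length, and the `fallback`/`matched` flags (one possible answer to the "how do we warn the user" question) are all hypothetical, not an existing Elasticsearch API:

```python
MAX_ANALYZED_OFFSET = 1_000_000  # default limit cited in the error message


def fallback_highlight(text: str, query_term: str,
                       max_offset: int = MAX_ANALYZED_OFFSET,
                       snippet_len: int = 100) -> dict:
    """Highlight by analyzing only the first max_offset characters.

    Instead of raising an error on oversized fields, analyze the
    allowed prefix; if the term is not found there, fall back to
    returning the leading characters. The "fallback" flag signals
    to the caller that a degraded approach was applied.
    """
    truncated = len(text) > max_offset
    searchable = text[:max_offset]
    idx = searchable.lower().find(query_term.lower())
    if idx == -1:
        # Term absent from the analyzed prefix (it may occur past the
        # limit): return the first snippet_len characters instead.
        return {"snippet": text[:snippet_len],
                "fallback": truncated, "matched": False}
    start = max(0, idx - snippet_len // 2)
    return {"snippet": searchable[start:start + snippet_len],
            "fallback": truncated, "matched": True}
```

A per-result flag like `fallback` could also be surfaced in the response to address the second open question, while a request-level option could restore the old error-throwing behaviour for users who want it.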