
Highlighters shouldn't error on big documents #52155

Closed

Description

For the casual searcher, it is not particularly helpful to have Elasticsearch return an error if they are unlucky enough to match a big document.
The user gets a 400 error with this sort of message:

The length of [xxx] field ... has exceeded [1000000] - maximum allowed to be analyzed 
for highlighting. This maximum can be set by changing the [index.highlight.max_analyzed_offset] 
index level setting. For large texts, indexing with offsets or term vectors is recommended!

At this point the only workarounds the user has are:

a) User rewrites query with a NOT clause to exclude IDs of rogue docs (not ideal)
b) User reindexes content with offsets or term vectors (a pain; see the mapping sketch after this list)
c) User reindexes content and truncates long strings e.g. with an ingest processor (not ideal)
d) User increases the index.highlight.max_analyzed_offset setting (not ideal; see the settings sketch after this list)
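
As a rough sketch of (b): reindexing into a mapping with term vectors (or, alternatively, `"index_options": "offsets"` on the text field) lets the highlighter use stored offsets instead of re-analyzing the whole field. Index and field names are illustrative:

```
PUT /my-index-v2
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}
```

And a sketch of (d), assuming the setting is dynamic on the version in use (otherwise it has to be set at index creation):

```
PUT /my-index/_settings
{
  "index.highlight.max_analyzed_offset": 10000000
}
```

Both sketches trade something away: term vectors inflate index size, and raising the limit increases the memory and CPU spent analyzing large documents at search time.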

None of these are great options, so the proposal is that highlighters could be prevented from throwing an error and instead use a cheaper "fallback" approach to highlighting, e.g. highlighting only the first N characters of a large string field. The open questions are:

  1. Do we need additional properties in the highlight request to define the fallback approach? (A purely hypothetical sketch follows after this list.)
  2. How do we warn the user that a fallback approach was applied for a particular result?
  3. Will some users want the old behaviour of errors rather than fallbacks?
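
To make question 1 concrete, here is one purely hypothetical shape for such a property. The `on_limit` parameter does not exist in any Elasticsearch release; it is only an illustration:

```
# "on_limit" below is hypothetical, not a real highlight parameter
GET /my-index/_search
{
  "query": { "match": { "body": "search terms" } },
  "highlight": {
    "fields": {
      "body": { "on_limit": "truncate" }
    }
  }
}
```

Here `"truncate"` would mean "highlight only the first max_analyzed_offset characters and ignore the rest"; question 2 then becomes how the response flags that truncation happened, and an `"error"` value could preserve the old behaviour for the users in question 3.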