Description
Describe the bug
Many years ago (2016, IIRC) the code that fetches individual source fields (as selected by the _source argument in a search request) was changed to always use Lucene's built-in automaton matching logic to pick which source fields to return. That may make sense when there are dotted paths to object subfields or when there are wildcard patterns.
I don't think it makes sense when there's just a list of field names that someone wants to retrieve. In that case, we should probably just stick them all in a HashSet and evaluate a contains() predicate to decide which fields to include in a response.
In particular, if there is a large number of fields (and those fields have long names), we end up building a big union of linear automata, one per requested field name. The resulting graph can have many states and many transitions, so Lucene ends up throwing a TooComplexToDeterminizeException.
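A minimal sketch of the HashSet approach suggested above, assuming the source is a flat map of top-level keys and the requested names are plain field names (no `*` wildcards, no dots). The class and method names are hypothetical and this is not OpenSearch's actual fetch code; it only shows that the simple case needs no automaton at all.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SimpleSourceFilter {
    // Hypothetical helper: keeps only the top-level source entries whose keys
    // appear in the requested list. Assumes plain names (no '*' and no '.').
    static Map<String, Object> filter(Map<String, Object> source, List<String> includes) {
        // Building the set is linear in the total length of the names;
        // no automaton is constructed or determinized.
        Set<String> wanted = new HashSet<>(includes);
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            // Expected O(1) hash lookup per source field.
            if (wanted.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "hello");
        source.put("body", "world");
        source.put("views", 42);
        // Prints {title=hello, views=42}
        System.out.println(filter(source, List.of("title", "views")));
    }
}
```

Because the cost grows only with the number and length of the names, a request listing a few thousand long field names stays cheap here, whereas the automaton union has to be determinized first.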
Related component
Search:Performance
To Reproduce
- Create an index with a lot of fields (a few thousand), with long field names.
- Run a search request that fetches a lot of those fields (a few thousand) in the _source parameter.
- Get a TooComplexToDeterminizeException.
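The reproduction steps above can be scripted by generating the two request bodies. This is a sketch under stated assumptions: the field naming scheme, field count, and name length are hypothetical, every field is mapped as `keyword`, and the create-index body raises `index.mapping.total_fields.limit` (default 1000), since a few thousand fields would otherwise be rejected by the mapping.

```java
import java.util.ArrayList;
import java.util.List;

public class ReproBodies {
    // Hypothetical naming scheme: long, distinct, dot-free field names.
    static List<String> fieldNames(int count, int padLength) {
        List<String> names = new ArrayList<>(count);
        String pad = "x".repeat(padLength);
        for (int i = 0; i < count; i++) {
            names.add("field_" + pad + "_" + i);
        }
        return names;
    }

    // Create-index body: raises the total-fields limit and maps each name as keyword.
    static String createIndexBody(List<String> names) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"settings\":{\"index.mapping.total_fields.limit\":")
          .append(names.size() + 100)
          .append("},\"mappings\":{\"properties\":{");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(names.get(i)).append("\":{\"type\":\"keyword\"}");
        }
        sb.append("}}}");
        return sb.toString();
    }

    // Search body that lists every field name in the _source array.
    static String searchBody(List<String> names) {
        StringBuilder sb = new StringBuilder("{\"_source\":[");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(names.get(i)).append('"');
        }
        sb.append("],\"query\":{\"match_all\":{}}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = fieldNames(3000, 100);
        // "index" prints the create-index body; anything else prints the search body.
        boolean index = args.length > 0 && args[0].equals("index");
        System.out.println(index ? createIndexBody(names) : searchBody(names));
    }
}
```

For example, `java ReproBodies.java index > create.json` and `java ReproBodies.java > search.json` produce files that can be sent with curl to a local cluster; index at least one document before searching so there is a hit whose source gets filtered.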
Expected behavior
We shouldn't get an exception in the simple case, where _source is just a list of plain field names.
(I think I'm okay with getting an exception when there are a lot of object subfields being requested or a bunch of wildcard patterns.)
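The distinction above suggests a cheap check before choosing a matching strategy. A minimal sketch, assuming that only `*` wildcards and dotted paths need automaton matching; the class and method names are hypothetical, and a real implementation might treat dotted paths more precisely instead of always falling back.

```java
import java.util.List;

public class IncludeClassifier {
    // Hypothetical rule: the request is "simple" when no entry contains a '*'
    // wildcard or a '.' path separator, so a HashSet lookup is enough.
    static boolean isSimpleFieldList(List<String> includes) {
        for (String name : includes) {
            if (name.indexOf('*') >= 0 || name.indexOf('.') >= 0) {
                return false; // fall back to automaton matching
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSimpleFieldList(List.of("title", "views")));     // true
        System.out.println(isSimpleFieldList(List.of("title", "user.name"))); // false
        System.out.println(isSimpleFieldList(List.of("meta_*")));             // false
    }
}
```

With this split, the exception can only occur for the pattern and subfield cases, which matches the expectation stated above.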