
[BUG] Fetching source uses automata even for simple matching #17114

@msfroh


Describe the bug

Many years ago (2016, IIRC), the code that fetches individual source fields (as selected by the _source parameter of a search request) was changed to always use Lucene's built-in automaton matching logic to decide which source fields to return. That may make sense when the request contains dotted paths to object subfields or wildcard patterns.

I don't think it makes sense when the request is just a list of field names to retrieve. In that case, we should probably put them all in a HashSet and evaluate a contains() predicate to decide which fields to include in the response.
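A minimal sketch of the set-based idea, assuming every requested name refers to a top-level field and that the source has already been parsed into a Map. SimpleSourceFilter is a hypothetical name, not an existing OpenSearch class:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep only the top-level _source entries whose names were requested,
// using exact HashSet lookups instead of compiling the names into an automaton.
public class SimpleSourceFilter {
    private final Set<String> includes;

    public SimpleSourceFilter(List<String> requestedFields) {
        // Linear in the total length of the names; no automaton or determinization involved.
        this.includes = new HashSet<>(requestedFields);
    }

    /** Returns a new map with only the requested top-level entries, in source order. */
    public Map<String, Object> filter(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (includes.contains(entry.getKey())) { // expected constant-time lookup
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "hello");
        source.put("body", "world");
        source.put("tags", List.of("a", "b"));
        SimpleSourceFilter filter = new SimpleSourceFilter(List.of("title", "tags"));
        System.out.println(filter.filter(source)); // {title=hello, tags=[a, b]}
    }
}
```

Each lookup is a hash probe, so the cost stays proportional to the number of source fields, regardless of how many names were requested or how long they are.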

In particular, if there are many fields (especially with long names), we end up building a large union of linear automata, one per field name. Determinizing that union can produce a graph with many states and transitions, so Lucene ends up throwing a TooComplexToDeterminizeException.
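To get a feel for the state count, here is a rough illustration (not Lucene's actual code): determinizing a union of linear automata for an exact set of strings yields, before minimization, a trie, so the number of states grows roughly with the total length of the distinct names. The field names are hypothetical, and Lucene enforces its own limit during determinization (a maximum state count or a work limit, depending on the Lucene version), so these numbers are only indicative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrieStateEstimate {
    /** Counts the states of a trie (a prefix-tree DFA) accepting exactly the given names. */
    static int countTrieStates(List<String> names) {
        // transitions.get(s) maps a character to the next state; state 0 is the start state.
        List<Map<Character, Integer>> transitions = new ArrayList<>();
        transitions.add(new HashMap<>());
        for (String name : names) {
            int state = 0;
            for (char c : name.toCharArray()) {
                Integer next = transitions.get(state).get(c);
                if (next == null) {
                    // First time we see this prefix: allocate a fresh state for it.
                    next = transitions.size();
                    transitions.add(new HashMap<>());
                    transitions.get(state).put(c, next);
                }
                state = next;
            }
        }
        return transitions.size();
    }

    // Hypothetical long field names that diverge after the index and stay distinct.
    static List<String> fieldNames(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("field_" + i + "_with_a_rather_long_descriptive_name");
        }
        return names;
    }

    public static void main(String[] args) {
        for (int count : new int[] {100, 1000, 3000}) {
            System.out.println(count + " fields -> " + countTrieStates(fieldNames(count)) + " states");
        }
    }
}
```

Because each name's long suffix is unique, nearly every character of every name becomes its own state, which is how a few thousand long names can push determinization past the limit.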

Related component

Search:Performance

To Reproduce

  1. Create an index with many fields (a few thousand) that have long names.
  2. Run a search request whose _source parameter lists many of those fields (a few thousand).
  3. Observe a TooComplexToDeterminizeException.
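A hypothetical helper for steps 1 and 2 that generates the two request bodies; the field names and counts are made up for illustration. OpenSearch caps the number of mapped fields per index through index.mapping.total_fields.limit (1000 by default), so the create-index body raises that setting. You presumably also need at least one indexed document so the search returns a hit to fetch:

```java
import java.util.StringJoiner;

public class ReproBodies {
    // Hypothetical long field name; any long, distinct names should behave similarly.
    static String fieldName(int i) {
        return "field_" + i + "_with_a_rather_long_descriptive_name";
    }

    /** Body for PUT /<index>: many keyword fields, with the total-fields limit raised. */
    static String createIndexBody(int fieldCount) {
        StringJoiner props = new StringJoiner(",", "{", "}");
        for (int i = 0; i < fieldCount; i++) {
            props.add("\"" + fieldName(i) + "\":{\"type\":\"keyword\"}");
        }
        return "{\"settings\":{\"index.mapping.total_fields.limit\":" + (fieldCount + 100) + "},"
                + "\"mappings\":{\"properties\":" + props + "}}";
    }

    /** Body for POST /<index>/_search: lists every field explicitly in _source. */
    static String searchBody(int fieldCount) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (int i = 0; i < fieldCount; i++) {
            fields.add("\"" + fieldName(i) + "\"");
        }
        return "{\"_source\":" + fields + ",\"query\":{\"match_all\":{}}}";
    }

    public static void main(String[] args) {
        int n = 3000;
        System.out.println(createIndexBody(n));
        System.out.println(searchBody(n));
    }
}
```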

Expected behavior

We shouldn't get an exception in the simple case, where the request lists only plain field names.

(I think I'm okay with an exception when the request asks for many object subfields or contains many wildcard patterns.)
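To keep the existing automaton behavior for those harder cases, the fast path could be chosen per request. A minimal sketch, assuming only the include list matters and treating any * or . as a reason to keep the automaton path (SourceFilterStrategy is a hypothetical name):

```java
import java.util.List;

// Hypothetical helper: picks exact HashSet matching for plain field names and falls back
// to automaton-based matching for wildcards or dotted (possibly object-subfield) paths.
public class SourceFilterStrategy {
    enum Strategy { HASH_SET, AUTOMATON }

    static Strategy choose(List<String> includes) {
        for (String pattern : includes) {
            // Wildcards need pattern matching; dotted paths may refer to object subfields.
            if (pattern.indexOf('*') >= 0 || pattern.indexOf('.') >= 0) {
                return Strategy.AUTOMATON;
            }
        }
        return Strategy.HASH_SET;
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("title", "body")));     // HASH_SET
        System.out.println(choose(List.of("title", "user.name"))); // AUTOMATON
        System.out.println(choose(List.of("meta_*")));            // AUTOMATON
    }
}
```

Being conservative about dots means a literal field name containing a dot also takes the automaton path, which only costs performance, not correctness.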


Metadata


Assignees

Labels

bug (Something isn't working), good first issue (Good for newcomers)

Type

No type

Projects

Status

✅ Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
