Add limits for ngram and shingle settings #25887

Closed
@colings86

Description

Currently, the options for the ngram and shingle tokenizers/token filters allow the user to set min_size and max_size to any value. This is dangerous because users can choose values that produce huge numbers of terms, which at best bloats their index and at worst causes problems such as #25841.
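To give a rough feel for the growth, here is a minimal sketch that counts the character n-grams emitted for a single token. The function name `ngram_count` is hypothetical and is not part of Elasticsearch. It assumes plain sliding-window n-grams and reuses the min_size/max_size shorthand from this issue (the actual ngram settings are named `min_gram` and `max_gram`).

```python
def ngram_count(token_length: int, min_size: int, max_size: int) -> int:
    """Count the character n-grams produced for one token.

    Assumes a plain sliding window: a token of length L yields
    L - n + 1 grams of size n (and none when n > L).
    """
    return sum(
        max(0, token_length - n + 1)
        for n in range(min_size, max_size + 1)
    )


# A 20-character token with a narrow range stays small...
narrow = ngram_count(20, 1, 2)    # 20 + 19 = 39 terms
# ...while an unbounded range grows quadratically with token length.
wide = ngram_count(20, 1, 20)     # 20 + 19 + ... + 1 = 210 terms
```

Those counts are per token, so they multiply across every token in every analyzed field.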

I think we should add soft (and/or maybe hard) limits so that neither min_size nor max_size can be more than, say, 6, and the difference between min_size and max_size can't be more than 2 or 3 (we may even want to make this limit 1).
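As an illustration only, the proposed check could look something like the sketch below. The name `validate_sizes` and the constants are hypothetical; the values 6 and 2 are just the example numbers suggested above, not decided limits, and this is not how Elasticsearch implements validation.

```python
# Hypothetical limits taken from the numbers floated in this issue.
MAX_SIZE_LIMIT = 6
MAX_DIFF_LIMIT = 2


def validate_sizes(min_size, max_size,
                   size_limit=MAX_SIZE_LIMIT, diff_limit=MAX_DIFF_LIMIT):
    """Return a list of human-readable violations (empty if the settings pass)."""
    errors = []
    if min_size < 1:
        errors.append(f"min_size [{min_size}] must be at least 1")
    if min_size > max_size:
        errors.append(
            f"min_size [{min_size}] must not exceed max_size [{max_size}]")
    if min_size > size_limit:
        errors.append(f"min_size [{min_size}] exceeds limit [{size_limit}]")
    if max_size > size_limit:
        errors.append(f"max_size [{max_size}] exceeds limit [{size_limit}]")
    if max_size - min_size > diff_limit:
        errors.append(
            f"difference [{max_size - min_size}] exceeds limit [{diff_limit}]")
    return errors


print(validate_sizes(2, 3))    # passes: no violations
print(validate_sizes(1, 10))   # max_size and the difference are both too large
```

Returning a list instead of raising makes it easy to treat the result as a soft limit (log a warning) or a hard limit (reject the settings), depending on which we choose.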

Note that this does not apply to edge_ngrams, where it is useful to have higher values and a larger difference between the min and max values. We should probably decide whether different limits should apply there, though.
