
Allow all token and character filters to be used in normalizers #43758

Open
@romseygeek

Description


Elasticsearch makes a distinction between Analyzers, used for breaking up text into individual tokens and applying some normalization to them, and Normalizers, which do no segmentation of text and apply only the normalization stages, for use in keyword fields. We have an additional restriction: only character filters and token filters whose factories are defined as NormalizingXFactory are permitted in the definition of normalizers, and in the past there were checks that these normalizing factories tracked Lucene's MultiTermAwareComponent.
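
For reference, a custom normalizer is currently defined along these lines (a minimal sketch; the index, normalizer, and field names are illustrative). Filters such as lowercase and asciifolding are accepted because their factories are marked as normalizing, while something like a stemming filter is rejected:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "code": { "type": "keyword", "normalizer": "my_normalizer" }
    }
  }
}
```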

However, there is some confusion here between the normalization of whole tokens, as done for a keyword field, and the character-by-character normalization done by MultiTermAwareComponent (now replaced in Lucene by normalize() methods on TokenFilterFactory and CharFilterFactory). The latter is used by custom analyzers when Analyzer#normalize() is called, and is specifically designed for use with partial terms such as prefixes or wildcards, where filters such as synonyms or stemmers make no sense; it was never intended as an additional restriction on what can be done to a full term in a keyword field.
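
To illustrate the Lucene side, here is a minimal sketch of that normalize() path, assuming Lucene 8+ where the normalize() hooks have replaced MultiTermAwareComponent; the class and field names are arbitrary:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.util.BytesRef;

public class NormalizeSketch {
    public static void main(String[] args) throws Exception {
        // An analyzer that tokenizes on whitespace and lowercases tokens.
        Analyzer analyzer = CustomAnalyzer.builder()
                .withTokenizer("whitespace")
                .addTokenFilter("lowercase")
                .build();

        // Analyzer#normalize applies only the filters that implement the
        // normalize() hook (formerly MultiTermAwareComponent) and never
        // runs the tokenizer. It is meant for partial terms, e.g.
        // lowercasing the "Wild" in a "Wild*" wildcard query.
        BytesRef normalized = analyzer.normalize("field", "Wild");
        System.out.println(normalized.utf8ToString()); // -> "wild"
    }
}
```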

To clear up this confusion, we should remove the filter restrictions on normalizers, and instead define them simply as a normal analyzer with either a Keyword or Whitespace tokenizer.
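
Under this proposal, the normalizer sketched above would be equivalent to a custom analyzer along these lines (again illustrative names), with the keyword tokenizer guaranteeing that the input is emitted as a single, unsegmented token:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_normalizer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
```

Any registered character or token filter would then be legal in this position, since the single-token guarantee comes from the tokenizer choice rather than from restricting the filter set.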
