As part of the larger effort to update and improve the Elasticsearch docs, the Analysis section needs a revamp. Known issues: the topics vary in depth and completeness; some examples are dated or inconsistent with one another; chunking may need to be removed or added; and the organization is arbitrary and does not always show how one topic relates to another.
To address this, the docs covered by this issue will adopt a revised, standardized structure. For example, in token filters I'll add examples, configuration parameters, and customization options, and replace circular definitions such as "NGram Token Filter: A token filter of type ngram" with a complete definition and an explanation of when a user would employ that filter.
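To show the kind of complete definition intended here, a minimal sketch of what the `ngram` filter conceptually does (hypothetical Python, not the actual Lucene/Elasticsearch implementation): it splits each incoming token into every substring whose length falls between a minimum and maximum gram size.

```python
# Hypothetical sketch, not the real implementation: an ngram token filter
# emits every substring of each token whose length is between min_gram and
# max_gram (Elasticsearch defaults to min_gram=1, max_gram=2).
def ngram_filter(tokens, min_gram=1, max_gram=2):
    grams = []
    for token in tokens:
        for n in range(min_gram, max_gram + 1):   # each gram length
            for i in range(len(token) - n + 1):   # each start offset
                grams.append(token[i:i + n])
    return grams

print(ngram_filter(["fox"]))  # ['f', 'o', 'x', 'fo', 'ox']
```

A definition along these lines, plus a note that n-grams are typically used for partial or fuzzy matching, tells readers what the filter produces and when to reach for it, rather than restating its type name.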
Proposed structure:
- Title (Level 2): Definition and explanation of topic
- Example (Level 3): Vanilla example and output
- Configure parameters (Level 3): Parameters available with descriptions
- Customize (Level 3): How to customize
- Example (Level 4): Customize example and output
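Applied to a single topic page, the structure above might look like this in AsciiDoc skeleton form (a sketch only; the anchor and heading text are placeholders, with Level 2 mapping to `==`, Level 3 to `===`, and Level 4 to `====`):

```asciidoc
[[analysis-example-tokenfilter]]
== Example token filter

// Level 2: definition and explanation of the topic.

=== Example

// Level 3: vanilla example and output.

=== Configurable parameters

// Level 3: parameters available, with descriptions.

=== Customize

// Level 3: how to customize.

==== Example

// Level 4: customized example and output.
```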
PRs will be revised as I work through the topics. Some PRs may include more than one topic when changes are small and comparable across topics.
Top Level Docs
- analysis (nav +intro) [DOCS] Rewrite analysis intro #51184
- overview [DOCS] Add overview page to analysis topic #50515
- concepts [DOCS] Add concepts section to analysis topic #50801
- Index time vs search time analysis [DOCS] Rewrite analysis intro #51184
- Stemming [DOCS] Add stemming concept docs #55156
- Token graphs [DOCS] Add token graph concept docs #53339
- Configure text analysis [DOCS] Add tutorials section to analysis topic #50809
- Specify an analyzer [DOCS] Rewrite analysis intro #51184
Sections
Analyzers #58362
- fingerprint-analyzer
- keyword-analyzer
- lang-analyzer
- pattern-analyzer
- simple-analyzer [DOCS] Fix headings for simple analyzer docs #58910
- standard-analyzer
- stop-analyzer
- whitespace-analyzer
Character Filters
- htmlstrip-charfilter [DOCS] Reformat `html_strip` char filter #57764
- mapping-charfilter [DOCS] Reformat `mapping` charfilter #57818
- pattern-replace-charfilter
Token Filters
- apostrophe-tokenfilter [DOCS] Reformat apostrophe token filter docs #48076
- asciifolding-tokenfilter [DOCS] Reformat ASCII folding token filter docs #48143
- cjk-bigram-tokenfilter [DOCS] Reformat CJK bigram and CJK width token filter docs #48210
- cjk-width-tokenfilter [DOCS] Reformat CJK bigram and CJK width token filter docs #48210
- classic-tokenfilter [DOCS] Reformat classic token filter docs #48314
- common-grams-tokenfilter [DOCS] Reformat common grams token filter #48426
- compound-word-tokenfilter [DOCS] Reformat compound word token filters #49006
- condition-tokenfilter [DOCS] Reformat condition token filter #48775
- decimal-digit-tokenfilter [DOCS] Reformat decimal digit token filter #48722
- delimited-payload-tokenfilter [DOCS] Reformat delimited payload token filter docs #49380
- edgengram-tokenfilter [DOCS] Reformat n-gram token filter docs #49438
- elision-tokenfilter [DOCS] Reformat elision token filter docs #49262
- fingerprint-tokenfilter [DOCS] Reformat fingerprint token filter docs #49311
- flatten-graph-tokenfilter [DOCS] Reformat `flatten_graph` token filter #54268
- hunspell-tokenfilter [DOCS] Reformat `hunspell` token filter #56955
- keep-types-tokenfilter [DOCS] Reformat keep types and keep words token filter docs #49604
- keep-words-tokenfilter [DOCS] Reformat keep types and keep words token filter docs #49604
- keyword-marker-tokenfilter [DOCS] Reformat `keyword_marker` token filter #54076
- keyword-repeat-tokenfilter [DOCS] Reformat `keyword_repeat` token filter #54428
- kstem-tokenfilter [DOCS] Reformat `kstem` token filter #55823
- length-tokenfilter [DOCS] Reformat length token filter docs #49805
- limit-token-count-tokenfilter [DOCS] Reformat token count limit filter docs #49835
- lowercase-tokenfilter [DOCS] Reformat lowercase token filter docs #49935
- minhash-tokenfilter [DOCS] Reformat `min_hash` token filter docs #57181
- multiplexer-tokenfilter [DOCS] Reformat `multiplexer` token filter #57555
- ngram-tokenfilter [DOCS] Reformat n-gram token filter docs #49438
- normalization-tokenfilter
- pattern-capture-tokenfilter [DOCS] Reformat `pattern_capture` token filter #57664
- pattern_replace-tokenfilter [DOCS] Reformat `pattern_replace` token filter #57699
- phonetic-tokenfilter
- porterstem-tokenfilter [DOCS] Reformat `porter_stem` token filter #56053
- predicate-tokenfilter [DOCS] Reformat `predicate_token_filter` tokenfilter #57705
- remove-duplicates-tokenfilter [DOCS] Reformat `remove_duplicates` token filter #53608
- reverse-tokenfilter [DOCS] Reformat reverse token filter docs #50672
- shingle-tokenfilter [DOCS] Reformat `shingle` token filter #57040
- snowball-tokenfilter [DOCS] Reformat `snowball` token filter #56394
- stemmer-override-tokenfilter [DOCS] Reformat `stemmer_override` token filter #56840
- stemmer-tokenfilter [DOCS] Reformat `stemmer` token filter #55693
- stop-tokenfilter [DOCS] Reformat `stop` token filter #53059
- synonym-graph-tokenfilter [DOCS] Reformat `synonym_graph` token filter #53901
- synonym-tokenfilter
- trim-tokenfilter [DOCS] Reformat trim token filter docs #51649
- truncate-tokenfilter [DOCS] Reformat truncate token filter docs #50687
- unique-tokenfilter [DOCS] Reformat unique token filter docs #50748
- uppercase-tokenfilter [DOCS] Reformat uppercase token filter docs #50555
- word-delimiter-graph-tokenfilter [DOCS] Reformat `word_delimiter_graph` token filter #53170
- word-delimiter-tokenfilter [DOCS] Reformat `word_delimiter` token filter #53387
Tokenizers #58361
- chargroup-tokenizer
- classic-tokenizer
- edgengram-tokenizer
- keyword-tokenizer
- letter-tokenizer
- lowercase-tokenizer
- ngram-tokenizer
- pathhierarchy-tokenizer-examples
- pathhierarchy-tokenizer
- pattern-tokenizer
- simplepattern-tokenizer
- simplepatternsplit-tokenizer
- standard-tokenizer
- thai-tokenizer
- uaxurlemail-tokenizer
- whitespace-tokenizer