---
layout: default
title: Fingerprint
parent: Token filters
nav_order: 140
---

# Fingerprint token filter

The `fingerprint` token filter is used to standardize and deduplicate text. This is particularly useful when consistency in text processing is crucial. The `fingerprint` token filter achieves this by processing text using the following steps, which are demonstrated in the example following the list:

1. **Lowercasing**: Converts all text to lowercase.
2. **Splitting**: Breaks the text into tokens.
3. **Sorting**: Arranges the tokens in alphabetical order.
4. **Removing duplicates**: Eliminates repeated tokens.
5. **Joining tokens**: Combines the tokens into a single string, typically joined by a space or another specified separator.

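To see these steps in action, you can apply the built-in `fingerprint` token filter directly in an `_analyze` request, as in the following minimal example (the sample sentence is arbitrary). The duplicate `the` is removed, and the remaining terms are sorted alphabetically and joined with the default space separator, so the response should contain a single token similar to `brown dog fox jumps lazy over quick the`:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [ "fingerprint" ],
  "text": "the quick brown fox jumps over the lazy dog"
}
```
{% include copy-curl.html %}
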
## Parameters

The `fingerprint` token filter can be configured with the following two parameters.

Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`max_output_size` | Optional | Integer | Limits the length of the generated fingerprint string. If the concatenated string exceeds `max_output_size`, the filter does not produce any output token, resulting in an empty `tokens` array (see the example following this table). Default is `255`.
`separator` | Optional | String | Defines the character(s) used to join the tokens into a single string after they have been sorted and deduplicated. Default is space (`" "`).

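To illustrate the `max_output_size` limit, the following example defines the filter inline in an `_analyze` request with a deliberately small limit (the value `10` and the sample text are arbitrary). Because the joined fingerprint is longer than 10 characters, the filter is expected to emit no token, so the response should contain an empty `tokens` array:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "fingerprint",
      "max_output_size": 10
    }
  ],
  "text": "OpenSearch is a search engine"
}
```
{% include copy-curl.html %}
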
## Example

The following example request creates a new index named `my_index` and configures an analyzer with a `fingerprint` token filter:

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_fingerprint": {
          "type": "fingerprint",
          "max_output_size": 200,
          "separator": "-"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_fingerprint"
          ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

## Generated tokens

Use the following request to examine the tokens generated using the analyzer:

```json
POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "OpenSearch is a powerful search engine that scales easily"
}
```
{% include copy-curl.html %}

The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "a-easily-engine-is-opensearch-powerful-scales-search-that",
      "start_offset": 0,
      "end_offset": 57,
      "type": "fingerprint",
      "position": 0
    }
  ]
}
```
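
The single output token contains the input terms lowercased by the `lowercase` filter, sorted alphabetically, and joined using the configured `-` separator.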