Add elision token filter docs #7981 #8026
---
layout: default
title: Elision
parent: Token filters
nav_order: 130
---

# Elision token filter

The `elision` token filter is used to remove elided characters from words in certain languages. Elision typically occurs in languages such as French, in which a word is contracted and joined to the following word by omitting a vowel and replacing it with an apostrophe.

The `elision` token filter is already preconfigured in the following [language analyzers]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/): `catalan`, `french`, `irish`, and `italian`.
{: .note}
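
Because these analyzers already include elision, you can observe the effect without any custom configuration. The following request is a minimal sketch that runs the built-in `french` analyzer on a sample word; note that this analyzer also applies lowercasing, stopword removal, and stemming, so its output reflects more than elision alone:

```json
GET /_analyze
{
  "analyzer": "french",
  "text": "L'étudiant"
}
```
{% include copy-curl.html %}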

## Parameters

The custom `elision` token filter can be configured with the following parameters.

Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`articles` | Required if `articles_path` is not configured | Array of strings | Defines which articles or short words should be removed when they appear as part of an elision.
`articles_path` | Required if `articles` is not configured | String | Specifies the path to a custom list of articles that should be removed during the analysis process.
`articles_case` | Optional | Boolean | Specifies whether the filter is case sensitive when matching elisions. Default is `false`.
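
For example, the following sketch configures an `elision` filter that reads its articles from a file instead of an inline array. The index name, filter name, and the file `analysis/french_articles.txt` (one article per line, without apostrophes, stored in the OpenSearch config directory on every node) are hypothetical:

```json
PUT /french_file_example
{
  "settings": {
    "analysis": {
      "filter": {
        "french_elision_from_file": {
          "type": "elision",
          "articles_path": "analysis/french_articles.txt",
          "articles_case": true
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

Because `articles_case` is set to `true` here, the listed articles match only in the exact case in which they appear in the file.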

## Example

The default set of French elisions is `l'`, `m'`, `t'`, `qu'`, `n'`, `s'`, `j'`, `d'`, `c'`, `jusqu'`, `quoiqu'`, `lorsqu'`, and `puisqu'`. You can customize this list by configuring the `articles` parameter of a custom `elision` filter. The following example request creates a new index named `french_texts` and configures an analyzer with a `french_elision` filter:

```json
PUT /french_texts
{
  "settings": {
    "analysis": {
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles": [ "l", "t", "m", "d", "n", "s", "j" ]
        }
      },
      "analyzer": {
        "french_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "french_elision"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "french_analyzer"
      }
    }
  }
}
```
{% include copy-curl.html %}
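
To confirm that elision applies at both index and search time, you might index a document containing elided forms and then search for a bare word. This follow-up is a sketch that reuses the `french_texts` index created above (the document ID is arbitrary):

```json
PUT /french_texts/_doc/1
{
  "text": "L'étudiant aime l'école."
}
```
{% include copy-curl.html %}

```json
GET /french_texts/_search
{
  "query": {
    "match": {
      "text": "étudiant"
    }
  }
}
```
{% include copy-curl.html %}

The match query analyzes `étudiant` with the same `french_analyzer`, so it matches the token produced from `L'étudiant`.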

## Generated tokens

Use the following request to examine the tokens generated using the analyzer:

```json
POST /french_texts/_analyze
{
  "analyzer": "french_analyzer",
  "text": "L'étudiant aime l'école et le travail."
}
```
{% include copy-curl.html %}

The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "étudiant",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "aime",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "école",
      "start_offset": 16,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "et",
      "start_offset": 24,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "le",
      "start_offset": 27,
      "end_offset": 29,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "travail",
      "start_offset": 30,
      "end_offset": 37,
      "type": "<ALPHANUM>",
      "position": 5
    }
  ]
}
```
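
Note that the standalone article `le` is preserved in the output: the `elision` filter removes only articles that are attached to the following word by an apostrophe, such as the `l'` in `L'étudiant` and `l'école`.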