document the new `analysis-phonenumber` plugin #8469

Merged: kolchfa-aws merged 8 commits into `opensearch-project:main` from `rursprung:document-analysis-phone-plugin` on Oct 22, 2024.

Commits:

* `d81da26` document the new `analysis-phonenumber` plugin (rursprung)
* `0c422c4` Minor rewrites (kolchfa-aws)
* `a197b13` Apply suggestions from code review (kolchfa-aws)
* `584cde4` Update `_analyzers/supported-analyzers/phone-analyzers.md` (kolchfa-aws)
* `cc2ed9d` Update `_analyzers/supported-analyzers/phone-analyzers.md` (kolchfa-aws)
* `e3a768f` Merge branch 'main' into document-analysis-phone-plugin (kolchfa-aws)
* `8ec5bad` Apply suggestions from code review (kolchfa-aws)
* `c2b5c68` Merge branch 'main' into document-analysis-phone-plugin (kolchfa-aws)

`_analyzers/supported-analyzers/phone-analyzers.md` (new file, 128 lines)

---
layout: default
title: Phone number
parent: Analyzers
nav_order: 140
---

# Phone number analyzers

The `analysis-phonenumber` plugin provides analyzers and tokenizers for parsing phone numbers. A dedicated analyzer is required because parsing phone numbers is a non-trivial task, even though it might seem trivial at first glance. For common misconceptions regarding phone number parsing, see [Falsehoods programmers believe about phone numbers](https://github.com/google/libphonenumber/blob/master/FALSEHOODS.md).

OpenSearch supports the following phone number analyzers:

* [`phone`](#the-phone-analyzer): An [index analyzer]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) to use at indexing time.
* [`phone-search`](#the-phone-search-analyzer): A [search analyzer]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/) to use at search time.

Internally, the plugin uses the [`libphonenumber`](https://github.com/google/libphonenumber) library and follows its parsing rules.

The phone number analyzers are not meant to find phone numbers in larger bodies of text. Use them only on fields that contain nothing but phone numbers.
{: .note}

## Installing the plugin

Before you can use the phone number analyzers, you must install the `analysis-phonenumber` plugin by running the following command:

```sh
./bin/opensearch-plugin install analysis-phonenumber
```

Run this command on every node in the cluster, and then restart the nodes so that the plugin is loaded.

## Specifying a default region

You can optionally specify a default region for parsing phone numbers by providing the `phone-region` parameter within the analyzer. Valid phone regions are represented by ISO 3166-1 alpha-2 country codes, which are two-letter codes such as `CH` for Switzerland. For more information, see the [List of ISO 3166 country codes](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

When tokenizing phone numbers that start with the international calling prefix `+`, the default region is irrelevant. However, you must provide the region for phone numbers that use a region-specific international call prefix instead of `+`, for example, `001` instead of `+1` to dial North America from most European countries. Specifying the region also lets you correctly index local phone numbers that have no international prefix.
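
The following Python sketch shows why the default region matters. It normalizes several input formats to the same digit string. It is a toy and not `libphonenumber`. The `REGIONS` table and the `normalize` helper are hypothetical. The table contains only Switzerland (`CH`), which is assumed to use `00` as its international call prefix and `0` as its national trunk prefix.

```python
from typing import Optional

# Hypothetical region table for illustration only; this is NOT libphonenumber metadata.
# Each entry maps a region code to
# (country calling code, international call prefix, national trunk prefix).
REGIONS = {
    "CH": ("41", "00", "0"),
}


def normalize(raw: str, region: Optional[str] = None) -> Optional[str]:
    """Return the country calling code followed by the national number, as digits.

    Returns None when the number cannot be resolved without a default region.
    """
    text = raw.strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    if text.startswith("+"):
        # Explicit "+" prefix: the digits already start with the country code.
        return digits
    if region not in REGIONS:
        # No (known) default region: "00..." or "0..." is ambiguous.
        return None
    country_code, intl_prefix, trunk_prefix = REGIONS[region]
    if digits.startswith(intl_prefix):
        # Region-specific international prefix, for example "0041" or "001" dialed from Switzerland.
        return digits[len(intl_prefix):]
    if digits.startswith(trunk_prefix):
        # Local number: drop the trunk prefix and prepend the region's country code.
        return country_code + digits[len(trunk_prefix):]
    # Local number written without a trunk prefix.
    return country_code + digits
```

With this sketch, the following inputs all normalize to `41605551234`:

* `+41 60 555 12 34` (no region needed)
* `0041 60 555 12 34` with region `CH`
* `060 555 12 34` with region `CH`

Without a region, `060 555 12 34` returns `None`, and `001 212 555 0100` with region `CH` resolves to the North American number `12125550100`. Real dialing plans have far more exceptions, which is why the plugin delegates parsing to `libphonenumber`.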

## Example

The following request creates an index with a `phone_number` field configured for Swiss phone numbers (region code `CH`). The custom `phone-ch` analyzer is used at indexing time, and the custom `phone-search-ch` analyzer is used at search time:

```json
PUT /example-phone
{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone-ch": {
          "type": "phone",
          "phone-region": "CH"
        },
        "phone-search-ch": {
          "type": "phone-search",
          "phone-region": "CH"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "phone_number": {
        "type": "text",
        "analyzer": "phone-ch",
        "search_analyzer": "phone-search-ch"
      }
    }
  }
}
```
{% include copy-curl.html %}
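
If you create such indexes from code, you can generate the request body for each region. The following Python sketch builds the same body as the preceding request for any region code. The `phone_index_body` and `create_index` helpers are hypothetical. The `phone-<region>` and `phone-search-<region>` analyzer names follow the example's naming, and `create_index` assumes an unsecured cluster reachable over plain HTTP.

```python
import json
import urllib.request


def phone_index_body(region: str) -> dict:
    """Build settings and mappings like the PUT /example-phone request for a region code."""
    suffix = region.lower()
    index_analyzer = f"phone-{suffix}"
    search_analyzer = f"phone-search-{suffix}"
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    index_analyzer: {"type": "phone", "phone-region": region},
                    search_analyzer: {"type": "phone-search", "phone-region": region},
                }
            }
        },
        "mappings": {
            "properties": {
                "phone_number": {
                    "type": "text",
                    "analyzer": index_analyzer,
                    "search_analyzer": search_analyzer,
                }
            }
        },
    }


def create_index(base_url: str, index: str, region: str) -> dict:
    """Send PUT /<index> with the generated body (assumes an unsecured cluster)."""
    request = urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(phone_index_body(region)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `create_index("http://localhost:9200", "example-phone", "CH")` would send a request equivalent to the `PUT /example-phone` request shown above.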

## The phone analyzer

The `phone` analyzer generates n-grams based on the given phone number. A (fictional) Swiss phone number that contains an international calling prefix can be parsed with or without the Swiss-specific phone region. Thus, the following two requests produce the same result. The first uses the custom `phone-ch` analyzer, and the second uses the built-in `phone` analyzer with no region:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}
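
The `_analyze` API returns each token as an object with fields such as `token`, `start_offset`, `end_offset`, `type`, and `position`. The responses in this section list only the token strings. The following Python sketch sends an `_analyze` request and flattens the response into such a list, using the hypothetical helpers `analyze` and `token_strings`. It uses `POST`, which the `_analyze` endpoint also accepts, and assumes an unsecured cluster.

```python
import json
import urllib.request


def token_strings(analyze_response: dict) -> list:
    """Return the "token" values from an _analyze response body, in order."""
    return [entry["token"] for entry in analyze_response.get("tokens", [])]


def analyze(base_url: str, index: str, analyzer: str, text: str) -> list:
    """POST /<index>/_analyze and return only the token strings.

    Assumes an unsecured cluster reachable at base_url, for example http://localhost:9200.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return token_strings(json.load(response))
```

This assumes the plugin is installed and the `example-phone` index exists. To check that the two requests above return the same tokens, compare `set(analyze(url, "example-phone", "phone-ch", "+41 60 555 12 34"))` with the same call made using the `phone` analyzer.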

The response contains the generated n-grams:

```json
["+41 60 555 12 34", "6055512", "41605551", "416055512", "6055", "41605551234", ...]
```
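
Apart from the original input, every n-gram in the sample above is a prefix of one of two numbers:

* the national number `605551234` (for example, `6055` and `6055512`)
* the full number `41605551234` (for example, `41605551` and `416055512`)

The following Python sketch generates such prefix (edge) n-grams. It is only an illustration consistent with the sample and is not the plugin's actual tokenizer. The helper names are hypothetical, and the minimum n-gram length of 3 is an assumption.

```python
def prefix_ngrams(digits: str, min_len: int = 3) -> list:
    """Return all prefixes of `digits` with at least `min_len` characters, shortest first."""
    return [digits[:end] for end in range(min_len, len(digits) + 1)]


def phone_ngrams(country_code: str, national_number: str) -> set:
    """Combine prefix n-grams of the national number and of the full international number."""
    full_number = country_code + national_number
    return set(prefix_ngrams(national_number)) | set(prefix_ngrams(full_number))
```

For example, `phone_ngrams("41", "605551234")` contains `6055`, `6055512`, `41605551`, `416055512`, and `41605551234`.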

However, if you specify the phone number without the international calling prefix `+` (either by using `0041` or by omitting the international calling prefix altogether), only the analyzer configured with the correct phone region can parse the number:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-ch",
  "text" : "060 555 12 34"
}
```
{% include copy-curl.html %}

## The phone-search analyzer

In contrast, the `phone-search` analyzer does not create n-grams and produces only a few basic tokens. For example, send the following request and specify the `phone-search` analyzer:

```json
GET /example-phone/_analyze
{
  "analyzer" : "phone-search",
  "text" : "+41 60 555 12 34"
}
```
{% include copy-curl.html %}

The response contains the following tokens:

```json
["+41 60 555 12 34", "41 60 555 12 34", "41605551234", "605551234", "41"]
```
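
The five tokens appear to correspond to the following:

* the original input
* the input without `+`
* the digits only
* the national number
* the country calling code

The following Python sketch reproduces the sample response for this one input using a hypothetical `basic_tokens` helper. It takes the country code as an argument instead of parsing it with `libphonenumber`, so it is not the plugin's implementation.

```python
def basic_tokens(text: str, country_code: str) -> list:
    """Illustrative only: derive the kinds of tokens seen in the phone-search sample response."""
    tokens = [text]  # the original input
    tokens.append(text.lstrip("+"))  # the input without the leading "+"
    digits = "".join(ch for ch in text if ch.isdigit())
    tokens.append(digits)  # digits only: country code followed by national number
    if digits.startswith(country_code):
        tokens.append(digits[len(country_code):])  # national number
        tokens.append(country_code)  # country calling code
    return tokens
```

Calling `basic_tokens("+41 60 555 12 34", "41")` returns the same five tokens, in the same order, as the response above.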

Review comments:

> i'm not a native speaker, but i'd have expected "[..] the list [..]" here? (it probably was me who wrote it like this in the first place? 🫣)

> @rursprung Either way is correct 😄