Commit 65281f2

[DOC] Add new documentation for IP2Geo (#4998)
* Approved through tech, doc, and editorial

  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Publish documentation

  Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
1 parent 0d88de9 commit 65281f2

File tree

1 file changed

+243
-0
lines changed
  • _api-reference/ingest-apis/processors

@@ -0,0 +1,243 @@
1+
---
layout: default
title: IP2Geo
parent: Ingest processors
grand_parent: Ingest APIs
nav_order: 130
---
8+
9+
# IP2Geo
Introduced 2.10
{: .label .label-purple }
12+
13+
The `ip2geo` processor adds information about the geographical location of an IPv4 or IPv6 address. The processor uses IP geolocation (GeoIP) data from an external endpoint and therefore requires an additional component, `datasource`, that defines where to download the GeoIP data from and how frequently to update it.
14+
15+
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/info-icon.png" class="inline-icon" alt="info icon"/>{:/} **NOTE**<br>The `ip2geo` processor maintains the GeoIP data mapping in system indexes. During data ingestion, the processor retrieves the mapping from these indexes to convert IP addresses in the incoming data to geolocations. For optimal performance, use nodes that have both the ingest and data roles, because this configuration avoids internode calls and reduces latency. Because the `ip2geo` processor searches the indexes for GeoIP mapping data, search performance is also affected.
{: .note}
17+
18+
## Getting started
19+
20+
To use the `ip2geo` processor, you must first install the `opensearch-geospatial` plugin. See [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) to learn more.
21+
22+
## Cluster settings
23+
24+
The IP2Geo data source and `ip2geo` processor node settings are listed in the following table.
25+
26+
| Key | Description | Default |
|-----|-------------|---------|
| `plugins.geospatial.ip2geo.datasource.endpoint` | Default endpoint used by the create data source API. | `https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json` |
| `plugins.geospatial.ip2geo.datasource.update_interval_in_days` | Default update interval, in days, used by the create data source API. | 3 |
| `plugins.geospatial.ip2geo.datasource.batch_size` | Maximum number of documents to ingest in a bulk request during the IP2Geo data source creation process. | 10,000 |
| `plugins.geospatial.ip2geo.processor.cache_size` | Maximum number of results that can be cached. Each node uses a single cache shared by all IP2Geo processors. | 1,000 |
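
As an illustration, one of these defaults could be changed through the standard cluster settings API. The following sketch raises the default update interval to 7 days. The value `7` is arbitrary, and the request assumes that this setting can be updated dynamically in your version of the plugin:

```json
PUT /_cluster/settings
{
  "persistent": {
    "plugins.geospatial.ip2geo.datasource.update_interval_in_days": 7
  }
}
```

Because the setting only supplies a default, a data source that specifies its own `update_interval_in_days` should keep the value it was given.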
33+
34+
## Creating the IP2Geo data source
35+
36+
Before creating the pipeline that uses the `ip2geo` processor, create the IP2Geo data source. The data source defines the endpoint from which GeoIP data is downloaded and specifies the update interval.
37+
38+
OpenSearch provides the following endpoints for the GeoLite2 City, GeoLite2 Country, and GeoLite2 ASN databases from [MaxMind](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data), which are shared under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license:
39+
40+
* GeoLite2 City: https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json
* GeoLite2 Country: https://geoip.maps.opensearch.org/v1/geolite2-country/manifest.json
* GeoLite2 ASN: https://geoip.maps.opensearch.org/v1/geolite2-asn/manifest.json
43+
44+
If an OpenSearch cluster cannot update a data source from the endpoints within 30 days, the cluster does not add GeoIP data to the documents and instead adds `"error":"ip2geo_data_expired"`.
45+
46+
### Data source options
47+
48+
The following table lists the data source options for the `ip2geo` processor.
49+
50+
| Name | Required | Default | Description |
|------|----------|---------|-------------|
| `endpoint` | Optional | https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json | The endpoint from which the GeoIP data is downloaded. |
| `update_interval_in_days` | Optional | 3 | How frequently, in days, the GeoIP data is updated. The minimum value is 1. |
54+
55+
To create an IP2Geo data source, run the following query:
56+
57+
```json
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource
{
  "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
  "update_interval_in_days": 3
}
```
{% include copy-curl.html %}
65+
66+
A `true` response means that the request was successful and that the server was able to process the request. A `false` response indicates that you should check the request to make sure it is valid, check the URL to make sure it is correct, or try again.
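
The same request shape should work with any of the endpoints listed earlier. As a sketch, the following request creates a second data source from the GeoLite2 Country endpoint and refreshes it weekly. The name `my-country-datasource` and the interval of `7` are hypothetical values chosen for illustration:

```json
PUT /_plugins/geospatial/ip2geo/datasource/my-country-datasource
{
  "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-country/manifest.json",
  "update_interval_in_days": 7
}
```

Both fields are optional, so omitting them falls back to the defaults shown in the preceding table.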
67+
68+
### Sending a GET request
69+
70+
To get information about one or more IP2Geo data sources, send a GET request:
71+
72+
```json
GET /_plugins/geospatial/ip2geo/datasource/my-datasource
```
{% include copy-curl.html %}
76+
77+
You'll receive the following response:
78+
79+
```json
{
  "datasources": [
    {
      "name": "my-datasource",
      "state": "AVAILABLE",
      "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
      "update_interval_in_days": 3,
      "next_update_at_in_epoch_millis": 1685125612373,
      "database": {
        "provider": "maxmind",
        "sha256_hash": "0SmTZgtTRjWa5lXR+XFCqrZcT495jL5XUcJlpMj0uEA=",
        "updated_at_in_epoch_millis": 1684429230000,
        "valid_for_in_days": 30,
        "fields": [
          "country_iso_code",
          "country_name",
          "continent_name",
          "region_iso_code",
          "region_name",
          "city_name",
          "time_zone",
          "location"
        ]
      },
      "update_stats": {
        "last_succeeded_at_in_epoch_millis": 1684866730192,
        "last_processing_time_in_millis": 317640,
        "last_failed_at_in_epoch_millis": 1684866730492,
        "last_skipped_at_in_epoch_millis": 1684866730292
      }
    }
  ]
}
```
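
Because the response wraps its results in a `datasources` array, a single request can return more than one data source. The following sketch assumes that the endpoint accepts a comma-separated list of names, which you should verify against your plugin version. The name `my-datasource-2` is a hypothetical second data source:

```json
GET /_plugins/geospatial/ip2geo/datasource/my-datasource,my-datasource-2
```

If the request is accepted, each named data source would appear as its own object in the `datasources` array.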
114+
115+
### Updating an IP2Geo data source
116+
117+
For a list of endpoints and request field descriptions, see [Creating the IP2Geo data source](#creating-the-ip2geo-data-source).
118+
119+
To update the data source, run the following query:
120+
121+
```json
PUT /_plugins/geospatial/ip2geo/datasource/my-datasource/_settings
{
  "endpoint": "https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json",
  "update_interval_in_days": 10
}
```
{% include copy-curl.html %}
129+
130+
### Deleting the IP2Geo data source
131+
132+
To delete the IP2Geo data source, you must first delete all processors associated with the data source. Otherwise, the request fails.
133+
134+
To delete the data source, run the following query:
135+
136+
```json
DELETE /_plugins/geospatial/ip2geo/datasource/my-datasource
```
{% include copy-curl.html %}
140+
141+
## Creating the pipeline
142+
143+
Once the data source is created, you can create the pipeline. The following is the syntax for the `ip2geo` processor:
144+
145+
```json
{
  "ip2geo": {
    "field": "ip",
    "datasource": "my-datasource"
  }
}
```
{% include copy-curl.html %}
154+
155+
### Configuration parameters
156+
157+
The following table lists the required and optional parameters for the `ip2geo` processor.
158+
159+
| Name | Required | Default | Description |
|------|----------|---------|-------------|
| `datasource` | Required | - | The data source name to use to retrieve geographical information. |
| `field` | Required | - | The field that contains the IP address for geographical lookup. |
| `ignore_missing` | Optional | false | If set to `true`, the processor does not modify the document if the field does not exist or is `null`. Default is `false`. |
| `properties` | Optional | All fields in `datasource` | The field that controls which properties are added to `target_field` from `datasource`. |
| `target_field` | Optional | ip2geo | The field that contains the geographical information retrieved from the data source. |
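
To show how the optional parameters in this table fit together, the following sketch configures a processor that writes only three properties to a custom target field and skips documents that lack the `ip` field. The target field name `geo` is a hypothetical choice, and the property names are taken from the `fields` list returned for the GeoLite2 City data source, so a different data source may expose different names:

```json
{
  "ip2geo": {
    "field": "ip",
    "datasource": "my-datasource",
    "target_field": "geo",
    "properties": ["country_name", "city_name", "location"],
    "ignore_missing": true
  }
}
```

Restricting `properties` keeps ingested documents smaller when only a few geographical attributes are needed.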
166+
167+
## Using the processor
168+
169+
Follow these steps to use the processor in a pipeline.
170+
171+
**Step 1: Create a pipeline.**
172+
173+
The following query creates a pipeline, named `my-pipeline`, that converts the IP address to geographical information:
174+
175+
```json
PUT /_ingest/pipeline/my-pipeline
{
  "description": "convert ip to geo",
  "processors": [
    {
      "ip2geo": {
        "field": "ip",
        "datasource": "my-datasource"
      }
    }
  ]
}
```
{% include copy-curl.html %}
190+
191+
**Step 2 (Optional): Test the pipeline.**
192+
193+
{::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/icons/info-icon.png" class="inline-icon" alt="info icon"/>{:/} **NOTE**<br>It is recommended that you test your pipeline before you ingest documents.
{: .note}
195+
196+
To test the pipeline, run the following query:
197+
198+
```json
POST /_ingest/pipeline/my-pipeline/_simulate
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "my-id",
      "_source": {
        "ip": "172.0.0.1",
        "ip2geo": {
          "continent_name": "North America",
          "region_iso_code": "AL",
          "city_name": "Calera",
          "country_iso_code": "US",
          "country_name": "United States",
          "region_name": "Alabama",
          "location": "33.1063,-86.7583",
          "time_zone": "America/Chicago"
        }
      }
    }
  ]
}
```
{% include copy-curl.html %}
223+
224+
**Step 3: Ingest a document.**
225+
226+
The following query ingests a document into an index named `my-index`:
227+
228+
```json
PUT /my-index/_doc/my-id?pipeline=my-pipeline
{
  "ip": "172.0.0.1"
}
```
{% include copy-curl.html %}
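
When ingesting more than one document, the same `pipeline` query parameter can be passed to the bulk API. The following is a minimal sketch that sends two documents through `my-pipeline`. The document IDs and the second IP address are hypothetical values used only for illustration:

```json
POST /my-index/_bulk?pipeline=my-pipeline
{ "index": { "_id": "my-id-1" } }
{ "ip": "172.0.0.1" }
{ "index": { "_id": "my-id-2" } }
{ "ip": "172.0.0.2" }
```

Each document in the request passes through the `ip2geo` processor before it is indexed.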
235+
236+
**Step 4 (Optional): Retrieve the document.**
237+
238+
To retrieve the document, run the following query:
239+
240+
```json
GET /my-index/_doc/my-id
```
{% include copy-curl.html %}
