Add documentation for derived source in source field metadata #10674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account

@@ -25,7 +25,7 @@ PUT sample-index1
```
{% include copy-curl.html %}

Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document. To keep these features available without storing the `_source` field, you can enable [derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source), which reconstructs `_source` from indexed field data instead of storing it.
{: .warning}

## Including or excluding fields

@@ -52,3 +52,73 @@ PUT logs
{% include copy-curl.html %}

These fields are not stored in the `_source`, but you can still search them because the data remains indexed.

## Derived source

OpenSearch stores each ingested document in the `_source` field and also indexes individual fields for search. Because the `_source` field can consume significant storage space, you can configure OpenSearch to skip storing it and instead reconstruct it dynamically when it is needed, for example, during `search`, `get`, `mget`, `reindex`, or `update` operations.

To enable derived source, set the `index.derived_source.enabled` index setting to `true`:

```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
```
{% include copy-curl.html %}

Although skipping the `_source` field can significantly reduce storage requirements, deriving the source dynamically is generally slower than reading a stored `_source`. To avoid this overhead in search queries, don't request `_source` when you don't need it. For example, set `"_source": false` in a search request, or set the `size` parameter, which controls the number of returned documents, to `0` for aggregation-only queries.
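
For example, the following aggregation-only search is a sketch that assumes a hypothetical `keyword` field named `status`. Because `size` is `0`, no documents are returned, so no `_source` needs to be derived. The aggregation itself is computed from doc values:

```json
GET sample-index1/_search
{
  "size": 0,
  "aggs": {
    "status_counts": {
      "terms": {
        "field": "status"
      }
    }
  }
}
```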

For real-time reads using the [Get Document API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/get-documents/) or [Multi-get Documents API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/multi-get/), which are served from the translog until a [`refresh`]({{site.url}}{{site.baseurl}}/api-reference/index-apis/refresh/) occurs, derived source can add latency. This is because the document must first be indexed temporarily before its source can be reconstructed. To avoid this latency, set the index-level `index.derived_source.translog.enabled` setting to `false`, which disables generating derived source during translog reads:

```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true,
        "translog": {
          "enabled": false
        }
      }
    }
  }
}
```

If you disable derived source for translog reads, the `_source` returned for a document may differ depending on whether the document is still in the translog or has already been written to a segment.
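
If you need the `_source` of a document as derived from segments rather than read from the translog, one option is to refresh the index and then retrieve the document with `realtime` set to `false`. This is a sketch that assumes a document with the hypothetical ID `1` exists in `sample-index1`:

```json
POST sample-index1/_refresh

GET sample-index1/_doc/1?realtime=false
```

With `realtime=false`, the Get Document API reads from the most recently refreshed view of the index, so a document indexed after the last refresh is not found.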

### Supported fields and parameters

Derived source uses [`doc_values`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/) and [`stored_fields`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/store/) to reconstruct the document at query time. Because `doc_values` store values in an encoded, field-type-specific form rather than as the original JSON, the dynamically generated `_source` may differ in format or precision from the original ingested document.
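
As an illustration of a precision difference, consider a hypothetical `sample-index2` index with a `price` field mapped as `scaled_float` with a `scaling_factor` of `100`. A `scaled_float` field stores its value multiplied by the scaling factor and rounded to a `long`, so indexing `3.14159` stores `314`. Because derived source reads this encoded value, the reconstructed `_source` is likely to contain `3.14` instead of `3.14159`:

```json
PUT sample-index2
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  },
  "mappings": {
    "properties": {
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      }
    }
  }
}

PUT sample-index2/_doc/1
{
  "price": 3.14159
}
```

Retrieving the document with `GET sample-index2/_doc/1` after a refresh lets you check the reconstructed value on your OpenSearch version.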

Derived source supports the following field types without requiring any changes to field mappings (with some [limitations](#limitations)):

- [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/boolean/)
- [`byte`, `double`, `float`, `half_float`, `integer`, `long`, `short`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/)
- [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/)
- [`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/)
- [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/)
- [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/)
- [`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/)
- [`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/)
- [`scaled_float`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/)
- [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/)
- [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/)

For a [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field with derived source enabled, the field value is stored as a stored field by default. You do not need to set the `store` mapping parameter to `true`.
{: .note}
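
For example, in the following sketch (the `sample-index3` index and `title` field are hypothetical), the `text` field mapping omits the `store` parameter. With derived source enabled, the `title` value is still kept as a stored field and can be used to reconstruct `_source`:

```json
PUT sample-index3
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
```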

### Limitations

Derived source does not support the following fields:

- Fields containing [`copy_to`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/copy-to/) parameters.
- [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) and [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/) fields that define either the [`ignore_above`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/ignore-above/) or [`normalizer`]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) parameters.
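
For example, the following mapping is a sketch with hypothetical index and field names. It falls under these limitations because `message` uses `copy_to` and `tag` is a `keyword` field with `ignore_above`. To use derived source with such an index, you would need to remove these parameters. If you need them, keep the `_source` field stored for that index:

```json
PUT sample-index4
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "copy_to": "all_text"
      },
      "all_text": {
        "type": "text"
      },
      "tag": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```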