72 changes: 71 additions & 1 deletion _field-types/metadata-fields/source.md
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ PUT sample-index1
```
{% include copy-curl.html %}

Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document.
Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document. To support these features without explicitly storing the `_source` field, you can use [derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source), which reconstructs the source when needed and reduces storage use.
{: .warning}

## Including or excluding fields
@@ -52,3 +52,73 @@ PUT logs
{% include copy-curl.html %}

These fields are not stored in the `_source`, but you can still search them because the data remains indexed.

## Derived source

Review comment (Member): Can we add a call-out to derived source in line 33, and point to this section?

Reply (Contributor Author): Linked the derived source in warning section.

OpenSearch stores each ingested document in the `_source` field and also indexes individual fields for search. The `_source` field can consume significant storage space. To reduce storage use, you can configure OpenSearch to skip storing the `_source` field and instead reconstruct it dynamically when needed, for example, during `search`, `get`, `mget`, `reindex`, or `update` operations.

To enable derived source, set the `index.derived_source.enabled` index-level setting to `true`:


```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
```
{% include copy-curl.html %}

Review comment (Member): Can we also call this out in the index settings page?

Reply (Contributor Author): Added both settings on main index setting page

While skipping the `_source` field can significantly reduce storage requirements, dynamically deriving the source is generally slower than reading a stored `_source`. To limit this overhead during search queries, do not request the `_source` field when you do not need it, and use the `size` parameter to limit the number of documents returned.
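
As a minimal sketch of the advice above, the following search request limits the number of hits with `size` and turns off source retrieval with the standard `"_source": false` search option. The `match_all` query and the `size` value are placeholders. That `"_source": false` fully avoids the reconstruction cost on an index with derived source enabled is an assumption, not something this page confirms:

```json
GET sample-index1/_search
{
  "size": 5,
  "_source": false,
  "query": {
    "match_all": {}
  }
}
```
{% include copy-curl.html %}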

Real-time reads using the [Get Document API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/get-documents/) or [Multi-get Documents API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/multi-get/) are served from the translog until a [`refresh`]({{site.url}}{{site.baseurl}}/api-reference/index-apis/refresh/) occurs. With derived source enabled, these reads can be slower because the document must first be ingested temporarily before its source can be reconstructed. You can avoid this additional latency by setting the index-level `derived_source.translog.enabled` setting to `false`, which disables generating derived source during translog reads:

```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "translog": {
          "enabled": false
        }
      }
    }
  }
}
```

If this setting is used, you may notice differences in the `_source` content for a document depending on whether it is still in the translog or has been written to a segment.
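
Because `index.derived_source.translog.enabled` is listed among the dynamic index settings, it should also be possible to change it on an existing index through the update settings API. The following request is a sketch under that assumption and reuses the `sample-index1` index from the preceding examples:

```json
PUT sample-index1/_settings
{
  "index": {
    "derived_source": {
      "translog": {
        "enabled": false
      }
    }
  }
}
```
{% include copy-curl.html %}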

### Supported fields and parameters

Derived source uses [`doc_values`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/) and [`stored_fields`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/store/) to reconstruct the document at query time. Because of the implementation of `doc_values`, the dynamically generated `_source` may differ in format or precision from the original ingested document.

Derived source supports the following field types without requiring any changes to field mappings (with some [limitations](#limitations)):

- [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/boolean/)
- [`byte`, `double`, `float`, `half_float`, `integer`, `long`, `short`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/)
- [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/)
- [`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/)
- [`geo_point`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point/)
- [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/)
- [`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/)
- [`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/)
- [`scaled_float`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/)
- [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/)
- [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/)

Review comment (Member): Can you call out how text should be supported?

Reply (Contributor Author): Added the section on how text field is supported.

For a [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field in an index with derived source enabled, OpenSearch saves the field value as a stored field by default, so you do not need to set the `store` mapping parameter to `true`.
{: .note}
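
To illustrate the note above, the following sketch creates an index with derived source enabled and maps one `text` field and one `keyword` field without setting `store` or changing any other mapping parameter. The field names `title` and `status` are hypothetical and chosen only for this example:

```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}
```
{% include copy-curl.html %}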

### Limitations

Derived source does not support the following fields:

- Fields that define the [`copy_to`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/copy-to/) parameter.
- [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) and [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/) fields that define the [`ignore_above`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/ignore-above/) or [`normalizer`]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) parameter.
Original file line number Diff line number Diff line change
@@ -166,6 +166,8 @@ For `zstd`, `zstd_no_dict`, `qat_lz4`, and `qat_deflate`, you can specify the co

- `index.append_only.enabled` (Boolean): Set to `true` to prevent any updates to documents in the index. Default is `false`.

- `index.derived_source.enabled` (Boolean): Set to `true` to dynamically generate the source without explicitly storing the `_source` field, which can optimize storage. Default is `false`. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source).

### Updating a static index setting

You can update a static index setting only on a closed index. The following example demonstrates updating the index codec setting.
@@ -269,6 +271,8 @@ OpenSearch supports the following dynamic index-level index settings:

- `index.routing.allocation.total_primary_shards_per_node` (Integer): The maximum number of primary shards from a single index that can be allocated to a single node. This setting is applicable only for remote-backed clusters. Default is `-1` (unlimited). Helps control per-index primary shard distribution across nodes by limiting the number of primary shards per node. Use with caution because primary shards from this index may remain unallocated if nodes reach their configured limits.

- `index.derived_source.translog.enabled` (Boolean): Controls whether the source is derived when documents are read from the translog in an index with derived source enabled. Set to `false` to skip generating derived source during translog reads. Defaults to the value of `index.derived_source.enabled`. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source).

### Updating a dynamic index setting

You can update a dynamic index setting at any time through the API. For example, to update the refresh interval, use the following request: