Don't index so many saved object fields #43673

Closed

Description

Update 29 June 2020

With 7.9 currently at ~960 fields, we're fast approaching the default limit of 1000 fields. Please audit your plugin's mappings and remove any unnecessary fields. Link from your PR back to this issue and mark your plugin's task as complete once the PR has been merged.

Removing fields

Setting index: false and doc_values: false removes some of the overhead of a field, but doesn't reduce the field count. To reduce the field count, fields need to be removed from the mappings completely. This can be done by specifying dynamic: false on any level of your mappings.

For example, the following diff will remove three fields from the field count. The removed fields can still be stored in the Saved Object type, but searching and aggregating are only possible on the timestamp field. Note: this change also removes any validation by Elasticsearch, which will allow saved objects with unknown attributes to be saved. Because of this, we recommend starting only with low-risk saved object types like telemetry data.

--- a/src/plugins/kibana_usage_collection/server/collectors/application_usage/saved_objects_types.ts
+++ b/src/plugins/kibana_usage_collection/server/collectors/application_usage/saved_objects_types.ts
@@ -47,11 +47,9 @@ export function registerMappings(registerType: SavedObjectsServiceSetup['registe
     hidden: false,
     namespaceType: 'agnostic',
     mappings: {
+      dynamic: false,
       properties: {
         timestamp: { type: 'date' },
-        appId: { type: 'keyword' },
-        minutesOnScreen: { type: 'float' },
-        numberOfClicks: { type: 'long' },
       },
     },
   });

You can use the following command to count the number of fields for a before/after comparison (requires jq, e.g. brew install jq):

curl -X GET "elastic:changeme@localhost:9200/.kibana/_field_caps?fields=*&pretty=true" |  jq '.fields|length'
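If you want a rough estimate before the mappings reach Elasticsearch, the same before/after comparison can be sketched locally. This is only an illustration: countMappedFields and the MappingProperty type are hypothetical helpers, not part of Kibana, and the result may not match the _field_caps number, which reflects what Elasticsearch actually reports for the index.

```typescript
// Hypothetical helper (not a Kibana API): approximate how many fields a
// saved object mapping contributes to the field count.
type MappingProperty = {
  type?: string;
  properties?: Record<string, MappingProperty>;
  fields?: Record<string, MappingProperty>;
};

function countMappedFields(properties: Record<string, MappingProperty> = {}): number {
  let count = 0;
  for (const name of Object.keys(properties)) {
    const prop = properties[name];
    count += 1; // the field (or object) itself
    count += countMappedFields(prop.properties); // sub-fields of an object
    count += countMappedFields(prop.fields); // multi-fields
  }
  return count;
}

// The application usage mappings from the diff above, before and after.
const beforeProperties: Record<string, MappingProperty> = {
  timestamp: { type: 'date' },
  appId: { type: 'keyword' },
  minutesOnScreen: { type: 'float' },
  numberOfClicks: { type: 'long' },
};
const afterProperties: Record<string, MappingProperty> = {
  timestamp: { type: 'date' },
};

const removedFieldCount = countMappedFields(beforeProperties) - countMappedFields(afterProperties);
console.log(removedFieldCount); // 3
```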

Plugins:

  • plugins/dashboard @elastic/kibana-app 🚧 @timroes
  • plugins/data @elastic/kibana-app-arch
  • plugins/expressions @elastic/kibana-app-arch
  • plugins/home @elastic/kibana-core-ui
  • plugins/kibana_usage_collection @elastic/kibana-telemetry (Reduce SavedObjects mappings for Application Usage #70475)
  • plugins/saved_objects_management @elastic/kibana-platform
  • plugins/share @elastic/kibana-app-arch
  • plugins/telemetry @elastic/kibana-telemetry
  • [ ] plugins/timelion @elastic/kibana-app @flash1293
  • plugins/visualizations @elastic/kibana-app
  • xpack/plugins/actions @elastic/kibana-alerting-services
  • xpack/plugins/alerting_builtins @elastic/kibana-alerting-services
  • xpack/plugins/apm @elastic/apm-ui ([APM] Improvements to data telemetry #70524)
  • xpack/plugins/canvas @elastic/kibana-canvas
  • xpack/plugins/case @elastic/siem @elastic/endpoint-app-team
  • xpack/plugins/file_upload @elastic/kibana-gis
  • xpack/plugins/graph @elastic/kibana-app @flash1293
  • xpack/plugins/infra @elastic/logs-metrics-ui
  • xpack/plugins/ingest_manager @elastic/ingest-management
  • xpack/plugins/lens @elastic/kibana-app
  • xpack/plugins/lists @elastic/siem @elastic/endpoint-app-team
  • xpack/plugins/maps @elastic/kibana-gis
  • xpack/plugins/monitoring @elastic/stack-monitoring-ui
  • xpack/plugins/security_solution @elastic/siem @elastic/endpoint-app-team
  • xpack/plugins/spaces @elastic/kibana-security
  • xpack/plugins/task_manager @elastic/kibana-alerting-services (Task Manager does not use the .kibana index)
  • xpack/plugins/upgrade_assistant @elastic/es-ui (Remove fields from UA mappings which don't need to be searchable #64547)
  • xpack/plugins/uptime @elastic/uptime ([Uptime] Stop indexing saved object fields #72782)

Original issue

Looking at the current mappings for a lot of our saved objects, we're indexing a large number of unnecessary fields, i.e. fields we know we'll never want to search through or filter on. Indexing those wastes heap in Elasticsearch and, if a field is unnecessarily analyzed, wastes a couple of milliseconds on every insert and thus every migration. We even use a lot of text fields in places where we store stringified JSON, which doesn't make any sense, since the analyzer won't end up with anything meaningful here.

This is not a huge problem, since the .kibana index is usually rather small, and a lot of those JSON fields might exceed the default ignore_above value of 256 and thus not be indexed in most documents. Despite this not being a huge problem, I discussed it with @joshdover, @tylersmalley and @rudolf, and we agreed that we should not waste heap and indexing performance on fields we know we'll never need indexed.

As the field count on .kibana is approaching the default limit of 1000 fields we need to urgently evaluate whether or not all fields are really necessary for performing queries or filters.

Mapping recommendations

Here are a couple of general recommendations for how the mappings of a saved object should look:

type=text only for full text search on real text

A field with type text in the mapping will be analyzed and indexed. This makes sense only for fields we know we want to do full text search on, e.g. the title or description of a field. If you don't need the field value analyzed for full text search, either don't index the field (see below) or use type keyword with an appropriate ignore_above instead. Good examples of a proper keyword field are the visType or the language of a query.
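As a minimal sketch of this recommendation, a mapping could look like the following. The saved object shape is hypothetical, the FieldMapping type is declared here only to keep the example self-contained, and the ignore_above value of 256 is just an illustrative choice.

```typescript
// Local type for illustration only; not a Kibana type.
type FieldMapping = { type: string; ignore_above?: number };

// Hypothetical saved object mappings following the recommendation above.
const textVersusKeywordMappings: { properties: Record<string, FieldMapping> } = {
  properties: {
    // Real text that users search with full text queries: analyzed.
    title: { type: 'text' },
    description: { type: 'text' },
    // Exact values used for filtering or aggregating: not analyzed.
    visType: { type: 'keyword', ignore_above: 256 },
  },
};
```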

Don't index if not needed

Especially with keyword fields, we very often index a field without thinking about it, because indexing is the default. If we know we'll never need to aggregate over or query a field, and only need it available when retrieving the saved object, set index: false and doc_values: false in the mapping for that field (leave out doc_values for text and annotated_text fields, which don't support it).

A couple of examples where it might make sense to have a (keyword) field indexed:

  • visType: we might want to filter on that later and thus need to be able to query by that field
  • language (of a query): even though we might never want to expose that in the UI, we might want to aggregate that field for telemetry data

A couple of examples where indexing doesn't make much sense:

  • expression (the "canvas" expression of a visualization): It doesn't make any sense to filter on the complex expression as a whole, nor to aggregate over it. If we wanted to build telemetry, we would need to look at each document individually anyway, e.g. parse it and count the functions it contains.
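The examples above could translate into mappings roughly like this. It is a sketch under assumptions: the saved object shape is hypothetical, and FieldMapping is a local type defined only for this example.

```typescript
// Local type for illustration only; not a Kibana type.
type FieldMapping = { type: string; index?: boolean; doc_values?: boolean };

// Hypothetical mappings: visType and language stay searchable,
// expression is only kept for retrieval.
const indexedVersusStoredMappings: { properties: Record<string, FieldMapping> } = {
  properties: {
    visType: { type: 'keyword' },
    language: { type: 'keyword' },
    expression: { type: 'keyword', index: false, doc_values: false },
  },
};

// Fields are indexed unless the mapping explicitly sets index: false.
const searchableFieldNames = Object.keys(indexedVersusStoredMappings.properties).filter(
  (name) => indexedVersusStoredMappings.properties[name].index !== false
);
console.log(searchableFieldNames); // [ 'visType', 'language' ]
```

Note that expression still counts towards the field limit here; only its indexing overhead is removed.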

JSON fields

We have a couple of places where we use a keyword field (often even indexed) to store some JSON object, like the configuration of a visualization, or the state of a dashboard. As a first step, these fields should be set to index: false.

As a further optimization, this data can be saved as a field of type object with enabled: false. That way the content of that field will simply be ignored by Elasticsearch: it won't be indexed or analyzed, but it will still be returned as it was indexed (as JSON) in the saved object. This removes an unnecessary JSON.stringify and JSON.parse when saving/loading those objects. Note: this will require writing a migration function for your saved object and changing any consuming code, so this is not an immediate need, but rather something to work towards for 8.0.
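A rough sketch of what such a change might involve is shown below. Everything here is an assumption made for illustration: the attribute names configJSON and config, the SavedDoc type, and the migrateConfigToObject function are hypothetical and only mimic the general shape of a saved object migration, not Kibana's actual migration API.

```typescript
// Target mapping: Elasticsearch stores the object but does not parse or index it.
const objectFieldMappings = {
  properties: {
    title: { type: 'text' },
    config: { type: 'object', enabled: false },
  },
};

// Hypothetical document shapes before and after the migration.
interface SavedDoc<T> {
  id: string;
  type: string;
  attributes: T;
}
interface OldAttributes {
  title: string;
  configJSON: string; // stringified JSON stored in a keyword field
}
interface NewAttributes {
  title: string;
  config: Record<string, unknown>; // stored as a real object
}

function migrateConfigToObject(doc: SavedDoc<OldAttributes>): SavedDoc<NewAttributes> {
  const { configJSON, ...rest } = doc.attributes;
  let config: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(configJSON);
    if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
      config = parsed as Record<string, unknown>;
    }
  } catch (_err) {
    // Assumption: fall back to an empty object when the stored string is invalid.
  }
  return { ...doc, attributes: { ...rest, config } };
}

const migratedDoc = migrateConfigToObject({
  id: '1',
  type: 'example',
  attributes: { title: 'My object', configJSON: '{"rows":10}' },
});
console.log(migratedDoc.attributes.config.rows); // 10
```

After such a migration, consuming code would read attributes.config directly instead of calling JSON.parse on a string attribute.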

Consider using type: 'flattened' (license: Basic) if you need to search over many fields or an unknown number of fields

The flattened type uses a single field for the entire object. It comes with some limitations, but in many instances it can significantly reduce the field count while still allowing you to search/aggregate over the fields inside the object.

Keep in mind that using the flattened field type will still index all data within this field. If you need just one specific sub-field to be aggregatable/searchable, but not the rest, the dynamic: false approach described above is preferable: the parent key is set to dynamic: false and only the one sub-field you need to search/aggregate on gets an (indexed) mapping. flattened is mostly preferred if you potentially need to search/aggregate over a larger number of sub-fields.
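The two alternatives can be sketched side by side as follows. The params object and its severity sub-field are hypothetical names chosen for illustration.

```typescript
// Option 1: one flattened field; every key inside `params` is indexed
// and searchable, but the whole object counts as a single field.
const flattenedMappings = {
  properties: {
    params: { type: 'flattened' },
  },
};

// Option 2: dynamic: false on the parent; only `params.severity` is mapped
// and searchable, other keys are stored but neither indexed nor mapped.
const dynamicFalseMappings = {
  properties: {
    params: {
      type: 'object',
      dynamic: false,
      properties: {
        severity: { type: 'keyword' },
      },
    },
  },
};
```

In both cases the field count stays fixed regardless of how many keys the stored objects contain; the difference is whether all keys or only the explicitly mapped one can be searched and aggregated.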

What happens after I change my plugin's mappings?

If you switch a field from an indexed to a non-indexed state (e.g. with enabled: false or index: false), the migration system will automatically update the mappings when Kibana is upgraded; no further action is required. If your plugin has recently removed or renamed an entire Saved Object type, the old mappings might not have been cleaned up. Please reach out to @elastic/kibana-platform if you think this might be the case.

Metadata

Labels

  • Feature:Saved Objects
  • Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc)
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • v7.9.0
  • v8.0.0

Projects

  • Status

    Done (7.13)
