Description
Opened on Dec 13, 2021
Fleet and Elastic Agent users need a mechanism to customize how their data is ingested, mapped, and stored, in a way that is preserved across Stack and integration package upgrades. This issue outlines how we plan to structure index and component templates to support user customizations to Fleet-managed data streams at namespace-level granularity.
Scope
The goal is to provide a future-proof naming scheme and template structure that will allow users to add the following customizations to Fleet-managed data streams:
- Mappings (additive and non-additive)
- ILM policy
- Number of replicas, primaries, and routing shards
- Refresh interval
- Other general index settings (query.*, etc.)
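As an illustration of the customizations listed above, here is a minimal sketch of what the body of a `@custom` component template could contain. The ILM policy name, the extra mapped field, and all of the values are hypothetical; only the setting names are standard Elasticsearch index settings.

```python
import json

# Hypothetical body for a `<type>-<dataset>@custom` component template.
custom_component_template = {
    "template": {
        "settings": {
            "index": {
                # ILM policy (the policy name is an assumption)
                "lifecycle": {"name": "my-90-day-policy"},
                # Primaries, replicas, and routing shards
                "number_of_shards": 2,
                "number_of_replicas": 1,
                "number_of_routing_shards": 8,
                # Refresh interval
                "refresh_interval": "30s",
            }
        },
        "mappings": {
            "properties": {
                # Additive mapping for a field invented for this example
                "nginx": {"properties": {"upstream_zone": {"type": "keyword"}}}
            }
        },
    }
}

print(json.dumps(custom_component_template, indent=2))
```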
This scheme does not allow customizing:
- Ingest pipelines
  - Elasticsearch does not support attaching an arbitrary number of ingest pipelines to a data stream, so the only existing way to customize ingest pipelines is to modify the one installed by Fleet, and that change will not be preserved across package upgrades. See "Specify multiple ingest pipelines for a data stream" (elasticsearch#61185).
  - If Elasticsearch were to add a `default_pipelines` setting that takes an array of pipelines, this customization scheme would likely be compatible with it.
Design
Existing scheme (as of 8.2)
The existing scheme that we use in Fleet today installs a single index template for each dataset in a package that matches data streams for all namespaces. It has the following properties:
- name: `<type>-<dataset>`
- matches: `<type>-<dataset>-*`
- priority: 200
- Component templates (highest to lowest precedence):
  - `.fleet_agent_id_verification-1`: final pipeline & mappings for agent_id verification (can optionally be disabled in kibana.yml)
  - `.fleet_globals-1`: global settings and mappings applied to every data stream (e.g. `event.ingested`)
  - `<type>-<dataset>@custom`: user-defined customizations (settings and/or mappings, for all namespaces)
  - `<type>-<dataset>@package`: package-defined mappings and settings
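The properties above can be sketched as a function that assembles the index template. This is an illustration, not Fleet's actual implementation: the function name and the `logs` / `nginx.access` example values are assumptions. Note that Elasticsearch merges `composed_of` entries in order, with later entries taking precedence, so the list above (highest precedence first) is reversed.

```python
def base_index_template(data_type: str, dataset: str) -> dict:
    """Build the name and body of the wildcard-namespace index template."""
    name = f"{data_type}-{dataset}"
    # Listed from highest to lowest precedence, as in the description above.
    by_precedence = [
        ".fleet_agent_id_verification-1",
        ".fleet_globals-1",
        f"{name}@custom",
        f"{name}@package",
    ]
    return {
        "name": name,
        "body": {
            "index_patterns": [f"{name}-*"],
            "priority": 200,
            # Later entries override earlier ones, so reverse the list.
            "composed_of": list(reversed(by_precedence)),
            # Marks the template as one that creates data streams.
            "data_stream": {},
        },
    }


template = base_index_template("logs", "nginx.access")
print(template["body"]["composed_of"])
```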
New proposed scheme
In order to preserve user customizations across upgrades, it’s important that we store their overrides in a separate component template that Fleet can copy over to new versions of the package’s index template. In this updated scheme, we will add an additional index template that is namespace-specific and of higher priority than the base template:
- name: `<type>-<dataset>-<namespace>`
- matches: `<type>-<dataset>-<namespace>`
- priority: 250
- Components (highest to lowest precedence):
  - `.fleet_agent_id_verification-1`: final pipeline & mappings for agent_id verification (can optionally be disabled in kibana.yml)
  - `.fleet_globals-1`: global settings and mappings applied to every data stream (e.g. `event.ingested`)
  - `<type>-<dataset>-<namespace>@custom`: namespace-specific user-defined customizations
  - `<type>-<dataset>@custom`: user-defined customizations (settings and/or mappings, for all namespaces)
  - `<type>-<dataset>@package`: package-defined mappings and settings
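A minimal sketch of how the proposed namespace-specific index template could be assembled. The function name and example values are assumptions made for illustration. As with any composable index template, `composed_of` is ordered from lowest to highest precedence, so the list above is reversed.

```python
def namespace_index_template(data_type: str, dataset: str, namespace: str) -> dict:
    """Build the name and body of the namespace-specific index template."""
    base = f"{data_type}-{dataset}"
    name = f"{base}-{namespace}"
    # Listed from highest to lowest precedence, as in the proposal above.
    by_precedence = [
        ".fleet_agent_id_verification-1",
        ".fleet_globals-1",
        f"{name}@custom",   # namespace-specific customizations
        f"{base}@custom",   # customizations for all namespaces
        f"{base}@package",  # package defaults
    ]
    return {
        "name": name,
        "body": {
            # Exact match on one namespace, no wildcard.
            "index_patterns": [name],
            # Higher than the base template's priority of 200.
            "priority": 250,
            "composed_of": list(reversed(by_precedence)),
            "data_stream": {},
        },
    }


ns_template = namespace_index_template("logs", "nginx.access", "prod")
print(ns_template["name"], ns_template["body"]["composed_of"])
```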
During package upgrades, Fleet would preserve the contents of both the ‘global’ custom template (`<type>-<dataset>@custom`) and the namespace-specific ones (`<type>-<dataset>-<namespace>@custom`) while replacing all of the other templates (including the index template). This would allow the user’s customizations to be preserved and to override any package-specific settings and mappings.
Like the ‘global’ custom template we offer today, we would allow users to directly edit the namespace-specific templates with arbitrary settings and mappings in order to override those supplied by the package. We would also use the template to store customizations that we plan to support directly in the UI (eg. setting the ILM policy).
We will not remove the base index template we install today that matches a wildcard namespace (`<type>-<dataset>-*`) because Elastic Agent standalone requires this template to be installed.
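To illustrate why both templates can coexist, the sketch below mimics how a data stream name is resolved to a single index template: among all templates whose pattern matches, the one with the highest priority wins. The helper is a simplified stand-in for Elasticsearch's own resolution, and the template names are examples.

```python
from fnmatch import fnmatchcase


def select_template(data_stream: str, templates: list):
    """Return the highest-priority template whose pattern matches, or None."""
    matching = [
        t for t in templates
        if any(fnmatchcase(data_stream, p) for p in t["index_patterns"])
    ]
    return max(matching, key=lambda t: t["priority"], default=None)


templates = [
    {"name": "logs-nginx.access", "index_patterns": ["logs-nginx.access-*"], "priority": 200},
    {"name": "logs-nginx.access-prod", "index_patterns": ["logs-nginx.access-prod"], "priority": 250},
]

# A namespace with its own template uses it; any other namespace
# (for example one used only by a standalone agent) falls back to the base.
print(select_template("logs-nginx.access-prod", templates)["name"])
print(select_template("logs-nginx.access-default", templates)["name"])
```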
Changing a namespace for an existing integration policy
If a user edits an existing integration policy to point to a new namespace, we can offer them the option to copy over any customizations from the previous namespace’s `<type>-<dataset>-<namespace>@custom` template. We would not delete the old templates since this could affect the existing data streams and indices or any standalone agents ingesting data into this namespace.
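The copy step described above might look like the following sketch, which operates on an in-memory mapping of component template names to bodies. The function name is hypothetical, and leaving an existing target template untouched is an assumption; the issue leaves that case as an open question.

```python
import copy


def copy_namespace_customizations(component_templates: dict, data_type: str,
                                  dataset: str, old_ns: str, new_ns: str) -> dict:
    """Return a new mapping with the old namespace's @custom template copied."""
    source = f"{data_type}-{dataset}-{old_ns}@custom"
    target = f"{data_type}-{dataset}-{new_ns}@custom"
    result = dict(component_templates)
    # Assumption: never overwrite customizations already stored for the target.
    if source in result and target not in result:
        result[target] = copy.deepcopy(result[source])
    # The source template is kept: existing data streams may still rely on it.
    return result


existing = {
    "logs-nginx.access-staging@custom": {
        "template": {"settings": {"index": {"refresh_interval": "30s"}}}
    }
}
updated = copy_namespace_customizations(existing, "logs", "nginx.access", "staging", "prod")
print(sorted(updated))
```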
As a separate enhancement, we could offer a ‘cleanup’ UI either in Fleet or Stack Management that shows index templates that are not currently in use.
Customize API
To facilitate automated usage of this scheme, we should provide a high-level package customization API in Kibana Fleet that allows admins to make customizations without worrying about the low-level details: how the templates are configured, whether a data stream needs to be rolled over, or how to apply setting changes retroactively to backing indices. The main use case is standalone Agent usage. The API may also power in-app features for making customizations (e.g. setting the ILM policy).
# Write custom settings and mappings to all namespaces
# Writes to `<type>-<dataset>@custom` templates
PUT /api/fleet/epm/nginx/customize
{
"settings": { … },
"mappings": { … }
}
# Add or update a namespace for an integration, creates the namespace-specific templates
# Write custom settings and mappings to namespace
# Writes to `<type>-<dataset>-<namespace>@custom` templates
PUT /api/fleet/epm/nginx/customize/namespace/foo
{
"settings": { … },
"mappings": { … }
}
# Removes a namespace, deleting namespace-specific templates
# Does not delete data indices or data streams
DELETE /api/fleet/epm/nginx/customize/namespace/foo
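A small sketch of how a client could build requests for the endpoints proposed above. These endpoints are part of this proposal and are not an existing Kibana API; the helper names are hypothetical, and the functions only build the request without sending it.

```python
from urllib.parse import quote


def customize_request(package, settings=None, mappings=None, namespace=None):
    """Build (method, path, body) for the proposed customize endpoints."""
    path = f"/api/fleet/epm/{quote(package, safe='')}/customize"
    if namespace is not None:
        # Namespace-specific variant of the endpoint.
        path += f"/namespace/{quote(namespace, safe='')}"
    body = {}
    if settings is not None:
        body["settings"] = settings
    if mappings is not None:
        body["mappings"] = mappings
    return "PUT", path, body


def remove_namespace_request(package, namespace):
    """Build (method, path, body) for removing a namespace's templates."""
    path = (f"/api/fleet/epm/{quote(package, safe='')}"
            f"/customize/namespace/{quote(namespace, safe='')}")
    return "DELETE", path, None


print(customize_request("nginx", settings={"index": {"refresh_interval": "30s"}}))
print(customize_request("nginx", mappings={"properties": {}}, namespace="foo"))
print(remove_namespace_request("nginx", "foo"))
```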
All of the other APIs should also create these namespaces automatically. For example, if an integration policy is added for the `nginx` package on the `bar` namespace, the `POST /api/fleet/package_policies` API should also create the appropriate namespace templates if they don't already exist.
There are additional use cases for this API outside of index templates, for example there have been other requests for namespace-specific transforms. We should design this API to accommodate future use cases easily.
Upgrade considerations
For packages that were installed before this scheme was introduced, Fleet should automatically add the appropriate namespace-specific index and component templates in order to facilitate a consistent experience for end-users. See #121099
For upgrades where any `@custom` component templates already exist, they should be retained so that they are still present once the new package version is installed. Existing `@custom` templates should therefore not be overwritten either.
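The retention rule above can be sketched as a pure function over component template names. This is an illustration under assumptions rather than Fleet's implementation: the function name is hypothetical, and templates are modeled as a plain mapping from name to body.

```python
def plan_upgrade(existing: dict, incoming: dict) -> dict:
    """Return the component templates that should exist after an upgrade."""
    # Start from the user's customizations; everything else is replaced.
    result = {name: body for name, body in existing.items()
              if name.endswith("@custom")}
    for name, body in incoming.items():
        if name.endswith("@custom") and name in result:
            continue  # never overwrite an existing @custom template
        result[name] = body
    return result


existing = {
    "logs-nginx.access@package": {"version": "1.0.0"},
    "logs-nginx.access@custom": {"refresh_interval": "30s"},
}
incoming = {
    "logs-nginx.access@package": {"version": "1.1.0"},
    "logs-nginx.access@custom": {},  # empty placeholder shipped on install
}
after = plan_upgrade(existing, incoming)
print(after)
```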
Open questions
- When should namespace-specific templates be deleted?
  - If we're going to support a generic API that doesn't require integration or agent policies to point to namespaces, then I don't think we can do any automated cleanup; otherwise we could delete configuration that is in use by a standalone agent.
- There are separate `@custom` component templates for each data stream in an integration package, while the API design proposed here would apply to the entire integration. This can present problems if a user manually edits a single component template so the data streams are not in sync, for example the source of truth is now ambiguous. How would we solve this?
  - Have a single, managed component template that is used for customizations that apply to the entire integration. Leave the `@custom` templates unmanaged and never edit them. (@joshdover votes for this one)
  - Store the customizations set on this API in a Kibana Saved Object and use this as the source of truth. Manual user edits to `@custom` templates would then be merged in after settings from this SO. This would allow manual additions and modifications to `@custom` templates to be preserved, however deletes would be lost.
- How should namespace renames work? If a user renames the namespace field on an integration policy or agent policy, should we attempt to copy any customizations on the previous namespace when creating the new namespace? If not or if the new namespace already exists, should we warn the user that settings/mappings are going to change for this data?
- How do we handle when a new dataset is added for an existing package? Should we keep a copy of any custom settings/mappings in a Saved Object and automatically apply them to all datasets during package upgrades?
- Should the management APIs allow changes to mappings? If so, when and how would the user expect these to take effect? For example, would a rollover be automatic?
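The Saved Object option raised in the second open question can be illustrated with a small deep-merge sketch. It shows why manual additions and modifications survive while deletes are lost: a key the user removes from the `@custom` template is still supplied by the Saved Object. The merge helper and sample settings are assumptions made for illustration.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Customizations stored through the API (the source of truth).
saved_object = {"index": {"refresh_interval": "30s", "number_of_replicas": 2}}

# The user's manual edit of the @custom template: refresh_interval changed,
# codec added, and number_of_replicas deleted.
manual_custom = {"index": {"refresh_interval": "5s", "codec": "best_compression"}}

effective = deep_merge(saved_object, manual_custom)
print(effective)
```

The modification and the addition are kept, but `number_of_replicas` reappears in the merged result even though the user deleted it from the `@custom` template.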