Remove dimension limit for time series data streams #93564

Closed
@felixbarny

Description

Currently, there are several limits around the number of dimensions:

  • Dimension keys have a hard limit of 512 bytes. Documents are rejected if this limit is exceeded.
  • Dimension values have a hard limit of 1024 bytes. Documents are rejected if this limit is exceeded.
  • The _tsid consists of all dimension keys and values and has a hard limit of 32 KB. Documents are rejected if this limit is exceeded.
  • To avoid rejecting documents at ingest time because of the hard limit on the _tsid, only 16 fields can be marked as dimensions in the mapping by default. This limit can be raised with an index setting, but doing so can lead to document rejections if the hard limit for the _tsid is exceeded.
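
To make the interaction between these limits concrete, here is a minimal sketch in Python. It is not Elasticsearch's implementation: the function names are hypothetical, lengths are assumed to be UTF-8 byte lengths, and the _tsid size is approximated as the plain sum of key and value bytes (the real encoding will have some overhead).

```python
# Limits as described in this issue; the names are illustrative only.
MAX_DIMENSION_KEY_BYTES = 512
MAX_DIMENSION_VALUE_BYTES = 1024
MAX_TSID_BYTES = 32 * 1024
DEFAULT_MAX_DIMENSION_FIELDS = 16


def check_dimensions(dimensions: dict[str, str]) -> list[str]:
    """Return the limit violations for one document's dimensions.

    Assumption: the _tsid size is modeled as the sum of the UTF-8 byte
    lengths of all keys and values.
    """
    violations = []
    tsid_bytes = 0
    for key, value in dimensions.items():
        key_len = len(key.encode("utf-8"))
        value_len = len(value.encode("utf-8"))
        if key_len > MAX_DIMENSION_KEY_BYTES:
            violations.append(f"key of {key_len} bytes exceeds {MAX_DIMENSION_KEY_BYTES}")
        if value_len > MAX_DIMENSION_VALUE_BYTES:
            violations.append(f"value of {value_len} bytes exceeds {MAX_DIMENSION_VALUE_BYTES}")
        tsid_bytes += key_len + value_len
    if tsid_bytes > MAX_TSID_BYTES:
        violations.append(f"_tsid of {tsid_bytes} bytes exceeds {MAX_TSID_BYTES}")
    return violations


def worst_case_tsid_bytes(field_count: int) -> int:
    """Largest modeled _tsid when every key and value is at its own limit."""
    return field_count * (MAX_DIMENSION_KEY_BYTES + MAX_DIMENSION_VALUE_BYTES)
```

Under this simplified model, 16 dimension fields give a worst case of 16 × (512 + 1024) = 24,576 bytes, which stays below the 32,768-byte cap. Raising the field limit far enough makes the worst case exceed the cap, which is when rejections on the _tsid limit become possible.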

These limits make it difficult to adopt time series data streams for a few reasons:

  • Before onboarding a metric, integration developers need to think carefully about whether a field is a dimension or just metadata (a tag).
    This isn't always trivial, as some metadata is only available under certain conditions (when the application is running on k8s or in the cloud). If we mark too many fields as dimensions, we risk hitting the limit. If we mark too few, documents are rejected when multiple documents with the same timestamp end up with the same _tsid. Choosing the right set of fields as dimensions is a fairly labor-intensive and error-prone process.
  • It prevents the ingestion of ad-hoc metrics whose schema is not known up front.
    We'll want to give users of metric libraries like Micrometer or the OpenTelemetry metrics SDK an easy way to add new metrics without first having to change the schema in ES. Metric libraries usually don't distinguish between dimensions and metadata. There's typically only a way to set the metric name, attributes (aka labels, tags, dimensions), and a value. So we'll need to map all dynamic labels as dimensions. The dimension limit gets in the way of that.
  • Other TSDBs don't have such a limit.
    This will make it harder to move from other TSDBs to Elasticsearch.
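
The duplicate-_tsid rejection mentioned in the first bullet can be illustrated with a small Python sketch. It is a simplified model, not Elasticsearch code: the function names and the document shape (`labels`, `@timestamp`) are assumptions, and a document is treated as rejected when an earlier document has the same dimension values and the same timestamp.

```python
def series_key(doc: dict, dimension_fields: set[str]) -> tuple:
    """Model the identity of a data point: its dimension values plus timestamp."""
    dims = tuple(
        sorted((k, v) for k, v in doc["labels"].items() if k in dimension_fields)
    )
    return (dims, doc["@timestamp"])


def find_rejected(docs: list[dict], dimension_fields: set[str]) -> list[dict]:
    """Return the documents that collide with an earlier one in this model."""
    seen = set()
    rejected = []
    for doc in docs:
        key = series_key(doc, dimension_fields)
        if key in seen:
            rejected.append(doc)
        else:
            seen.add(key)
    return rejected


docs = [
    {"@timestamp": "2023-02-07T10:00:00Z", "labels": {"host": "a", "pod": "p1"}, "cpu": 0.4},
    {"@timestamp": "2023-02-07T10:00:00Z", "labels": {"host": "a", "pod": "p2"}, "cpu": 0.7},
]

# Too few dimensions: both documents map to the same series and timestamp.
too_few = find_rejected(docs, {"host"})
# Marking "pod" as a dimension as well keeps the two series apart.
enough = find_rejected(docs, {"host", "pod"})
```

With only `host` marked as a dimension, the second document collides with the first. Once `pod` is also a dimension, both are accepted.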

I don't want to go too deep into implementation details here, but we have discussed potentially turning the _tsid into a hash, which would make it possible to remove all limits on the number of dimensions.
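
As a rough illustration of the hashing idea, here is a minimal Python sketch. The hash function (SHA-256), the length-prefixed encoding, and the name `hashed_tsid` are all assumptions chosen for the example; the issue does not specify how a hashed _tsid would actually be computed.

```python
import hashlib


def hashed_tsid(dimensions: dict[str, str]) -> bytes:
    """Derive a fixed-size identifier from any number of dimensions.

    Keys are sorted so the result does not depend on insertion order.
    Each key and value is length-prefixed so that ("ab", "c") and
    ("a", "bc") cannot produce the same byte stream.
    """
    digest = hashlib.sha256()
    for key in sorted(dimensions):
        for part in (key, dimensions[key]):
            data = part.encode("utf-8")
            digest.update(len(data).to_bytes(4, "big"))
            digest.update(data)
    return digest.digest()


small = hashed_tsid({"host": "a", "pod": "p1"})
large = hashed_tsid({f"label_{i}": "x" * 2000 for i in range(1000)})
```

Both `small` and `large` are 32 bytes long, so the identifier size no longer grows with the number or size of dimensions. The trade-off is that the original dimension values cannot be read back from the hash and have to be available elsewhere.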
