Optimize usage calculation in ILM policies retrieval API #106953

nielsbauman · 2024-04-01T11:35:20Z

Optimize calculating the usage of ILM policies in the GET _ilm/policy and GET _ilm/policy/<policy_id> endpoints. I loaded some dummy data into my single-node locally-running ES cluster using the following script (numbers are inspired by a customer who was facing timeouts in Kibana due to this endpoint being slow):

Bash snippet

#!/bin/bash

ES_URL="${ES_URL:-http://localhost:9200}"
NR_POLICIES=42
NR_COMPONENT_TEMPLATES=700
NR_INDEX_TEMPLATES=8000
NR_DATA_STREAMS=1000
NR_INDICES=1000

for (( i=0; i<NR_POLICIES; i++ )); do
  curl -s -X PUT "$ES_URL/_ilm/policy/policy-$i" -H 'Content-Type: application/json' -d '
  {
    "policy": {
      "phases": {}
    }
  }
  ' > /dev/null
done


for (( i=0; i<NR_COMPONENT_TEMPLATES; i++ )); do
  curl -s -X PUT "$ES_URL/_component_template/component-template-$i" -H 'Content-Type: application/json' -d '
  {"template": {}}
  ' > /dev/null
done

for (( i=0; i<NR_INDEX_TEMPLATES; i++ )); do
  component_template=$((RANDOM % NR_COMPONENT_TEMPLATES))
  curl -s -X PUT "$ES_URL/_index_template/index-template-$i" -H 'Content-Type: application/json' -d "
  {
    \"index_patterns\": [\"index-pattern-$i\"],
    \"template\": {},
    \"composed_of\": [\"component-template-$component_template\"],
    \"data_stream\": {}
  }
  " > /dev/null
done

for (( i=0; i<NR_DATA_STREAMS; i++ )); do
  curl -s -X PUT "$ES_URL/_data_stream/index-pattern-$i" > /dev/null
done

for (( i=0; i<NR_INDICES; i++ )); do
  curl -s -X PUT "$ES_URL/index-$i" > /dev/null
done

I also played around with some aspects of the script such as making the templates use policies or not, as that represents different scenarios that users might face.

I generated a flamegraph on main, which showed that the MetadataIndexTemplateService.findV2Template(...) call was by far the most expensive.

Flamegraph

After some experimentation, I ended up extracting a separate class that pre-computes some parts on initialization (i.e. only once per request) and then uses those pre-computed parts when calculating the usage for an individual policy. This is of course a trade-off between memory usage and run-time performance, but I think the added memory usage (which should be modest) is worth the significant speed improvement here.

Fixes #105773

elasticsearchmachine · 2024-04-01T11:35:43Z

Pinging @elastic/es-data-management (Team:Data Management)

elasticsearchmachine · 2024-04-01T11:35:44Z

Hi @nielsbauman, I've created a changelog YAML for you.

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUtils.java

gmarouli · 2024-04-03T10:56:01Z

I really like the idea of reducing the number of templates we go through to speed things up. However, I am afraid this optimisation is not correct. I created a test that demonstrates the bug:

        {
            // Test when a data stream does not use the policy anymore because of a higher template
            Metadata.Builder mBuilder = Metadata.builder()
                    .put(IndexMetadata.builder("myindex").settings(indexSettings(IndexVersion.current(), 1, 0).put(LifecycleSettings.LIFECYCLE_NAME, "mypolicy")))
                    .putCustom(
                            IndexLifecycleMetadata.TYPE,
                            new IndexLifecycleMetadata(
                                    Map.of("mypolicy", LifecyclePolicyMetadataTests.createRandomPolicyMetadata("mypolicy")),
                                    OperationMode.RUNNING
                            )
                    )
                    .putCustom(
                            ComposableIndexTemplateMetadata.TYPE,
                            new ComposableIndexTemplateMetadata(
                                    Map.of(
                                            "mytemplate",
                                            ComposableIndexTemplate.builder()
                                                    .indexPatterns(Collections.singletonList("myds*"))
                                                    .template(
                                                            new Template(Settings.builder().put(LifecycleSettings.LIFECYCLE_NAME, "mypolicy").build(), null, null)
                                                    )
                                                    .dataStreamTemplate(new ComposableIndexTemplate.DataStreamTemplate(false, false))
                                                    .build(),
                                            "myhighertemplate",
                                            ComposableIndexTemplate.builder()
                                                    .indexPatterns(Collections.singletonList("myds"))
                                                    .dataStreamTemplate(new ComposableIndexTemplate.DataStreamTemplate(false, false))
                                                    .priority(1_000L)
                                                    .build()
                                    )
                            )
                    );
            mBuilder.put(DataStreamTestHelper.newInstance("myds", Collections.singletonList(mBuilder.get("myindex").getIndex())));

            // The policy is still used by the index and by "mytemplate", but not by the data stream,
            // because the higher-priority "myhighertemplate" wins for "myds" and has no policy.
            ClusterState state = ClusterState.builder(new ClusterName("mycluster")).metadata(mBuilder.build()).build();
            assertThat(
                    LifecyclePolicyUtils.calculateUsage(iner, state, "mypolicy"),
                    equalTo(
                            new ItemUsage(List.of("myindex"), List.of(), List.of("mytemplate"))
                    )
            );
        }
    }

In this test we see a data stream whose winner template does not have a policy but there is another candidate template with lower priority that has our policy. The optimisation wrongly reports that this policy is used by the data stream.
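To make the failure mode concrete, here is a minimal, self-contained sketch of the difference between "some matching template has the policy" and "the winning template has the policy". It does not use the Elasticsearch classes: SimpleTemplate, its simplified wildcard matching, and both method names are hypothetical stand-ins, and treating an unset priority as 0 is an assumption made for the illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Simplified stand-in for a composable index template; not the Elasticsearch class. */
record SimpleTemplate(String name, String pattern, long priority, String policy) {
    /** Supports only exact names and a single trailing '*' wildcard. */
    boolean matches(String target) {
        return pattern.endsWith("*")
            ? target.startsWith(pattern.substring(0, pattern.length() - 1))
            : target.equals(pattern);
    }
}

final class WinnerTemplateSketch {
    /** Incorrect: reports usage as soon as any matching template references the policy. */
    static boolean usedByAnyCandidate(List<SimpleTemplate> templates, String dataStream, String policy) {
        return templates.stream().anyMatch(t -> t.matches(dataStream) && policy.equals(t.policy()));
    }

    /** Correct: only the highest-priority matching template decides. */
    static boolean usedByWinner(List<SimpleTemplate> templates, String dataStream, String policy) {
        Optional<SimpleTemplate> winner = templates.stream()
            .filter(t -> t.matches(dataStream))
            .max(Comparator.comparingLong(SimpleTemplate::priority));
        return winner.map(t -> policy.equals(t.policy())).orElse(false);
    }

    /** Mirrors the two templates from the test: a low-priority one with the policy, a high-priority one without. */
    static List<SimpleTemplate> exampleTemplates() {
        return List.of(
            new SimpleTemplate("mytemplate", "myds*", 0L, "mypolicy"),
            new SimpleTemplate("myhighertemplate", "myds", 1_000L, null)
        );
    }

    public static void main(String[] args) {
        List<SimpleTemplate> templates = exampleTemplates();
        // The naive check claims "myds" uses the policy; the winner-based check does not.
        System.out.println(usedByAnyCandidate(templates, "myds", "mypolicy")); // true
        System.out.println(usedByWinner(templates, "myds", "mypolicy"));       // false
    }
}
```

Any shortcut that skips templates therefore has to preserve the priority ordering among all templates that could match the data stream, not only those that carry a policy.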

We can explore more ideas like this, for example:

What if we filter and sort only data stream templates? In the worst case scenario we make something slow even slower, but it might speed up some things in many other cases.

nielsbauman · 2024-05-04T15:58:53Z

...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java

+    /** A map from policy name to list of data streams that use that policy. */
+    private final Map<String, List<String>> policyToDataStream;
+    /** A map from composable template name to the policy name it uses (or null) */
+    private final Map<String, String> templateToPolicy;


The templateToPolicy map is basically only there to save some MetadataIndexTemplateService.resolveSettings calls (as we have to loop over all templates in calculateUsage anyway), but resolving the settings seemed to be relatively expensive, so templateToPolicy basically serves as a cache for that.

nielsbauman · 2024-05-04T16:01:31Z

...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java

+        List<String> indices = new ArrayList<>();
+        for (IndexMetadata indexMetadata : state.metadata().indices().values()) {
+            if (policyName.equals(indexMetadata.getLifecyclePolicyName())) {
+                indices.add(indexMetadata.getIndex().getName());


I could pre-compute a map of policy to indices as well, but I'm not sure the memory vs. speed trade-off is worth it there.

On second thought, since we have to build this list for every policy anyway, pre-computing these lists and putting them in a map shouldn't be too much additional memory overhead. I'll wait till someone has done a first review before making more changes (in case my whole approach is off).

And the same goes for the composable templates of course.


gmarouli · 2024-05-17T10:25:15Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

@@ -1050,29 +1051,24 @@ static Set<String> dataStreamsExclusivelyUsingTemplates(final ClusterState state
            .reduce(Sets::union)
            .orElse(Set.of());

+        // Filter and sort all composable templates in advance, to speed up template retrieval later on.
+        var templates = state.metadata()


This confused me a little, both the name and the comment. It wasn't clear to me what is being filtered, or which templates we mean when we use the word templates. So, what if we rename:

templateNames to requestedTemplateNames, signalling that these were provided by the caller.

templates to restTemplateCandidates or otherTemplateCandidates, signalling that these are the rest (so not the ones provided by the user), and candidates because isGlobalAndHasIndexHiddenSetting filters out the non-candidates.

The names are up for debate, but I hope the direction I am considering is clear.

Good point, I've renamed the templates variable and updated the comments. Let me know if this is more clear.

gmarouli · 2024-05-17T10:34:53Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

        final List<Tuple<String, ComposableIndexTemplate>> candidates = new ArrayList<>();
-        for (Map.Entry<String, ComposableIndexTemplate> entry : metadata.templatesV2().entrySet()) {
+        outerLoop:


Hm..... I am not going to lie, seeing this does not give me a good feeling. Is this really the best alternative to structure the code?

I think I've found a much better alternative, see 293ff9d. Let me know whether you agree that this approach is better.

gmarouli

Apologies for the late review! Really happy to see this work @nielsbauman, this is awesome! Let me know what you think of the comments.

I like the direction that you have taken :)

gmarouli · 2024-08-29T10:18:43Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

+    public static String findV2Template(
+        Metadata metadata,
+        Collection<Map.Entry<String, ComposableIndexTemplate>> templates,
+        String indexName,
+        boolean isHidden,
+        boolean exitOnFirstMatch
+    ) {
+        final List<Tuple<String, ComposableIndexTemplate>> candidates = findV2CandidateTemplates(
+            templates,
+            indexName,
+            isHidden,
+            exitOnFirstMatch
+        );


If you like the idea of more explicit method names, as I suggested in the previous comment, we could change the visibility of this method to private or package-private, depending on the testing requirements. That way, this method can't be misused by setting exitOnFirstMatch with an unordered template list.

What do you think?

gmarouli · 2024-08-29T10:25:20Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

-                if (matched) {
-                    candidates.add(Tuple.tuple(name, template));
+            if (isHidden) {
+                final boolean hasMatchAllTemplate = template.indexPatterns().stream().anyMatch(Regex::isMatchAllPattern);


Nit: I think hasMatchAllTemplate reads better as isMatchAllTemplate, because it talks about the template itself. Otherwise it should be hasMatchAllPattern.

gmarouli · 2024-08-29T10:37:05Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

            .reduce(Sets::union)
            .orElse(Set.of());


Nit: I know this is not your code, but since it's in view I wanted to add a small note.

I think this code is a bit more complex than it has to be.

.map(Set::copyOf) .reduce(Sets::union) .orElse(Set.of())

Effectively we want to add all patterns into a set, right? I think the following code is a bit more readable.

.flatMap(List::stream) .collect(Collectors.toSet())

What do you think?
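As a rough illustration of the two variants, the sketch below runs both pipelines over plain JDK collections. The union helper is a hypothetical stand-in for the Sets::union call in the PR (whose exact behaviour is not shown here), and the class and method names are made up for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class PatternUnionSketch {
    /** Hypothetical stand-in for a set-union helper; returns a new set containing both inputs. */
    static Set<String> union(Set<String> left, Set<String> right) {
        Set<String> result = new HashSet<>(left);
        result.addAll(right);
        return result;
    }

    /** Shape of the existing code: copy each list into a set, then union the sets pairwise. */
    static Set<String> viaReduce(List<List<String>> patternLists) {
        return patternLists.stream().map(Set::copyOf).reduce(PatternUnionSketch::union).orElse(Set.of());
    }

    /** Shape of the suggestion: flatten all lists and collect once into a single set. */
    static Set<String> viaFlatMap(List<List<String>> patternLists) {
        return patternLists.stream().flatMap(List::stream).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<List<String>> patterns = List.of(List.of("logs-*", "metrics-*"), List.of("logs-*"));
        // Both variants contain the same elements; duplicates across lists collapse.
        System.out.println(viaReduce(patterns).equals(viaFlatMap(patterns)));
    }
}
```

One difference worth keeping in mind: Collectors.toSet() makes no guarantee about the mutability or concrete type of the returned set, so the flatMap variant is only a drop-in replacement if callers do not rely on those properties.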

gmarouli · 2024-08-29T11:00:44Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

     * Return an ordered list of the name (id) and composable index templates that would apply to an index. The first
     * one is the winner template that is applied to this index. In the event that no templates are matched,
-     * an empty list is returned.
+     * an empty list is returned. If <code>exitOnFirstMatch</code> is true, we return immediately after finding a match.
     */


I believe we need to elaborate a bit more on how to use this method: what the trade-offs of exitOnFirstMatch are, how it should be used, etc.

gmarouli · 2024-08-29T11:09:15Z

...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java

+            if (indexTemplate == null) {
+                continue;
+            }
+            Settings settings = MetadataIndexTemplateService.resolveSettings(state.metadata(), indexTemplate);


Before doing this, shouldn't we check whether the template is already in templateToPolicy? Then we would not have to resolve the settings again.
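A minimal sketch of that check, assuming the cache keeps a null value for templates that resolve to no policy (as the templateToPolicy field comment describes). The class, the resolver function, and the call counter are hypothetical and only stand in for the resolveSettings call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class TemplatePolicyCacheSketch {
    /** Template name to policy name; a null value means "resolved, but no policy". */
    private final Map<String, String> templateToPolicy = new HashMap<>();
    /** Hypothetical stand-in for the expensive settings resolution. */
    private final Function<String, String> resolver;
    private int resolveCalls = 0;

    TemplatePolicyCacheSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    String policyFor(String templateName) {
        // containsKey is used instead of computeIfAbsent because computeIfAbsent
        // does not record a mapping when the computed value is null, so templates
        // without a policy would be resolved again on every lookup.
        if (templateToPolicy.containsKey(templateName)) {
            return templateToPolicy.get(templateName);
        }
        resolveCalls++;
        String policy = resolver.apply(templateName);
        templateToPolicy.put(templateName, policy);
        return policy;
    }

    int resolveCalls() {
        return resolveCalls;
    }
}
```

With this shape, looking up the same template for several data streams costs one resolution, regardless of whether the template has a policy.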

gmarouli · 2024-08-29T11:20:44Z

...core/src/test/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculatorTests.java

+            .build();
+        assertThat(
+            new LifecyclePolicyUsageCalculator(iner, state, List.of("mypolicy")).calculateUsage("mypolicy"),
+            equalTo(new ItemUsage(Collections.emptyList(), Collections.emptyList(), Collections.emptyList()))


Suggested change

-            equalTo(new ItemUsage(Collections.emptyList(), Collections.emptyList(), Collections.emptyList()))
+            equalTo(new ItemUsage(List.of(), List.of(), List.of()))

gmarouli · 2024-08-29T11:21:12Z

...core/src/test/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculatorTests.java

+            .build();
+        assertThat(
+            new LifecyclePolicyUsageCalculator(iner, state, List.of("mypolicy")).calculateUsage("mypolicy"),
+            equalTo(new ItemUsage(Collections.singleton("myindex"), Collections.emptyList(), Collections.emptyList()))


Suggested change

-            equalTo(new ItemUsage(Collections.singleton("myindex"), Collections.emptyList(), Collections.emptyList()))
+            equalTo(new ItemUsage(List.of("myindex"), List.of(), List.of()))

gmarouli · 2024-08-29T11:23:14Z

...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java

+    public LifecyclePolicyUsageCalculator(
+        final IndexNameExpressionResolver indexNameExpressionResolver,
+        final ClusterState state,
+        List<String> names


Because we have a lot of elements involved here, I think it would help to specify what these names are. I think policyNames or policies is a bit more informative.

What do you think?

gmarouli · 2024-08-29T11:34:38Z

server/src/main/java/org/elasticsearch/cluster/metadata/MetadataIndexTemplateService.java

    public static String findV2Template(Metadata metadata, String indexName, boolean isHidden) {
-        final List<Tuple<String, ComposableIndexTemplate>> candidates = findV2CandidateTemplates(metadata, indexName, isHidden);
+        return findV2Template(metadata, metadata.templatesV2().entrySet(), indexName, isHidden, false);
+    }


What if we create one more method called findV2TemplateFromSortedList, which will call findV2Template(metadata, metadata.templatesV2().entrySet(), indexName, isHidden, true)? We can document that, if you want to retrieve the templates of multiple targets, you can provide a sorted list of templates, which will speed things up.

I think this will clarify a bit more which method to use and it will make explicit that the templates need to be ordered based on priority.
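The idea can be sketched roughly as follows: sort the templates by descending priority once, then let each lookup stop at the first match. This is an illustration only; PrioritizedTemplate, the simplified wildcard matching, and the method names are hypothetical and do not mirror the real findV2Template signature.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified stand-in for a composable index template; not the Elasticsearch class. */
record PrioritizedTemplate(String name, String pattern, long priority) {
    /** Supports only exact names and a single trailing '*' wildcard. */
    boolean matches(String target) {
        return pattern.endsWith("*")
            ? target.startsWith(pattern.substring(0, pattern.length() - 1))
            : target.equals(pattern);
    }
}

final class SortedTemplateLookupSketch {
    /** Sorts a copy by descending priority; intended to run once per request. */
    static List<PrioritizedTemplate> sortByPriority(List<PrioritizedTemplate> templates) {
        List<PrioritizedTemplate> sorted = new ArrayList<>(templates);
        sorted.sort(Comparator.comparingLong(PrioritizedTemplate::priority).reversed());
        return sorted;
    }

    /**
     * Returns the name of the first matching template, or null if none match.
     * The result is the winning template only if the list is ordered by
     * descending priority; with an unordered list the early exit can return
     * a lower-priority template.
     */
    static String findWinnerFromSortedList(List<PrioritizedTemplate> sortedTemplates, String target) {
        for (PrioritizedTemplate template : sortedTemplates) {
            if (template.matches(target)) {
                return template.name();
            }
        }
        return null;
    }
}
```

A caller resolving many data streams would call sortByPriority once and reuse the result for every findWinnerFromSortedList call, which is where the saving comes from. Keeping the early-exit variant behind a name that mentions the sorted precondition makes the requirement visible at the call site.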

gmarouli · 2024-08-29T11:53:03Z

...ugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyUsageCalculator.java

+        List<String> indices = new ArrayList<>();
+        for (IndexMetadata indexMetadata : state.metadata().indices().values()) {
+            if (policyName.equals(indexMetadata.getLifecyclePolicyName())) {
+                indices.add(indexMetadata.getIndex().getName());


Let's think about this. You only need to keep a cache of what is necessary, right?

For example:

Composable templates:
Cache the resolved templates that match our policies, along with their index patterns: template -> policy. You can probably also store policy -> templates.

Data streams:
Go over the data streams like you already do, find the template for each data stream, and check the template -> policy cache. You do not need to keep track of nulls anymore: since you have collected the relevant templates, if the template is not in your cache then the data stream shouldn't be either. This gives you policy -> data streams.

Indices:
I do not see an issue with going over the indices during initialisation because, from what I saw, the information is pre-calculated and then serialised. So here you create policy -> indices.

Then retrieving the data is just a matter of picking it up from the cache. Thoughts?
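A rough sketch of what such a cache could look like, using plain maps instead of cluster state. Everything here is hypothetical: the class name, the Usage record (a stand-in for ItemUsage), and the three input maps, which assume that the winning template per data stream and the policy per template have already been resolved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PolicyUsageCacheSketch {
    /** Hypothetical stand-in for ItemUsage. */
    record Usage(List<String> indices, List<String> dataStreams, List<String> templates) {}

    private final Map<String, List<String>> policyToIndices = new HashMap<>();
    private final Map<String, List<String>> policyToDataStreams = new HashMap<>();
    private final Map<String, List<String>> policyToTemplates = new HashMap<>();

    /**
     * @param templateToPolicy     template name to policy, containing only templates that have a policy
     * @param dataStreamToTemplate data stream name to the name of its winning template
     * @param indexToPolicy        index name to policy, containing only indices that have a policy
     */
    PolicyUsageCacheSketch(
        Map<String, String> templateToPolicy,
        Map<String, String> dataStreamToTemplate,
        Map<String, String> indexToPolicy
    ) {
        templateToPolicy.forEach((template, policy) -> add(policyToTemplates, policy, template));
        dataStreamToTemplate.forEach((dataStream, template) -> {
            // A template missing from the cache has no relevant policy, so the data stream is skipped.
            String policy = templateToPolicy.get(template);
            if (policy != null) {
                add(policyToDataStreams, policy, dataStream);
            }
        });
        indexToPolicy.forEach((index, policy) -> add(policyToIndices, policy, index));
    }

    private static void add(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** After construction, usage is a plain lookup per policy. */
    Usage calculateUsage(String policy) {
        return new Usage(
            policyToIndices.getOrDefault(policy, List.of()),
            policyToDataStreams.getOrDefault(policy, List.of()),
            policyToTemplates.getOrDefault(policy, List.of())
        );
    }
}
```

The trade-off is the one discussed in the PR description: the maps cost memory proportional to the number of policy users, in exchange for doing the template and index scans once per request instead of once per policy.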

Optimize retrieving ILM policies

5bf393c

nielsbauman added the >enhancement, :Data Management/ILM+SLM, Team:Data Management, and v8.14.0 labels Apr 1, 2024

Update docs/changelog/106953.yaml

d160dc2

nielsbauman commented Apr 1, 2024

Remove redundant computation

092cb0c

gmarouli self-requested a review April 3, 2024 10:18

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

nielsbauman added 2 commits May 4, 2024 09:54

Merge branch 'main' into ilm-optimization

16f3d83

Change approach

d30376e

nielsbauman commented May 4, 2024

parkertimmins reviewed May 9, 2024

gmarouli reviewed May 17, 2024

gmarouli reviewed May 17, 2024

gmarouli reviewed May 17, 2024

gmarouli reviewed May 17, 2024

nielsbauman added 3 commits May 21, 2024 15:28

Merge branch 'main' into ilm-optimization

24acf2c

Refactor confusing loop

293ff9d

PR feedback

02fb7a2

nielsbauman requested a review from gmarouli May 21, 2024 14:41

parkertimmins reviewed May 21, 2024

Add missing break

e193a30

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

gmarouli requested changes Aug 29, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024
