Optimize usage calculation in ILM policies retrieval API #106953

Open
wants to merge 9 commits into
base: main
Changes from 8 commits
6 changes: 6 additions & 0 deletions docs/changelog/106953.yaml
@@ -0,0 +1,6 @@
pr: 106953
summary: Optimize usage calculation in ILM policies retrieval API
area: ILM+SLM
type: enhancement
issues:
- 105773
@@ -61,6 +61,7 @@
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
@@ -1050,29 +1051,25 @@ static Set<String> dataStreamsExclusivelyUsingTemplates(final ClusterState state
.reduce(Sets::union)
.orElse(Set.of());
Comment on lines 1051 to 1052
Contributor

Nit: I know this is not your code, but since it's in view I wanted to add a small note.

I think this code is a bit more complex than it has to be.

.map(Set::copyOf)
.reduce(Sets::union)
.orElse(Set.of())

Effectively we want to add all patterns into a set, right? I think the following code is a bit more readable.

.flatMap(List::stream)
.collect(Collectors.toSet())

What do you think?
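
The two pipelines discussed in this comment can be compared side by side. This is only a minimal sketch: `union` is a hypothetical stand-in for the `Sets::union` helper used in the diff, and the class name and sample patterns are invented for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class PatternUnionSketch {

    // Hypothetical stand-in for the Sets::union helper referenced in the diff.
    static <T> Set<T> union(Set<T> left, Set<T> right) {
        Set<T> result = new HashSet<>(left);
        result.addAll(right);
        return result;
    }

    // Shape of the existing code: copy each list into a set, then union the sets pairwise.
    static Set<String> viaReduce(List<List<String>> patternLists) {
        return patternLists.stream()
            .map(Set::copyOf)
            .reduce(PatternUnionSketch::union)
            .orElse(Set.of());
    }

    // Shape suggested in the review: flatten all lists and collect into a set once.
    static Set<String> viaFlatMap(List<List<String>> patternLists) {
        return patternLists.stream()
            .flatMap(List::stream)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Sample index patterns, invented for this sketch.
        List<List<String>> patterns = List.of(List.of("logs-*", "metrics-*"), List.of("logs-*", "traces-*"));
        System.out.println(viaReduce(patterns).equals(viaFlatMap(patterns))); // prints true
    }
}
```

Both variants produce the same elements. Note that `Collectors.toSet()` makes no guarantee about the type or mutability of the returned set, which may matter if callers rely on either.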


// Determine all the composable templates that are not one of the provided templates.
var otherTemplates = state.metadata()
.templatesV2()
.entrySet()
.stream()
.filter(
entry -> templateNames.contains(entry.getKey()) == false
&& isGlobalAndHasIndexHiddenSetting(metadata, entry.getValue(), entry.getKey()) == false
)
// Sort here so we can `exitOnFirstMatch` in `findV2Template`.
.sorted(Comparator.comparing(entry -> entry.getValue().priorityOrZero(), Comparator.reverseOrder()))
.toList();

return metadata.dataStreams()
.values()
.stream()
// Limit to checking data streams that match any of the templates' index patterns
.filter(ds -> namePatterns.stream().anyMatch(pattern -> Regex.simpleMatch(pattern, ds.getName())))
.filter(ds -> {
// Retrieve the templates that match the data stream name ordered by priority
List<Tuple<String, ComposableIndexTemplate>> candidates = findV2CandidateTemplates(metadata, ds.getName(), ds.isHidden());
if (candidates.isEmpty()) {
throw new IllegalStateException("Data stream " + ds.getName() + " did not match any composable index templates.");
}

// Limit data streams that can ONLY use any of the specified templates, we do this by filtering
// the matching templates that are others than the ones requested and could be a valid template to use.
return candidates.stream()
.filter(
template -> templateNames.contains(template.v1()) == false
&& isGlobalAndHasIndexHiddenSetting(metadata, template.v2(), template.v1()) == false
)
.map(Tuple::v1)
.toList()
.isEmpty();
})
.filter(ds -> findV2Template(state.metadata(), otherTemplates, ds.getName(), ds.isHidden(), true) == null)
.map(DataStream::getName)
.collect(Collectors.toSet());
}
@@ -1268,7 +1265,27 @@ public static List<IndexTemplateMetadata> findV1Templates(Metadata metadata, Str
*/
@Nullable
public static String findV2Template(Metadata metadata, String indexName, boolean isHidden) {
final List<Tuple<String, ComposableIndexTemplate>> candidates = findV2CandidateTemplates(metadata, indexName, isHidden);
return findV2Template(metadata, metadata.templatesV2().entrySet(), indexName, isHidden, false);
}
Comment on lines 1267 to +1269
Contributor

What if we create one more method called findV2TemplateFromSortedList, which will call findV2Template(metadata, metadata.templatesV2().entrySet(), indexName, isHidden, true)? We can then document that, if you want to retrieve the templates of multiple targets, you can provide a sorted list of templates, which will speed things up.

I think this will clarify a bit more which method to use and it will make explicit that the templates need to be ordered based on priority.
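
The idea of a dedicated "sorted list" entry point could look roughly like the following. This is a sketch under stated assumptions, not the Elasticsearch API: the `Template` record, the prefix-based matching, and the method names are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedTemplateLookupSketch {

    // Hypothetical, simplified stand-in for a composable index template.
    // Real templates use wildcard index patterns; a plain prefix is used here for brevity.
    record Template(String name, String prefix, long priority) {
        boolean matches(String target) {
            return target.startsWith(prefix);
        }
    }

    // Sort once by descending priority so the first match is also the highest-priority match.
    static List<Template> sortByPriorityDescending(List<Template> templates) {
        List<Template> sorted = new ArrayList<>(templates);
        sorted.sort(Comparator.comparingLong(Template::priority).reversed());
        return sorted;
    }

    // Precondition (documented, not checked): sortedTemplates is ordered by descending priority.
    // Returns the name of the first matching template, or null if nothing matches.
    static String findTemplateFromSortedList(List<Template> sortedTemplates, String target) {
        for (Template template : sortedTemplates) {
            if (template.matches(target)) {
                return template.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The sort cost is paid once and reused for every target.
        List<Template> sorted = sortByPriorityDescending(
            List.of(new Template("generic-logs", "logs-", 100), new Template("nginx-logs", "logs-nginx", 200))
        );
        for (String target : List.of("logs-nginx-prod", "logs-app", "metrics-cpu")) {
            System.out.println(target + " -> " + findTemplateFromSortedList(sorted, target));
        }
    }
}
```

Putting the ordering requirement in the method name and its documentation makes the precondition visible at every call site.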


/**
* Return the name (id) of the highest matching index template, out of the provided templates, for the given index name. In
* the event that no templates are matched, {@code null} is returned.
*/
@Nullable
public static String findV2Template(
Metadata metadata,
Collection<Map.Entry<String, ComposableIndexTemplate>> templates,
String indexName,
boolean isHidden,
boolean exitOnFirstMatch
) {
final List<Tuple<String, ComposableIndexTemplate>> candidates = findV2CandidateTemplates(
templates,
indexName,
isHidden,
exitOnFirstMatch
);
Comment on lines +1276 to +1288
Contributor

If you like the idea of more explicit method names, as I suggested in the previous comment, we could change the visibility of this method to private or package-private, depending on the testing requirements. That way, this method can't be misused by setting exitOnFirstMatch with an unordered template list.

What do you think?

if (candidates.isEmpty()) {
return null;
}
@@ -1296,25 +1313,30 @@ public static String findV2Template(Metadata metadata, String indexName, boolean
/**
* Return an ordered list of the name (id) and composable index templates that would apply to an index. The first
* one is the winner template that is applied to this index. In the event that no templates are matched,
* an empty list is returned.
* an empty list is returned. If <code>exitOnFirstMatch</code> is true, we return immediately after finding a match.
*/
Comment on lines 1314 to 1317
Contributor

I believe we need to elaborate a bit more on how to use this method: what the trade-offs of exitOnFirstMatch are, how it should be used, and so on.
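
One trade-off worth documenting is that an early exit is only correct when the input is ordered by descending priority. The following sketch illustrates the failure mode on unsorted input; the `Template` record, prefix matching, and method names are hypothetical and much simpler than the real template matching.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EarlyExitTradeOffSketch {

    // Hypothetical, simplified template: matches any name that starts with the prefix.
    record Template(String name, String prefix, long priority) {}

    // Collects matching templates in iteration order; stops after the first match when asked to.
    static List<Template> findCandidates(List<Template> templates, String target, boolean exitOnFirstMatch) {
        List<Template> candidates = new ArrayList<>();
        for (Template template : templates) {
            if (target.startsWith(template.prefix())) {
                candidates.add(template);
                if (exitOnFirstMatch) {
                    return candidates;
                }
            }
        }
        return candidates;
    }

    // Without early exit, every template is visited and the winner is picked by priority.
    static String winnerByFullScan(List<Template> templates, String target) {
        return findCandidates(templates, target, false).stream()
            .max(Comparator.comparingLong(Template::priority))
            .map(Template::name)
            .orElse(null);
    }

    // With early exit, the winner is simply whichever match comes first in the input order.
    static String winnerByEarlyExit(List<Template> templates, String target) {
        List<Template> candidates = findCandidates(templates, target, true);
        return candidates.isEmpty() ? null : candidates.get(0).name();
    }

    public static void main(String[] args) {
        // Deliberately NOT sorted by priority, to show the misuse case.
        List<Template> unsorted = List.of(new Template("low", "logs-", 1), new Template("high", "logs-", 10));
        System.out.println(winnerByFullScan(unsorted, "logs-app"));  // prints high
        System.out.println(winnerByEarlyExit(unsorted, "logs-app")); // prints low
    }
}
```

The early exit saves work per lookup, but it moves the correctness burden to the caller, who must sort the templates first. It also means the returned list holds at most one candidate rather than every match.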

static List<Tuple<String, ComposableIndexTemplate>> findV2CandidateTemplates(Metadata metadata, String indexName, boolean isHidden) {
static List<Tuple<String, ComposableIndexTemplate>> findV2CandidateTemplates(
Collection<Map.Entry<String, ComposableIndexTemplate>> templates,
String indexName,
boolean isHidden,
boolean exitOnFirstMatch
) {
final String resolvedIndexName = IndexNameExpressionResolver.DateMathExpressionResolver.resolveExpression(indexName);
final Predicate<String> patternMatchPredicate = pattern -> Regex.simpleMatch(pattern, resolvedIndexName);
final List<Tuple<String, ComposableIndexTemplate>> candidates = new ArrayList<>();
for (Map.Entry<String, ComposableIndexTemplate> entry : metadata.templatesV2().entrySet()) {
for (Map.Entry<String, ComposableIndexTemplate> entry : templates) {
final String name = entry.getKey();
final ComposableIndexTemplate template = entry.getValue();
if (isHidden == false) {
final boolean matched = template.indexPatterns().stream().anyMatch(patternMatchPredicate);
if (matched) {
candidates.add(Tuple.tuple(name, template));
if (isHidden) {
final boolean hasMatchAllTemplate = template.indexPatterns().stream().anyMatch(Regex::isMatchAllPattern);
Contributor

Nit: I think hasMatchAllTemplate reads better as isMatchAllTemplate, because it talks about the template itself. Otherwise, it should be hasMatchAllPattern.

if (hasMatchAllTemplate) {
continue;
}
} else {
final boolean isNotMatchAllTemplate = template.indexPatterns().stream().noneMatch(Regex::isMatchAllPattern);
if (isNotMatchAllTemplate) {
if (template.indexPatterns().stream().anyMatch(patternMatchPredicate)) {
candidates.add(Tuple.tuple(name, template));
}
for (String indexPattern : template.indexPatterns()) {
if (Regex.simpleMatch(indexPattern, resolvedIndexName)) {
candidates.add(Tuple.tuple(name, template));
if (exitOnFirstMatch) {
return candidates;
}
}
}
@@ -0,0 +1,105 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.xpack.core.ilm;

import org.apache.lucene.util.CollectionUtil;
import org.elasticsearch.action.support.IndicesOptions;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.metadata.ComposableIndexTemplate;
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
import org.elasticsearch.cluster.metadata.ItemUsage;
import org.elasticsearch.cluster.metadata.MetadataIndexTemplateService;
import org.elasticsearch.common.regex.Regex;
import org.elasticsearch.common.settings.Settings;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
* A class that can be used to calculate the usages of ILM policies. This class computes some information on initialization, which will
* use a bit more memory but speeds up the usage calculation significantly.
*/
public class LifecyclePolicyUsageCalculator {

private final ClusterState state;
/** Whether {@link #calculateUsage} will be called multiple times or not. */
private final boolean willIterate;
/** A map from policy name to list of data streams that use that policy. */
private final Map<String, List<String>> policyToDataStream;
/** A map from composable template name to the policy name it uses (or null) */
private final Map<String, String> templateToPolicy;
Contributor Author

The templateToPolicy map is mainly there to save some MetadataIndexTemplateService.resolveSettings calls. We have to loop over all templates in calculateUsage anyway, but resolving the settings seemed to be relatively expensive, so templateToPolicy serves as a cache for that.
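
Because a template may legitimately resolve to no policy, a cache like this has to distinguish "not resolved yet" from "resolved to null". A minimal sketch of that pattern follows; the class, the `resolver` function (a stand-in for the expensive settings resolution), and the call counter are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class NullableCacheSketch {

    private final Map<String, String> templateToPolicy = new HashMap<>();
    // Hypothetical stand-in for the expensive settings resolution; may return null.
    private final Function<String, String> resolver;
    private int resolveCalls = 0;

    NullableCacheSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    String policyFor(String templateName) {
        String policy = templateToPolicy.get(templateName);
        // A null value alone is ambiguous: either nothing is cached yet, or the
        // template was resolved before and uses no policy. containsKey tells them apart.
        if (policy == null && templateToPolicy.containsKey(templateName) == false) {
            resolveCalls++;
            policy = resolver.apply(templateName);
            templateToPolicy.put(templateName, policy);
        }
        return policy;
    }

    int resolveCalls() {
        return resolveCalls;
    }

    public static void main(String[] args) {
        // Only one template has a policy; lookups for the other resolve to null.
        Map<String, String> settings = Map.of("logs-template", "logs-policy");
        NullableCacheSketch cache = new NullableCacheSketch(settings::get);
        cache.policyFor("logs-template");
        cache.policyFor("logs-template");
        cache.policyFor("no-policy-template");
        cache.policyFor("no-policy-template");
        System.out.println(cache.resolveCalls()); // prints 2
    }
}
```

`Map.computeIfAbsent` would not be a drop-in replacement here, because it records no mapping when the mapping function returns null, so templates without a policy would be resolved again on every call.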


public LifecyclePolicyUsageCalculator(
final IndexNameExpressionResolver indexNameExpressionResolver,
final ClusterState state,
List<String> names
Contributor

Because we have a lot of elements involved here, I think it would be easier to follow if we specified what these names are. I think policyNames or policies is a bit more informative.

What do you think?

) {
this.state = state;
this.willIterate = names.size() > 1 || Regex.isSimpleMatchPattern(names.get(0));

var allDataStreams = indexNameExpressionResolver.dataStreamNames(state, IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED_HIDDEN);
// Sort all templates by descending priority. That way, findV2Template can exit on the first found template.
var indexTemplates = new ArrayList<>(state.metadata().templatesV2().entrySet());
CollectionUtil.timSort(indexTemplates, Comparator.comparing(entry -> entry.getValue().priorityOrZero(), Comparator.reverseOrder()));

// Build the maps that will be used for the usage calculation later on.
IndexLifecycleMetadata metadata = state.metadata().custom(IndexLifecycleMetadata.TYPE);
policyToDataStream = new HashMap<>(Regex.isSimpleMatchPattern(names.get(0)) ? metadata.getPolicyMetadatas().size() : names.size());
templateToPolicy = new HashMap<>(indexTemplates.size());
for (String dataStream : allDataStreams) {
String indexTemplate = MetadataIndexTemplateService.findV2Template(state.metadata(), indexTemplates, dataStream, false, true);
if (indexTemplate == null) {
continue;
}
Settings settings = MetadataIndexTemplateService.resolveSettings(state.metadata(), indexTemplate);
Contributor

Before doing this, shouldn't we check if the template is already in templateToPolicy, then we do not have to resolve the settings again?

var policyName = LifecycleSettings.LIFECYCLE_NAME_SETTING.get(settings);
if (names.stream().noneMatch(name -> Regex.simpleMatch(name, policyName))) {
// If a template's policy doesn't match any of the supplied names, we can skip it later on.
templateToPolicy.put(indexTemplate, null);
continue;
}
templateToPolicy.put(indexTemplate, policyName);
policyToDataStream.computeIfAbsent(policyName, k -> new ArrayList<>()).add(dataStream);
}
}

/**
* Calculate the indices, data streams, and composable templates that use the given policy.
*/
public ItemUsage calculateUsage(String policyName) {
List<String> indices = new ArrayList<>();
for (IndexMetadata indexMetadata : state.metadata().indices().values()) {
if (policyName.equals(indexMetadata.getLifecyclePolicyName())) {
indices.add(indexMetadata.getIndex().getName());
Contributor Author

I could pre-compute a map of policy to indices as well, but I'm not sure the memory vs. speed trade-off is worth it there.

Contributor Author

On second thought, since we have to build this list for every policy anyway, pre-computing these lists and putting them in a map shouldn't be too much additional memory overhead. I'll wait till someone has done a first review before making more changes (in case my whole approach is off).

Contributor Author

And the same goes for the composable templates of course.

Contributor

Let's think about this. You only need to keep a cache of what is necessary. Right?

For example:

Composable templates:
Cache the resolved templates that match our policies and their index patterns: template -> policy. You can probably also store policy -> templates.

Data streams:
Go over the data streams like you already do, find the template for each data stream, and check the template -> policy cache. You do not need to keep track of nulls anymore: since you have already collected the relevant templates, a data stream whose template is not in your cache is not relevant either. This way you create policy -> data streams.

Indices:
I do not see an issue with going over the indices during initialisation because, from what I saw, the information is pre-calculated and then serialised. So, here you create policy -> indices.

Then retrieving the data is just picking them up from the cache. Thoughts?
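
The three caches proposed above could be built in one initialization pass, roughly as sketched below. Everything here is an assumption made for illustration: the class, the `Usage` record (loosely mirroring the three lists an ItemUsage carries), and the inputs, which are taken to be already-resolved name mappings and exact policy names rather than wildcard expressions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrecomputedUsageSketch {

    // Hypothetical result holder for the three lists of names.
    record Usage(List<String> indices, List<String> dataStreams, List<String> templates) {}

    private final Map<String, List<String>> policyToIndices = new HashMap<>();
    private final Map<String, List<String>> policyToDataStreams = new HashMap<>();
    private final Map<String, List<String>> policyToTemplates = new HashMap<>();

    // Hypothetical inputs: template -> policy, data stream -> winning template, index -> policy.
    PrecomputedUsageSketch(
        Set<String> requestedPolicies,
        Map<String, String> templateToResolvedPolicy,
        Map<String, String> dataStreamToTemplate,
        Map<String, String> indexToPolicy
    ) {
        // Composable templates: cache only those whose policy was requested.
        Map<String, String> templateToPolicy = new HashMap<>();
        templateToResolvedPolicy.forEach((template, policy) -> {
            if (policy != null && requestedPolicies.contains(policy)) {
                templateToPolicy.put(template, policy);
                policyToTemplates.computeIfAbsent(policy, k -> new ArrayList<>()).add(template);
            }
        });
        // Data streams: a template missing from the cache means the data stream is not relevant,
        // so there is no need to store null markers.
        dataStreamToTemplate.forEach((dataStream, template) -> {
            String policy = templateToPolicy.get(template);
            if (policy != null) {
                policyToDataStreams.computeIfAbsent(policy, k -> new ArrayList<>()).add(dataStream);
            }
        });
        // Indices: a single pass during initialization.
        indexToPolicy.forEach((index, policy) -> {
            if (policy != null && requestedPolicies.contains(policy)) {
                policyToIndices.computeIfAbsent(policy, k -> new ArrayList<>()).add(index);
            }
        });
    }

    // Retrieval is then just a lookup in the three maps.
    Usage usageOf(String policy) {
        return new Usage(
            policyToIndices.getOrDefault(policy, List.of()),
            policyToDataStreams.getOrDefault(policy, List.of()),
            policyToTemplates.getOrDefault(policy, List.of())
        );
    }

    public static void main(String[] args) {
        PrecomputedUsageSketch sketch = new PrecomputedUsageSketch(
            Set.of("hot-warm"),
            Map.of("logs-template", "hot-warm", "metrics-template", "other"),
            Map.of("logs-app", "logs-template", "metrics-cpu", "metrics-template"),
            Map.of(".ds-logs-app-000001", "hot-warm", "plain-index", "other")
        );
        System.out.println(sketch.usageOf("hot-warm"));
    }
}
```

The design trades some memory at construction time for constant-time lookups afterwards, which pays off mainly when usage is requested for several policies.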

}
}

List<String> composableTemplates = new ArrayList<>();
for (Map.Entry<String, ComposableIndexTemplate> entry : state.metadata().templatesV2().entrySet()) {
var foundPolicy = templateToPolicy.get(entry.getKey());
// Extra `containsKey` check to account for templates not using any policy.
if (foundPolicy == null && templateToPolicy.containsKey(entry.getKey()) == false) {
Settings settings = MetadataIndexTemplateService.resolveSettings(entry.getValue(), state.metadata().componentTemplates());
foundPolicy = LifecycleSettings.LIFECYCLE_NAME_SETTING.get(settings);
// If this method will only be called once, we don't need to keep building the map.
if (willIterate) {
templateToPolicy.put(entry.getKey(), foundPolicy);
}
}
if (policyName.equals(foundPolicy)) {
composableTemplates.add(entry.getKey());
}
}

return new ItemUsage(indices, policyToDataStream.getOrDefault(policyName, List.of()), composableTemplates);
}
}
@@ -8,24 +8,16 @@
package org.elasticsearch.xpack.core.ilm;

import org.elasticsearch.ElasticsearchParseException;
import org.elasticsearch.action.support.IndicesOptions;
import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
import org.elasticsearch.cluster.metadata.ItemUsage;
import org.elasticsearch.cluster.metadata.MetadataIndexTemplateService;
import org.elasticsearch.common.bytes.BytesArray;
import org.elasticsearch.common.compress.NotXContentException;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentHelper;
import org.elasticsearch.xcontent.NamedXContentRegistry;
import org.elasticsearch.xcontent.XContentParser;
import org.elasticsearch.xcontent.XContentParserConfiguration;
import org.elasticsearch.xcontent.XContentType;
import org.elasticsearch.xpack.core.template.resources.TemplateResources;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
* A utility class used for index lifecycle policies
@@ -91,44 +83,4 @@ private static void validate(String source) {
throw new ElasticsearchParseException("invalid policy", e);
}
}

/**
* Given a cluster state and ILM policy, calculate the {@link ItemUsage} of
* the policy (what indices, data streams, and templates use the policy)
*/
public static ItemUsage calculateUsage(
final IndexNameExpressionResolver indexNameExpressionResolver,
final ClusterState state,
final String policyName
) {
final List<String> indices = state.metadata()
.indices()
.values()
.stream()
.filter(indexMetadata -> policyName.equals(indexMetadata.getLifecyclePolicyName()))
.map(indexMetadata -> indexMetadata.getIndex().getName())
.collect(Collectors.toList());

final List<String> allDataStreams = indexNameExpressionResolver.dataStreamNames(
state,
IndicesOptions.LENIENT_EXPAND_OPEN_CLOSED_HIDDEN
);

final List<String> dataStreams = allDataStreams.stream().filter(dsName -> {
String indexTemplate = MetadataIndexTemplateService.findV2Template(state.metadata(), dsName, false);
if (indexTemplate != null) {
Settings settings = MetadataIndexTemplateService.resolveSettings(state.metadata(), indexTemplate);
return policyName.equals(LifecycleSettings.LIFECYCLE_NAME_SETTING.get(settings));
} else {
return false;
}
}).collect(Collectors.toList());

final List<String> composableTemplates = state.metadata().templatesV2().keySet().stream().filter(templateName -> {
Settings settings = MetadataIndexTemplateService.resolveSettings(state.metadata(), templateName);
return policyName.equals(LifecycleSettings.LIFECYCLE_NAME_SETTING.get(settings));
}).collect(Collectors.toList());

return new ItemUsage(indices, dataStreams, composableTemplates);
}
}