Skip to content

[ML] Automatically map interval to fixed/calendar interval in datafeed aggs #51606

Closed
@droberts195

Description

@droberts195

Since #33727 went into 7.2 interval is deprecated in date histograms. New date histograms should use fixed_interval or calendar_interval instead. In 8.x it is likely that date histograms containing interval will be considered invalid.

ML datafeed configs are persisted and are likely to contain interval rather than fixed_interval or calendar_interval; they definitely will if they were first created in a version earlier than 7.2. Such datafeed configs will be invalid in 8.0, so prior to 8.0 we must do something about this, but there is no reason we cannot start sooner.

The best course of action would seem to be to silently rewrite datafeed configs that contain a date_histogram aggregation with an interval setting to have whichever of fixed_interval or calendar_interval preserves the current behaviour.

We can do this unconditionally providing the oldest node in the cluster is on version 7.2 or above.

Rollups already does this silent behaviour-preserving rewriting, although rollup configs are stored in cluster state which makes this considerably easier than for the ML configs that are stored in an index.

The code that does the translation for rollups is in:

PARSER = new ConstructingObjectParser<>(NAME, a -> {
DateHistogramInterval oldInterval = (DateHistogramInterval) a[1];
DateHistogramInterval calendarInterval = (DateHistogramInterval) a[2];
DateHistogramInterval fixedInterval = (DateHistogramInterval) a[3];
if (oldInterval != null) {
if (calendarInterval != null || fixedInterval != null) {
throw new IllegalArgumentException("Cannot use [interval] with [fixed_interval] or [calendar_interval] " +
"configuration options.");
}
return fromUnknownTimeUnit((String) a[0], oldInterval, (DateHistogramInterval) a[4], (String) a[5]);
} else if (calendarInterval != null && fixedInterval == null) {
return new CalendarInterval((String) a[0], calendarInterval, (DateHistogramInterval) a[4], (String) a[5]);
} else if (calendarInterval == null && fixedInterval != null) {
return new FixedInterval((String) a[0], fixedInterval, (DateHistogramInterval) a[4], (String) a[5]);
} else if (calendarInterval != null && fixedInterval != null) {
throw new IllegalArgumentException("Cannot set both [fixed_interval] and [calendar_interval] at the same time");
} else {
throw new IllegalArgumentException("An interval is required. Use [fixed_interval] or [calendar_interval].");
}
});

and:

private static DateHistogramGroupConfig fromUnknownTimeUnit(String field, DateHistogramInterval interval,
DateHistogramInterval delay, String timeZone) {
if (DateHistogramAggregationBuilder.DATE_FIELD_UNITS.get(interval.toString()) != null) {
return new CalendarInterval(field, interval, delay, timeZone);
} else {
return new FixedInterval(field, interval, delay, timeZone);
}
}

That should be considered as the spec for how to do a silent behaviour-preserving rewrite.

The extra complexity in ML is deciding where to do the migration for datafeed configs that wouldn't otherwise be rewritten. If we do it in the datafeed config parser then that ensures that newly submitted configs and datafeed configs that are updated will get translated. Then we would just need to find an appropriate place to do it for existing datafeed configs - maybe on master node election providing the oldest node in the cluster is version 7.2 or above?

There are also some currently muted tests that should be reenabled when the work is done:

  1. @AwaitsFix(bugUrl = "Needs ML to look at and fix. Unclear how this should be handled, interval is not an optional param")
  2. @AwaitsFix(bugUrl = "Tests need to be updated to use calendar/fixed interval explicitly")

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions