Commit c65f828

[7.x] Histogram field type support for ValueCount and Avg aggregations (#56099)
Backports #55933 to 7.x

Implements value_count and avg aggregations over Histogram fields as discussed in #53285

- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
1 parent 0860d1d commit c65f828

19 files changed

+715
-49
lines changed

docs/reference/aggregations/metrics/avg-aggregation.asciidoc

Lines changed: 54 additions & 0 deletions
@@ -126,3 +126,57 @@ POST /exams/_search?size=0
 // TEST[setup:exams]
 
 <1> Documents without a value in the `grade` field will fall into the same bucket as documents that have the value `10`.
+
+
+[[search-aggregations-metrics-avg-aggregation-histogram-fields]]
+==== Histogram fields
+When the average is computed on <<histogram,histogram fields>>, the result of the aggregation is the weighted average
+of all elements in the `values` array, where each element is weighted by the number in the same position in the `counts` array.
+
+For example, for the following index that stores pre-aggregated histograms with latency metrics for different networks:
+
+[source,console]
+--------------------------------------------------
+PUT metrics_index/_doc/1
+{
+  "network.name" : "net-1",
+  "latency_histo" : {
+    "values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1>
+    "counts" : [3, 7, 23, 12, 6] <2>
+  }
+}
+
+PUT metrics_index/_doc/2
+{
+  "network.name" : "net-2",
+  "latency_histo" : {
+    "values" : [0.1, 0.2, 0.3, 0.4, 0.5], <1>
+    "counts" : [8, 17, 8, 7, 6] <2>
+  }
+}
+
+POST /metrics_index/_search?size=0
+{
+  "aggs" : {
+    "avg_latency" :
+      { "avg" : { "field" : "latency_histo" }
+    }
+  }
+}
+--------------------------------------------------
+
+For each histogram field, the `avg` aggregation adds each number in the `values` array <1> multiplied by its associated count
+in the `counts` array <2>. It then divides that total by the sum of all counts across all histograms and returns the following result:
+
+[source,console-result]
+--------------------------------------------------
+{
+  ...
+  "aggregations" : {
+    "avg_latency" : {
+      "value" : 0.29690721649
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[skip:test not setup]
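The arithmetic behind the documented result can be checked by hand. The following is a minimal sketch in plain Java, not the Elasticsearch implementation; the class and method names (`HistogramAvgSketch`, `average`) are hypothetical and exist only for this illustration. It assumes each histogram is given as a pair of parallel arrays, as in the documents above.

```java
/** Illustrative only: reproduces the documented weighted average, not Elasticsearch code. */
final class HistogramAvgSketch {

    /**
     * Weighted average over several histograms.
     * values[h][i] is the i-th value of histogram h, counts[h][i] its count.
     */
    static double average(double[][] values, long[][] counts) {
        double weightedSum = 0.0; // sum of value * count over all histograms
        long totalCount = 0L;     // sum of all counts over all histograms
        for (int h = 0; h < values.length; h++) {
            for (int i = 0; i < values[h].length; i++) {
                weightedSum += values[h][i] * counts[h][i];
                totalCount += counts[h][i];
            }
        }
        // Dividing by a zero count yields NaN, the usual "no data" average.
        return weightedSum / totalCount;
    }

    public static void main(String[] args) {
        double[][] values = {
            {0.1, 0.2, 0.3, 0.4, 0.5},  // net-1
            {0.1, 0.2, 0.3, 0.4, 0.5}   // net-2
        };
        long[][] counts = {
            {3, 7, 23, 12, 6},          // net-1, 51 observations
            {8, 17, 8, 7, 6}            // net-2, 46 observations
        };
        System.out.println(average(values, counts));
    }
}
```

With the two example documents the weighted sum is 16.4 + 12.4 = 28.8 and the total count is 51 + 46 = 97, so the printed value should be close to 28.8 / 97, which matches the `0.29690721649` shown in the documented response up to floating-point rounding.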

docs/reference/aggregations/metrics/sum-aggregation.asciidoc

Lines changed: 3 additions & 3 deletions
@@ -163,10 +163,10 @@ POST /sales/_search?size=0
 [[search-aggregations-metrics-sum-aggregation-histogram-fields]]
 ==== Histogram fields
 
-When the sums are computed on <<histogram,histogram fields>>, the result of the aggregation is the sum of all elements in the `values`
+When sum is computed on <<histogram,histogram fields>>, the result of the aggregation is the sum of all elements in the `values`
 array multiplied by the number in the same position in the `counts` array.
 
-For example, if we have the following index that stores pre-aggregated histograms with latency metrics for different networks:
+For example, for the following index that stores pre-aggregated histograms with latency metrics for different networks:
 
 [source,console]
 --------------------------------------------------
@@ -196,7 +196,7 @@ POST /metrics_index/_search?size=0
 }
 --------------------------------------------------
 
-For each histogram field the sum aggregation will multiply each number in the `values` array <1> multiplied with its associated count
+For each histogram field the `sum` aggregation will multiply each number in the `values` array <1> by its associated count
 in the `counts` array <2>. Eventually, it will add all values for all histograms and return the following result:
 
 [source,console-result]

docs/reference/aggregations/metrics/valuecount-aggregation.asciidoc

Lines changed: 60 additions & 0 deletions
@@ -6,6 +6,9 @@ These values can be extracted either from specific fields in the documents, or b
 this aggregator will be used in conjunction with other single-value aggregations. For example, when computing the `avg`
 one might be interested in the number of values the average is computed over.
 
+`value_count` does not de-duplicate values, so even if a field has duplicates (or a script generates multiple
+identical values for a single document), each value will be counted individually.
+
 [source,console]
 --------------------------------------------------
 POST /sales/_search?size=0
@@ -77,3 +80,60 @@ POST /sales/_search?size=0
 }
 --------------------------------------------------
 // TEST[setup:sales,stored_example_script]
+
+NOTE:: Because `value_count` is designed to work with any field it internally treats all values as simple bytes.
+Due to this implementation, if the `_value` script variable is used to fetch a value instead of accessing the field
+directly (e.g. a "value script"), the field value will be returned as a string instead of its native format.
+
+[[search-aggregations-metrics-valuecount-aggregation-histogram-fields]]
+==== Histogram fields
+When the `value_count` aggregation is computed on <<histogram,histogram fields>>, the result of the aggregation is the sum of all numbers
+in the `counts` array of the histogram.
+
+For example, for the following index that stores pre-aggregated histograms with latency metrics for different networks:
+
+[source,console]
+--------------------------------------------------
+PUT metrics_index/_doc/1
+{
+  "network.name" : "net-1",
+  "latency_histo" : {
+    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
+    "counts" : [3, 7, 23, 12, 6] <1>
+  }
+}
+
+PUT metrics_index/_doc/2
+{
+  "network.name" : "net-2",
+  "latency_histo" : {
+    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
+    "counts" : [8, 17, 8, 7, 6] <1>
+  }
+}
+
+POST /metrics_index/_search?size=0
+{
+  "aggs" : {
+    "total_requests" : {
+      "value_count" : { "field" : "latency_histo" }
+    }
+  }
+}
+--------------------------------------------------
+
+For each histogram field, the `value_count` aggregation sums all numbers in the `counts` array <1>.
+It then adds these per-histogram totals across all histograms and returns the following result:
+
+[source,console-result]
+--------------------------------------------------
+{
+  ...
+  "aggregations" : {
+    "total_requests" : {
+      "value" : 97
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[skip:test not setup]
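The documented total of 97 follows directly from the two `counts` arrays. As a minimal sketch in plain Java (the class and method names are hypothetical and not part of Elasticsearch), summing every count of every histogram looks like this:

```java
/** Illustrative only: reproduces the documented value_count total, not Elasticsearch code. */
final class HistogramValueCountSketch {

    /** Adds up every entry of every counts array passed in. */
    static long valueCount(long[]... countsPerHistogram) {
        long total = 0L;
        for (long[] counts : countsPerHistogram) {
            for (long count : counts) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] net1 = {3, 7, 23, 12, 6}; // sums to 51
        long[] net2 = {8, 17, 8, 7, 6};  // sums to 46
        System.out.println(valueCount(net1, net2)); // prints 97
    }
}
```

Note that the `values` array plays no role here: only the counts contribute to the result.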

docs/reference/mapping/types/histogram.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ Because the data is not indexed, you only can use `histogram` fields for the
 following aggregations and queries:
 
 * <<search-aggregations-metrics-sum-aggregation-histogram-fields,sum>> aggregation
+* <<search-aggregations-metrics-valuecount-aggregation-histogram-fields,value_count>> aggregation
+* <<search-aggregations-metrics-avg-aggregation-histogram-fields,avg>> aggregation
 * <<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation
 * <<search-aggregations-metrics-percentile-rank-aggregation,percentile ranks>> aggregation
 * <<search-aggregations-metrics-boxplot-aggregation,boxplot>> aggregation

server/src/main/java/org/elasticsearch/search/aggregations/metrics/InternalValueCount.java

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 public class InternalValueCount extends InternalNumericMetricsAggregation.SingleValue implements ValueCount {
     private final long value;
 
-    InternalValueCount(String name, long value, Map<String, Object> metadata) {
+    public InternalValueCount(String name, long value, Map<String, Object> metadata) {
         super(name, metadata);
         this.value = value;
     }

x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/AnalyticsPlugin.java

Lines changed: 6 additions & 5 deletions
@@ -52,7 +52,6 @@
 import org.elasticsearch.xpack.core.analytics.action.AnalyticsStatsAction;
 
 import java.util.ArrayList;
-import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.List;
@@ -86,7 +85,7 @@ public List<PipelineAggregationSpec> getPipelineAggregations() {
 
     @Override
     public List<AggregationSpec> getAggregations() {
-        return Arrays.asList(
+        return org.elasticsearch.common.collect.List.of(
             new AggregationSpec(
                 StringStatsAggregationBuilder.NAME,
                 StringStatsAggregationBuilder::new,
@@ -143,10 +142,12 @@ public Map<String, Mapper.TypeParser> getMappers() {
 
     @Override
     public List<Consumer<ValuesSourceRegistry.Builder>> getAggregationExtentions() {
-        return Arrays.asList(
+        return org.elasticsearch.common.collect.List.of(
             AnalyticsAggregatorFactory::registerPercentilesAggregator,
             AnalyticsAggregatorFactory::registerPercentileRanksAggregator,
-            AnalyticsAggregatorFactory::registerHistoBackedSumAggregator
+            AnalyticsAggregatorFactory::registerHistoBackedSumAggregator,
+            AnalyticsAggregatorFactory::registerHistoBackedValueCountAggregator,
+            AnalyticsAggregatorFactory::registerHistoBackedAverageAggregator
         );
     }
 
@@ -160,7 +161,7 @@ public Collection<Object> createComponents(Client client, ClusterService cluster
 
     @Override
     public List<NamedWriteableRegistry.Entry> getNamedWriteables() {
-        return Arrays.asList(
+        return org.elasticsearch.common.collect.List.of(
             new NamedWriteableRegistry.Entry(TTestState.class, PairedTTestState.NAME, PairedTTestState::new),
             new NamedWriteableRegistry.Entry(TTestState.class, UnpairedTTestState.NAME, UnpairedTTestState::new)
         );

x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/aggregations/metrics/AnalyticsAggregatorFactory.java

Lines changed: 23 additions & 2 deletions
@@ -3,18 +3,21 @@
  * or more contributor license agreements. Licensed under the Elastic License;
  * you may not use this file except in compliance with the Elastic License.
  */
-
 package org.elasticsearch.xpack.analytics.aggregations.metrics;
 
+import org.elasticsearch.search.aggregations.metrics.AvgAggregationBuilder;
 import org.elasticsearch.search.aggregations.metrics.MetricAggregatorSupplier;
 import org.elasticsearch.search.aggregations.metrics.PercentileRanksAggregationBuilder;
 import org.elasticsearch.search.aggregations.metrics.PercentilesAggregationBuilder;
 import org.elasticsearch.search.aggregations.metrics.PercentilesAggregatorSupplier;
 import org.elasticsearch.search.aggregations.metrics.PercentilesConfig;
 import org.elasticsearch.search.aggregations.metrics.PercentilesMethod;
 import org.elasticsearch.search.aggregations.metrics.SumAggregationBuilder;
+import org.elasticsearch.search.aggregations.metrics.ValueCountAggregationBuilder;
+import org.elasticsearch.search.aggregations.metrics.ValueCountAggregatorSupplier;
 import org.elasticsearch.search.aggregations.support.ValuesSourceRegistry;
 import org.elasticsearch.xpack.analytics.aggregations.support.AnalyticsValuesSourceType;
+import org.elasticsearch.xpack.analytics.aggregations.support.HistogramValuesSource;
 
 public class AnalyticsAggregatorFactory {
 
@@ -65,6 +68,24 @@ public static void registerPercentileRanksAggregator(ValuesSourceRegistry.Builde
     public static void registerHistoBackedSumAggregator(ValuesSourceRegistry.Builder builder) {
         builder.register(SumAggregationBuilder.NAME,
             AnalyticsValuesSourceType.HISTOGRAM,
-            (MetricAggregatorSupplier) HistoBackedSumAggregator::new);
+            (MetricAggregatorSupplier) (name, valuesSource, format, context, parent, metadata) ->
+                new HistoBackedSumAggregator(name, (HistogramValuesSource.Histogram) valuesSource, format, context, parent, metadata)
+        );
+    }
+
+    public static void registerHistoBackedValueCountAggregator(ValuesSourceRegistry.Builder builder) {
+        builder.register(ValueCountAggregationBuilder.NAME,
+            AnalyticsValuesSourceType.HISTOGRAM,
+            (ValueCountAggregatorSupplier) (name, valuesSource, context, parent, metadata) ->
+                new HistoBackedValueCountAggregator(name, (HistogramValuesSource.Histogram) valuesSource, context, parent, metadata)
+        );
+    }
+
+    public static void registerHistoBackedAverageAggregator(ValuesSourceRegistry.Builder builder) {
+        builder.register(AvgAggregationBuilder.NAME,
+            AnalyticsValuesSourceType.HISTOGRAM,
+            (MetricAggregatorSupplier) (name, valuesSource, format, context, parent, metadata) ->
+                new HistoBackedAvgAggregator(name, (HistogramValuesSource.Histogram) valuesSource, format, context, parent, metadata)
+        );
     }
 }
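The registration methods above all follow one shape: an aggregation name plus a values source type is mapped to a supplier lambda that builds the matching aggregator. The sketch below illustrates that general pattern only. It is not the `ValuesSourceRegistry` API; every name in it (`RegistrySketch`, `AggregatorSupplier`, the string keys) is a hypothetical stand-in chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a toy registry keyed by (aggregation name, values source type). */
final class RegistrySketch {

    /** Hypothetical stand-in for an aggregator supplier; returns a description instead of an aggregator. */
    interface AggregatorSupplier {
        String create(String aggregationInstanceName);
    }

    private final Map<String, AggregatorSupplier> suppliers = new HashMap<>();

    private static String key(String aggregationName, String valuesSourceType) {
        return aggregationName + "/" + valuesSourceType;
    }

    /** Associates a supplier with an aggregation name and a values source type. */
    void register(String aggregationName, String valuesSourceType, AggregatorSupplier supplier) {
        suppliers.put(key(aggregationName, valuesSourceType), supplier);
    }

    /** Looks up the supplier, failing loudly when the combination is unsupported. */
    AggregatorSupplier lookup(String aggregationName, String valuesSourceType) {
        AggregatorSupplier supplier = suppliers.get(key(aggregationName, valuesSourceType));
        if (supplier == null) {
            throw new IllegalArgumentException(
                "no aggregator registered for [" + aggregationName + "] on [" + valuesSourceType + "]");
        }
        return supplier;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register("avg", "histogram", name -> "histogram-backed avg aggregator for " + name);
        registry.register("value_count", "histogram", name -> "histogram-backed value_count aggregator for " + name);
        System.out.println(registry.lookup("avg", "histogram").create("avg_latency"));
    }
}
```

Keying on both the aggregation and the field type is what lets a plugin add histogram support to an existing aggregation such as `avg` without touching that aggregation's own builder.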
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
+/*
+ * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+ * or more contributor license agreements. Licensed under the Elastic License;
+ * you may not use this file except in compliance with the Elastic License.
+ */
+package org.elasticsearch.xpack.analytics.aggregations.metrics;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.ScoreMode;
+import org.elasticsearch.common.lease.Releasables;
+import org.elasticsearch.common.util.BigArrays;
+import org.elasticsearch.common.util.DoubleArray;
+import org.elasticsearch.common.util.LongArray;
+import org.elasticsearch.index.fielddata.HistogramValue;
+import org.elasticsearch.index.fielddata.HistogramValues;
+import org.elasticsearch.search.DocValueFormat;
+import org.elasticsearch.search.aggregations.Aggregator;
+import org.elasticsearch.search.aggregations.InternalAggregation;
+import org.elasticsearch.search.aggregations.LeafBucketCollector;
+import org.elasticsearch.search.aggregations.LeafBucketCollectorBase;
+import org.elasticsearch.search.aggregations.metrics.CompensatedSum;
+import org.elasticsearch.search.aggregations.metrics.InternalAvg;
+import org.elasticsearch.search.aggregations.metrics.NumericMetricsAggregator;
+import org.elasticsearch.search.internal.SearchContext;
+import org.elasticsearch.xpack.analytics.aggregations.support.HistogramValuesSource;
+
+import java.io.IOException;
+import java.util.Map;
+
+/**
+ * Average aggregator operating over histogram datatypes {@link HistogramValuesSource}
+ * The aggregation computes weighted average by taking counts into consideration for each value
+ */
+class HistoBackedAvgAggregator extends NumericMetricsAggregator.SingleValue {
+
+    private final HistogramValuesSource.Histogram valuesSource;
+
+    LongArray counts;
+    DoubleArray sums;
+    DoubleArray compensations;
+    DocValueFormat format;
+
+    HistoBackedAvgAggregator(String name, HistogramValuesSource.Histogram valuesSource, DocValueFormat formatter, SearchContext context,
+                             Aggregator parent, Map<String, Object> metadata) throws IOException {
+        super(name, context, parent, metadata);
+        this.valuesSource = valuesSource;
+        this.format = formatter;
+        if (valuesSource != null) {
+            final BigArrays bigArrays = context.bigArrays();
+            counts = bigArrays.newLongArray(1, true);
+            sums = bigArrays.newDoubleArray(1, true);
+            compensations = bigArrays.newDoubleArray(1, true);
+        }
+    }
+
+    @Override
+    public ScoreMode scoreMode() {
+        return valuesSource != null && valuesSource.needsScores() ? ScoreMode.COMPLETE : ScoreMode.COMPLETE_NO_SCORES;
+    }
+
+    @Override
+    public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
+                                                final LeafBucketCollector sub) throws IOException {
+        if (valuesSource == null) {
+            return LeafBucketCollector.NO_OP_COLLECTOR;
+        }
+        final BigArrays bigArrays = context.bigArrays();
+        final HistogramValues values = valuesSource.getHistogramValues(ctx);
+        final CompensatedSum kahanSummation = new CompensatedSum(0, 0);
+
+        return new LeafBucketCollectorBase(sub, values) {
+            @Override
+            public void collect(int doc, long bucket) throws IOException {
+                counts = bigArrays.grow(counts, bucket + 1);
+                sums = bigArrays.grow(sums, bucket + 1);
+                compensations = bigArrays.grow(compensations, bucket + 1);
+
+                if (values.advanceExact(doc)) {
+                    final HistogramValue sketch = values.histogram();
+
+                    // Compute the sum of double values with Kahan summation algorithm which is more accurate than naive summation
+                    final double sum = sums.get(bucket);
+                    final double compensation = compensations.get(bucket);
+                    kahanSummation.reset(sum, compensation);
+                    while (sketch.next()) {
+                        double d = sketch.value() * sketch.count();
+                        kahanSummation.add(d);
+                        counts.increment(bucket, sketch.count());
+                    }
+
+                    sums.set(bucket, kahanSummation.value());
+                    compensations.set(bucket, kahanSummation.delta());
+                }
+            }
+        };
+    }
+
+    @Override
+    public double metric(long owningBucketOrd) {
+        if (valuesSource == null || owningBucketOrd >= sums.size()) {
+            return Double.NaN;
+        }
+        return sums.get(owningBucketOrd) / counts.get(owningBucketOrd);
+    }
+
+    @Override
+    public InternalAggregation buildAggregation(long bucket) {
+        if (valuesSource == null || bucket >= sums.size()) {
+            return buildEmptyAggregation();
+        }
+        return new InternalAvg(name, sums.get(bucket), counts.get(bucket), format, metadata());
+    }
+
+    @Override
+    public InternalAggregation buildEmptyAggregation() {
+        return new InternalAvg(name, 0.0, 0L, format, metadata());
+    }
+
+    @Override
+    public void doClose() {
+        Releasables.close(counts, sums, compensations);
+    }
+
+}
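The comment in `collect` names the Kahan summation algorithm as the reason a running compensation is stored next to each sum. The following is a minimal, self-contained sketch of textbook Kahan summation in plain Java. It is not the `CompensatedSum` class used above, whose internals are not shown in this diff; `KahanSketch` and its methods are hypothetical names for illustration.

```java
/** Illustrative only: textbook Kahan (compensated) summation, not Elasticsearch's CompensatedSum. */
final class KahanSketch {

    private double sum;          // running total
    private double compensation; // low-order bits lost by previous additions

    /** Adds one value while tracking the rounding error of the addition. */
    void add(double value) {
        double corrected = value - compensation;    // re-inject previously lost bits
        double newSum = sum + corrected;            // low-order bits of corrected may be lost here
        compensation = (newSum - sum) - corrected;  // recover what was just lost
        sum = newSum;
    }

    double value() {
        return sum;
    }

    /** Adds x to itself n times with plain double addition. */
    static double naiveRepeatedSum(double x, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += x;
        }
        return total;
    }

    /** Adds x to itself n times with compensated addition. */
    static double kahanRepeatedSum(double x, int n) {
        KahanSketch acc = new KahanSketch();
        for (int i = 0; i < n; i++) {
            acc.add(x);
        }
        return acc.value();
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        System.out.println("naive: " + naiveRepeatedSum(0.1, n));
        System.out.println("kahan: " + kahanRepeatedSum(0.1, n));
    }
}
```

Adding 0.1 ten million times should give a value near 1,000,000. The naive loop accumulates rounding error on every addition, while the compensated loop stays much closer to the expected total. Persisting the compensation per bucket, as the aggregator does with its `compensations` array, lets the correction carry over from one document to the next.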
