-
Notifications
You must be signed in to change notification settings - Fork 25.3k
ValuesSourceRegistry Prototype #48758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ValuesSourceRegistry Prototype #48758
Conversation
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
*/ | ||
package org.elasticsearch.search.aggregations.support; | ||
|
||
public interface AggregatorSupplier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love having a no-op interface here, but it gives us a bit of type safety over just passing in any old object. The problem here is that each aggregation family has a unique signature for how it constructs aggregators, so we can't have a common interface that means something. This gives us a type that all those aggregation specific suppliers can inherit from, and we can be reasonably sure that it's a safe cast, since we trust that people aren't cross-registering aggregators. And, if there were a cross-registered aggregator, it wouldn't work anyway, so failing if we detect a bad cast is ok too.
That said, this feels really clunky and I'm open to better options. I spent a lot of time thinking about this problem, and this was the best solution I came up with.
import java.util.List; | ||
import java.util.Map; | ||
|
||
public interface HistogramAggregatorSupplier extends AggregatorSupplier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each aggregation family will define one of these which wraps the constructor signature for the aggregators that go with that family.
* absence of any one field. The type of the script is given by the valueType field, and defaults to bytes if not specified. | ||
*/ | ||
// TODO: Not pluggable, should always be valueType if specified, BYTES if not. | ||
// TODO: Probably should validate that the resulting type is valid for this agg. That needs to be plugable. | ||
ValuesSourceType valuesSourceType = valueType != null ? valueType.getValuesSourceType() : ValuesSourceType.ANY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that by the time we get here, valueType
is hopelessly muddled. It could be user-specified, or it could be a default value from the aggregation builder, or it could be null. We have no way of knowing the source at this point. I have a plan to clean that up, but in the interest of small, manageable PRs, I want to do that in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* pattern. In this case, we're going to end up using the EMPTY variant of the ValuesSource, and possibly applying a user | ||
* specified missing value. | ||
*/ | ||
// TODO: This should be pluggable too; Effectively that will replace the missingAny() case from toValuesSource() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this TODO will happen in another PR centered around fixing ValueType
for aggregations using the registry to resolve aggregators. | ||
*/ | ||
public enum ValuesSourceRegistry { | ||
INSTANCE { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the funky mutable enum singleton pattern, I just needed something that wasn't going to get in the way for prototyping. Happy to use a different implementation before going to production.
This is intended to illustrate the core API for the new registry pattern. We'll probably need to add one or two more registered components, for the script and unmapped branches in ValuesSourceConfig.resolve()
, but in general this is how it should work: for each aggregation family, a mapping of predicates to ValuesSourceType
s and a mapping of ValuesSourceType
s to AggregatorSuppliers
(where the actual aggregator supplier implements an aggregation family specific sub-interface).
This is the heart of the plan, so please raise any concerns with this, no matter how trivial.
@@ -100,6 +106,41 @@ public void testDoubles() throws Exception { | |||
} | |||
} | |||
|
|||
public void testDates() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should support this, but we do currently, so I wanted a test for the legacy case.
…tor-factories Conflicts: server/src/main/java/org/elasticsearch/search/aggregations/support/ValuesSourceConfig.java x-pack/plugin/analytics/src/main/java/org/elasticsearch/xpack/analytics/stringstats/StringStatsAggregationBuilder.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this LGTM. All the comments I had while looking over it are already noted in the code or with TODOs, and it's just an iterative building block too. 👍
ValuesSourceConfig<VS> config = new ValuesSourceConfig<>(ValuesSourceType.ANY); | ||
config.format(resolveFormat(null, valueType, timeZone)); | ||
return config; | ||
throw new IllegalStateException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
* absence of any one field. The type of the script is given by the valueType field, and defaults to bytes if not specified. | ||
*/ | ||
// TODO: Not pluggable, should always be valueType if specified, BYTES if not. | ||
// TODO: Probably should validate that the resulting type is valid for this agg. That needs to be plugable. | ||
ValuesSourceType valuesSourceType = valueType != null ? valueType.getValuesSourceType() : ValuesSourceType.ANY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This is slightly non-obvious. Basically, we moved the check for not having a data source forward in the parsing, so then the aggregation in this test was throwing the wrong exception. So we needed to specify a field now, so it can get far enough to fail with the exception it expects.
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041) this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (elastic#48638) * ValuesSourceRegistry Prototype (elastic#48758) * Remove generics from ValuesSource related classes (elastic#49606) * fix percentile aggregation tests (elastic#50712) * Basic thread safety for ValuesSourceRegistry (elastic#50340) * Remove target value type from ValuesSourceAggregationBuilder (elastic#49943) * Cleanup default values source type (elastic#50992) * CoreValuesSourceType no longer implements Writable (elastic#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (elastic#51131) * Put values source types on fields (elastic#51503) * Remove VST Any (elastic#51539) * Rewire terms agg to use new VS registry (elastic#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (elastic#51337) * Wire Percentiles aggregator into new VS framework (elastic#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (elastic#51647) * fix checkstyle after merge (elastic#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (elastic#51710) * Convert RareTerms to new VS registry (elastic#52166) * Wire up Value Count (elastic#52225) * Wire up Max & Min aggregations (elastic#52219) * ValuesSource refactoring: Wire up Sum aggregation (elastic#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (elastic#52590) * Soft immutability for VSConfig (elastic#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (elastic#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (elastic#52891) * ValuesSource refactoring: Wire up string_stats aggregation (elastic#52875) * VS refactoring: Wire up median (MAD) aggregation (elastic#52945) * fix valuesourcetype issue with constant_keyword field (elastic#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (elastic#52752) * Wire PercentileRanks aggregator into new VS framework (elastic#51693) * Add a VSConfig resolver for aggregations not using the registry (elastic#53038) * Vs refactor wire up ranges and date ranges (elastic#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (elastic#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates elastic#42949. * VS refactoring: convert Boxplot to new registry (elastic#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (elastic#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue elastic#42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (elastic#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue elastic#42949. * Fix type tests for Missing aggregation (elastic#53501) * ValuesSource Refactor: move histo VSType into XPack module (elastic#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (elastic#53484) * Vs refactor parser cleanup (elastic#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. * More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport it's fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>
This PR builds out a prototype implementation of the
ValuesSourceRegistry
to explore the API. As an example, I've done some of the conversion work onHistogramAggregatorFactory
to show how this can work. We may need to update this API for more complex aggregations like Terms family.Related to #42949