Skip to content

Fix AggregationFactories.Builder equality and hash regarding order #34005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Sep 24, 2018

Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization and deserialization.

Note:

At time of writing the impact of this issue mainly affects tests (-> non-issue), however in future work I am storing aggregation configuration as part of job configurations and therefore require this fix.

It also fixes a potential issue in caching when the same query contains aggregations in different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in the cache.

Previous fix: #33942

switches to a LinkedHashSet instead of a List.

CC @jimczi @cbuescher @colings86

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but +1 for this approach.

@@ -79,12 +79,12 @@ public String getName() {
public abstract AggregationBuilder subAggregation(PipelineAggregationBuilder aggregation);

/** Return the configured set of subaggregations **/
public List<AggregationBuilder> getSubAggregations() {
public Set<AggregationBuilder> getSubAggregations() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not Collection ?

@@ -68,8 +68,8 @@ public String getName() {
* Internal: Validates the state of this factory (makes sure the factory is properly
* configured)
*/
protected abstract void validate(AggregatorFactory<?> parent, List<AggregationBuilder> factories,
List<PipelineAggregationBuilder> pipelineAggregatorFactories);
protected abstract void validate(AggregatorFactory<?> parent, Set<AggregationBuilder> aggregationBuilders2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, Collection ?

@@ -0,0 +1,162 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have BaseAggregationTestCase for aggregators so we could maybe add the testing for this case directly there. It would test all implementations rather than just a few.

@hendrikmuhs
Copy link
Author

Thanks @jimczi! I will work on the details once we agreed on the approach.

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a small comment but I'm also +1 for this approach as it conveys the fact that the order is not set if you get the builders from the AggregatorFactories.Builder but are set when we create the AggregatorFactories object in build()

}
return new AggregatorFactories(parent, aggFactories, orderedpipelineAggregators);
}

private List<PipelineAggregationBuilder> resolvePipelineAggregatorOrder(
List<PipelineAggregationBuilder> pipelineAggregatorBuilders, List<AggregationBuilder> aggBuilders,
Set<PipelineAggregationBuilder> pipelineAggregatorBuilders2, Set<AggregationBuilder> aggregationBuilders2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems 2 has slipped into the naming here can we revert to the original names?

@hendrikmuhs hendrikmuhs force-pushed the fix-aggregationbuilder-equality-and-hash-switch-to-set branch from 17ce46d to 9135fe8 Compare September 26, 2018 06:17
@hendrikmuhs
Copy link
Author

hendrikmuhs commented Sep 26, 2018

Addressed the review comments. @jimczi @colings86

The remaining test failure is another ordering problem: Prior to that change ordering was clearly defined by the order of aggregations in the query. I changed the IT to access by key rather than by position.

For the doc/yaml build however: I do not know how to do this and wonder if this PR is good or not. There are potentially more tests that purely work on text matching. Order is not defined in JSON, still users might rely on order. This change makes the order somewhat unpredictable for a user. I think it would be nice to bring back the original behavior or always order in a defined way (lexical order).

Thoughts? Comments?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, thanks for making the changes @hendrikmuhs . I don't think we need AggregatorFactoriesBuilderTests though. Instead we could change BaseAggregationTestCase to explicitly test serialization/equals/hashCode with more than one builder ?

@jimczi
Copy link
Contributor

jimczi commented Sep 26, 2018

For the doc/yaml build however: I do not know how to do this and wonder if this PR is good or not. There are potentially more tests that purely work on text matching. Order is not defined in JSON, still users might rely on order. This change makes the order somewhat unpredictable for a user. I think it would be nice to bring back the original behavior or always order in a defined way (lexical order).

I don't think we should do anything here. The order will change if you have a proxy or a client that processes the json differently so we should not force anything. If we have tests that rely on ordering we should change them but I am 99% sure ;) that the doc/yaml tests don't care about the order.

@hendrikmuhs hendrikmuhs force-pushed the fix-aggregationbuilder-equality-and-hash-switch-to-set branch from de6261b to d658931 Compare September 26, 2018 14:33
@hendrikmuhs
Copy link
Author

I take my 1% chance. ;-) There were 2 ordering problems in the doc tests, 1 was fixable the other one not.

But, after changing to a LinkedHashSet from the ordinary HashSet I was able to reintroduce the old behavior of respecting the insertion order. This makes it more user friendly and fixes the doc test issues.

@hendrikmuhs
Copy link
Author

I don't think we need AggregatorFactoriesBuilderTests though. Instead we could change BaseAggregationTestCase to explicitly test serialization/equals/hashCode with more than one builder ?

This does not tackle the problem. The BaseAggregationTestCase is a base class for testing aggregations and it already tests equals and hashCode, all aggregations are fine w.r.t. to serialization. AggregationBuilder is fine as well.

The problem found is only in AggregationFactories.Builder which missed test coverage (AggregationFactories has tests).

I noted that the title is misleading, I will fix that.

@hendrikmuhs hendrikmuhs changed the title Fix aggregationbuilder equality and hash regarding order alternative Fix AggregationFactories.Builder equality and hash regarding order Sep 26, 2018
@jimczi
Copy link
Contributor

jimczi commented Sep 26, 2018

I take my 1% chance. ;-) There were 2 ordering problems in the doc tests, 1 was fixable the other one not.

I think it's fine to use a LinkedHashSet but the ordering issues are due to other APIs (profile) that use an array to reference the sub-aggregations. AbstractAggregationBuilder serializes the sub-aggregations as maps because that's what they really are:

{
	"size": 0,
	"aggs": {
		"1": {
			"terms": {
				"field": "country.keyword"
			},
			"aggs": {
				"2": {
					"avg": {
						"field": "value"
					}
				},
				"4": {
					"sum": {
						"field": "value"
					}
				}
			}
		}
	}
}

The fact that we use an array to represent them internally should not change the reasoning. The order of sub-aggregations in the request should not matter.

This does not tackle the problem. The BaseAggregationTestCase is a base class for testing aggregations and it already tests equals and hashCode, all aggregations are fine w.r.t. to serialization. AggregationBuilder is fine as well.

This pr fixes a real issue, today different orderings of the same sub-aggregations are cached individually. So this query for instance:

{
	"size": 0,
	"aggs": {
		"1": {
			"terms": {
				"field": "country.keyword"
			},
			"aggs": {
				"4": {
					"sum": {
						"field": "value"
					}
				},
				"2": {
					"avg": {
						"field": "value"
					}
				}
			}
		}
	}
}

... which yields the same response than the first example is considered as a different query by our cache. IMO this is a bug in the AggregationBuilder class which is why I think it should test this case in BaseAggregationTestCase. equals and hashCode are tested only when there is no or a single sub-agg which is why we never caught this issue before.

@hendrikmuhs
Copy link
Author

hendrikmuhs commented Sep 26, 2018

I now better understand what you mean and I can try to improve test coverage of BaseAggregationTestCase. I do not think it can replace AggregatorFactoriesBuilderTests completely, because it is a an implementation of AbstractSerializingTestCase which ensures that ToXContent and Writable are correctly implemented. I think we should keep that part.

The non-inherited testUnorderedEqualsSubSet - is added sugar, I happily (re)move that.

@hendrikmuhs hendrikmuhs force-pushed the fix-aggregationbuilder-equality-and-hash-switch-to-set branch from 9cebc2e to 6ab6555 Compare September 27, 2018 12:43
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @hendrikmuhs !

@hendrikmuhs hendrikmuhs merged commit e2f310b into elastic:master Sep 28, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 28, 2018
* master:
  Use more precise does S3 bucket exist method (elastic#34123)
  LLREST: Introduce a strict mode (elastic#33708)
  [CCR] Adjust list retryable errors (elastic#33985)
  Fix AggregationFactories.Builder equality and hash regarding order (elastic#34005)
  MINOR: Remove some deadcode in NodeEnv and Related (elastic#34133)
  Rest-Api-Spec: Correct spelling in filter_path description (elastic#33154)
  Core: Don't rely on java time for epoch seconds formatting (elastic#34086)
  Retry errors when fetching follower global checkpoint. (elastic#34019)
  Watcher: Reenable watcher stats REST tests (elastic#34107)
  Remove special-casing of Synonym filters in AnalysisRegistry (elastic#34034)
  Rename CCR APIs (elastic#34027)
  Fixed CCR stats api serialization issues and (elastic#33983)
  Support 'string'-style queries on metadata fields when reasonable. (elastic#34089)
  Logging: Drop Settings from security logger get calls (elastic#33940)
  SQL: Internal refactoring of operators as functions (elastic#34097)
hendrikmuhs pushed a commit that referenced this pull request Sep 28, 2018
…34005)

Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.

This also addresses a potential issue in caching when the same query contains aggregations but in
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in
the cache.
hendrikmuhs pushed a commit that referenced this pull request Oct 16, 2018
re-enables tests after upstream fixes, see #34005
@hendrikmuhs hendrikmuhs deleted the fix-aggregationbuilder-equality-and-hash-switch-to-set branch October 18, 2018 08:29
kcm pushed a commit that referenced this pull request Oct 30, 2018
…34005)

Fixes the equals and hash function to ignore the order of aggregations to ensure equality after serialization
and deserialization. This ensures storing configs with aggregation works properly.

This also addresses a potential issue in caching when the same query contains aggregations but in 
different order. 1st it will not hit in the cache, 2nd cache objects which shall be equal might end up twice in 
the cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants