Optimize group-by and join for single key scenario#11630
Optimize group-by and join for single key scenario#11630Jackie-Jiang merged 1 commit intoapache:masterfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11630 +/- ##
============================================
+ Coverage 63.08% 63.10% +0.01%
- Complexity 1110 1121 +11
============================================
Files 2330 2334 +4
Lines 125145 125185 +40
Branches 19195 19208 +13
============================================
+ Hits 78952 78994 +42
- Misses 40558 40561 +3
+ Partials 5635 5630 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Please fix the CI. |
f4b16c6 to
360748e
Compare
| List<ColocationKey> colocationKeys = ((FieldSelectionKeySelector) selector).getColumnIndices().stream() | ||
| .map(x -> new ColocationKey(x, numPartitions, selector.hashAlgorithm())).collect(Collectors.toList()); | ||
| List<ColocationKey> colocationKeys = | ||
| distributionKeys.stream().map(x -> new ColocationKey(x, numPartitions, KeySelector.DEFAULT_HASH_ALGORITHM)) |
There was a problem hiding this comment.
shall we make this algorithm configurable?
There was a problem hiding this comment.
This optimizer is triggered by a hint, and we are not using it. I believe the long term plan is to remove it. cc @walterddr
There was a problem hiding this comment.
i think for now we can assume that. the problem is we dont know how users will actually be using. having this configurable can be done in the future when needed.
| private KeySelectorFactory() { | ||
| } | ||
|
|
||
| public static KeySelector<?> getKeySelector(List<Integer> keyIds) { |
There was a problem hiding this comment.
@xiangfu0 we can add configurable algorithms by adding other factory methods IMO
| private List<Integer> _leftKeys; | ||
| @ProtoProperties | ||
| private KeySelector<Object[], Object[]> _rightJoinKeySelector; | ||
| private List<Integer> _rightKeys; |
There was a problem hiding this comment.
superb refactoring. this was a legacy property we added for pluggability that we never used. we should've removed these and use factory methods even before the 1.0 release :-P
Keyobject. Directly use the value as the key in the map.ArrayList, group key map inObject2IntOpenHashMapBitSet