Skip to content

Add id to MetadataKey so that we can change the key order#9159

Merged
Jackie-Jiang merged 1 commit intoapache:masterfrom
Jackie-Jiang:data_table_metadata_key
Aug 8, 2022
Merged

Add id to MetadataKey so that we can change the key order#9159
Jackie-Jiang merged 1 commit intoapache:masterfrom
Jackie-Jiang:data_table_metadata_key

Conversation

@Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Aug 4, 2022

Assign an id to each MetadataKey in the DataTable so that we can change the key order (insert new keys) if necessary for readability. There have been multiple issues caused by people adding keys in the middle

@codecov-commenter
Copy link

codecov-commenter commented Aug 4, 2022

Codecov Report

Merging #9159 (2853d2a) into master (0aeb42e) will decrease coverage by 0.01%.
The diff coverage is 89.79%.

@@             Coverage Diff              @@
##             master    #9159      +/-   ##
============================================
- Coverage     69.98%   69.96%   -0.02%     
+ Complexity     4758     4757       -1     
============================================
  Files          1847     1847              
  Lines         98555    98566      +11     
  Branches      14969    14972       +3     
============================================
- Hits          68972    68961      -11     
- Misses        24741    24755      +14     
- Partials       4842     4850       +8     
Flag Coverage Δ
integration1 26.28% <89.79%> (+0.05%) ⬆️
integration2 24.64% <85.71%> (-0.20%) ⬇️
unittests1 67.00% <89.79%> (+<0.01%) ⬆️
unittests2 15.30% <0.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../java/org/apache/pinot/common/utils/DataTable.java 91.22% <88.88%> (-4.43%) ⬇️
...che/pinot/core/common/datablock/BaseDataBlock.java 83.51% <100.00%> (ø)
...e/pinot/core/common/datatable/DataTableImplV3.java 94.70% <100.00%> (ø)
...ntroller/helix/core/minion/TaskMetricsEmitter.java 76.74% <0.00%> (-9.31%) ⬇️
...a/org/apache/pinot/common/utils/ServiceStatus.java 60.95% <0.00%> (-7.62%) ⬇️
...roller/helix/core/relocation/SegmentRelocator.java 72.97% <0.00%> (-5.41%) ⬇️
...not/broker/broker/helix/ClusterChangeMediator.java 75.26% <0.00%> (-5.38%) ⬇️
.../filter/predicate/InPredicateEvaluatorFactory.java 68.80% <0.00%> (-4.80%) ⬇️
...thandler/SingleConnectionBrokerRequestHandler.java 88.31% <0.00%> (-3.90%) ⬇️
...gregation/function/StUnionAggregationFunction.java 73.52% <0.00%> (-2.95%) ⬇️
... and 32 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@siddharthteotia
Copy link
Contributor

Part of me thinks that people really need to read the code (javadoc just above the enum) before making bad changes of inserting keys in the middle. So, we should try to stick to current mechanism if possible.

@richardstartin
Copy link
Member

Part of me thinks that people really need to read the code (javadoc just above the enum) before making bad changes of inserting keys in the middle. So, we should try to stick to current mechanism if possible.

+1. Enums should be treated as append-only code constructs.

@FelixGV
Copy link

FelixGV commented Aug 4, 2022

Part of me thinks that people really need to read the code (javadoc just above the enum) before making bad changes of inserting keys in the middle. So, we should try to stick to current mechanism if possible.

+1. Enums should be treated as append-only code constructs.

Following conventions is a good convention, but making the code structure enforce or incentivize conventions may be even better.

IMHO, it's not too controversial to add a manually maintained ID for enums that need to be sent across processes or persisted across restarts of the same process (i.e. any enum that will potentially be produced by one version of the software and consumed by another).

Using the enum's built-in ordinal is error-prone, since it doesn’t survive re-ordering, nor insertions and removals from the middle. Using a manually-maintained ID on the other hand is resilient to all these scenarios.

@Jackie-Jiang Jackie-Jiang force-pushed the data_table_metadata_key branch 2 times, most recently from 5c6ef7b to ee84e5d Compare August 4, 2022 18:15
@Jackie-Jiang
Copy link
Contributor Author

@siddharthteotia @richardstartin
To add some context here, we had got multiple PRs merged without following the convention, which broke the backward compatibility. It is hard for all the contributors and reviewers to remember the convention, especially when the change is not next to the comment describing how to keep backward compatible. As @FelixGV pointed out, using enum's ordinal is error-prone, and we need to use a way to enforce people to follow the convention, and make it easy for reviewers to catch when the convention is not followed.

Example PRs that breaks the backward compatibility:
#9092 Merged then reverted the next day
#8884 Merged but not caught the backward compatibility issue until this PR (already one month!). I actually pointed it out during the review, but the fix goes to the wrong class and I didn't get a chance to review it again. This is a proof that we don't have any guard for this convention, except for the comment that is 30 lines away from the actual change.

@richardstartin
Copy link
Member

@siddharthteotia @richardstartin To add some context here, we had got multiple PRs merged without following the convention, which broke the backward compatibility. It is hard for all the contributors and reviewers to remember the convention, especially when the change is not next to the comment describing how to keep backward compatible. As @FelixGV pointed out, using enum's ordinal is error-prone, and we need to use a way to enforce people to follow the convention, and make it easy for reviewers to catch when the convention is not followed.

Example PRs that breaks the backward compatibility: #9092 Merged then reverted the next day #8884 Merged but not caught the backward compatibility issue until this PR (already one month!). I actually pointed it out during the review, but the fix goes to the wrong class and I didn't get a chance to review it again. This is a proof that we don't have any guard for this convention, except for the comment that is 30 lines away from the actual change.

Clearly a very simple regression test can prevent changes like this from getting merged (and from someone accidentally adding a duplicate id, against which there are currently no guards). Code review shouldn’t focus on things which can be automated simply because hardly anyone has (or should have) the attention span to verify trivialities like these.

@Jackie-Jiang
Copy link
Contributor Author

Clearly a very simple regression test can prevent changes like this from getting merged (and from someone accidentally adding a duplicate id, against which there are currently no guards). Code review shouldn’t focus on things which can be automated simply because hardly anyone has (or should have) the attention span to verify trivialities like these.

We may add a regression test to verify all the existing keys still have the same ordinal, but we also need to take extra effort maintaining the test whenever a new key is added. E.g. #8884 can still happen if the previous PR didn't add the new keys to the test

We do check duplicate id in the static block.

Assigning an id has extra benefits though:

  • People can change the order of the keys for readability (and that's the main reason why people tend to add new keys in the middle so that similar metadata are grouped together, e.g. numConsumingSegmentsQueried, numConsumingSegmentsProcessed, numConsumingSegmentsMatched)
  • We may remove the old keys that no longer apply

@siddharthteotia
Copy link
Contributor

Let me go through the change once today/tomorrow

@Jackie-Jiang Jackie-Jiang force-pushed the data_table_metadata_key branch from ee84e5d to 2853d2a Compare August 5, 2022 17:42
Copy link
Contributor

@siddharthteotia siddharthteotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with this to make things more resilient against some of the problems caused by few recent PRs.

However, if as a reviewer we come across a change (e.g reordering enums entries (without a stronger reason)), then I suggest we still push back / advocate for adding entries at the end and other hygiene / practices.

@Jackie-Jiang Jackie-Jiang merged commit cfe95f4 into apache:master Aug 8, 2022
@Jackie-Jiang Jackie-Jiang deleted the data_table_metadata_key branch August 8, 2022 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants