Skip to content

add metric for largest segment size#8445

Merged
Jackie-Jiang merged 4 commits intoapache:masterfrom
jadami10:jadami/largest-segment-metric
Apr 1, 2022
Merged

add metric for largest segment size#8445
Jackie-Jiang merged 4 commits intoapache:masterfrom
jadami10:jadami/largest-segment-metric

Conversation

@jadami10
Copy link
Contributor

Description

This adds a metric for the largest segment size on disk by table

  • it only publishes the metric if there is at least 1 segment
  • the gauge is not reset if all segments are deleted for example. this should be very rare
  • this includes the entire segment directory including indices, not just the columns.psf file

I tested by running the batch quickstart with a breakpoint and asserting the value at debug time equalled what I saw on disk

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

This adds a metric for the largest segment size on disk in a server. You can use this to track the growth of you largest segments.

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Mar 31, 2022

Codecov Report

Merging #8445 (0e8b052) into master (e2053f6) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master    #8445      +/-   ##
============================================
- Coverage     70.73%   70.72%   -0.02%     
- Complexity     4282     4284       +2     
============================================
  Files          1662     1663       +1     
  Lines         87227    87305      +78     
  Branches      13205    13216      +11     
============================================
+ Hits          61703    61745      +42     
- Misses        21238    21279      +41     
+ Partials       4286     4281       -5     
Flag Coverage Δ
integration1 27.15% <100.00%> (-0.09%) ⬇️
integration2 26.07% <95.65%> (+0.05%) ⬆️
unittests1 67.07% <0.00%> (+0.02%) ⬆️
unittests2 14.15% <91.30%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...g/apache/pinot/common/metrics/ControllerGauge.java 97.61% <100.00%> (+0.05%) ⬆️
.../apache/pinot/controller/util/TableSizeReader.java 99.18% <100.00%> (+0.12%) ⬆️
...pinot/common/utils/fetcher/HttpSegmentFetcher.java 61.53% <0.00%> (-10.26%) ⬇️
...a/org/apache/pinot/common/utils/ServiceStatus.java 60.00% <0.00%> (-7.15%) ⬇️
...r/helix/SegmentOnlineOfflineStateModelFactory.java 58.49% <0.00%> (-6.61%) ⬇️
...nction/DistinctCountBitmapAggregationFunction.java 47.66% <0.00%> (-5.19%) ⬇️
...tion/groupby/DictionaryBasedGroupKeyGenerator.java 88.25% <0.00%> (-2.76%) ⬇️
...form/function/BinaryOperatorTransformFunction.java 41.77% <0.00%> (-2.67%) ⬇️
...not/broker/broker/helix/ClusterChangeMediator.java 77.65% <0.00%> (-2.13%) ⬇️
.../pinot/server/starter/helix/BaseServerStarter.java 57.46% <0.00%> (-1.86%) ⬇️
... and 35 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2053f6...0e8b052. Read the comment docs.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add the prometheus config and some tests? You may refer to #8358 which has very similar changes

@Deprecated // Instead use TABLE_TOTAL_SIZE_ON_SERVER
OFFLINE_TABLE_ESTIMATED_SIZE("OfflineTableEstimatedSize", false),

LARGEST_SEGMENT_SIZE_ON_DISK("LargestSegmentSizeOnDisk", false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LARGEST_SEGMENT_SIZE_ON_DISK("LargestSegmentSizeOnDisk", false),
LARGEST_SEGMENT_SIZE_ON_SERVER("LargestSegmentSizeOnServer", false),

tableSizeDetails._realtimeSegments._estimatedSizeInBytes / _helixResourceManager.getNumReplicas(
realtimeTableConfig));

tableSizeDetails._realtimeSegments._segments.values().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) Suggest using the regular non-stream APIs for the following reasons:

  1. This is indeed more compact, but slightly hard to read
  2. It has worse performance comparing to regular APIs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fair, it was just as tedious to write as it is to read. I factored out the "default" size we see when segments are missing into a variable and rewrote it with regular APIs

@jadami10
Copy link
Contributor Author

jadami10 commented Apr 1, 2022

Can you please add the prometheus config and some tests? You may refer to #8358 which has very similar changes

Added both. It tests for realtime, offline, and all segments missing. thank you for the pointer PR

}
return null;
});
String path = (String) invocationOnMock.getArguments()[0];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not changes here. the reformat changed the whitespace

@Jackie-Jiang Jackie-Jiang merged commit 370d86c into apache:master Apr 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants