Skip to content

Enhancing Table Size API to return table size per replica#10812

Merged
mayankshriv merged 4 commits intoapache:masterfrom
eaugene:master
Jun 2, 2023
Merged

Enhancing Table Size API to return table size per replica#10812
mayankshriv merged 4 commits intoapache:masterfrom
eaugene:master

Conversation

@eaugene
Copy link
Contributor

@eaugene eaugene commented May 27, 2023

PR contains following changes :

  1. Enhancing Table Size API to return table size per replica in bytes
  2. Code Clean up & removing unused param "detailed" in REST API
  3. Adding & Improving the UT's

Related Discussion : https://apache-pinot.slack.com/archives/C011C9JHN7R/p1684396114803769

cc : @mayankshriv

eaugene added 2 commits May 27, 2023 20:06
2. Code Clean up & removing ununsed param "detailed" in REST API
@codecov-commenter
Copy link

codecov-commenter commented May 27, 2023

Codecov Report

Merging #10812 (843d109) into master (2e6f1e6) will decrease coverage by 46.26%.
The diff coverage is 91.89%.

@@              Coverage Diff              @@
##             master   #10812       +/-   ##
=============================================
- Coverage     70.26%   24.01%   -46.26%     
+ Complexity     6517       49     -6468     
=============================================
  Files          2164     2154       -10     
  Lines        116312   116255       -57     
  Branches      17590    17606       +16     
=============================================
- Hits          81731    27918    -53813     
- Misses        28869    85392    +56523     
+ Partials       5712     2945     -2767     
Flag Coverage Δ
integration1 24.01% <91.89%> (-0.18%) ⬇️
integration2 ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ache/pinot/controller/api/resources/TableSize.java 0.00% <ø> (-66.67%) ⬇️
.../apache/pinot/controller/util/TableSizeReader.java 79.31% <91.89%> (-19.87%) ⬇️

... and 1633 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@mayankshriv mayankshriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required in the PR, but I feel, enhancing the api to also report the compressed size in deepstore would be quite useful.

}
}

// Set the top level sizes to DEFAULT_SIZE_WHEN_MISSING_OR_ERROR when all segments are error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please help me understand the reason for this decision?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To explicitly say in response that segments are in error ( and its not 0 sized ). However, the much more correct way for the caller to identify is to look into "missingSegment" property.
With the existing implementation, this would return as "-2" which kind of diverges from the default error value we use , so I move this section at the end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please help me understand why the condition is "when all segments are error". What is the major difference between "when all segments are error" and "when some segments are in error state"?

Say that a table has 1k segments, the current logic set the top level size to -1 when all 1k segments are in error state, but will not set the level size to -1 when only 1 segment is in good state. What is the major difference here?

Copy link
Contributor Author

@eaugene eaugene Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume the case for hybrid Table with all the segments missing .
The current implementation would have value as "-2" ( This is due to both realtime & offline block adding "-1" ). This is what I have tried to explain in my previous comment

}

// Set the top level sizes to DEFAULT_SIZE_WHEN_MISSING_OR_ERROR when all segments are error
if ((hasRealtimeTableConfig && hasOfflineTableConfig && isMissingAllRealtimeSegments && isMissingAllOfflineSegments)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. we can remove "(hasRealtimeTableConfig && hasOfflineTableConfig && isMissingAllRealtimeSegments && isMissingAllOfflineSegments)" if we want to keep this logic
  2. could you please help me understand why we want to set the the top level sizes to DEFAULT_SIZE_WHEN_MISSING_OR_ERROR when either the offline or realtime table has problems?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We would require this condition , else it would fail for hybrid table case when All realtime segments are good, while all offline segments are missing & vice versa case as well .
  2. Explained the reason here : https://github.com/apache/pinot/pull/10812/files#r1213281580 . It would be set "-1" when all existing segments of the table are missing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. see the following table, with or without the first condition, the result does not change. Please check.
Screenshot 2023-06-02 at 10 08 38 AM
  1. see my reply above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Thanks for the Truth Table way. It helped to catch my error in a boolean algebraic fashion. Fixed it
  2. replied above

@eaugene
Copy link
Contributor Author

eaugene commented Jun 1, 2023

Not required in the PR, but I feel, enhancing the api to also report the compressed size in deepstore would be quite useful.

Sure @mayankshriv . Yes , this would be useful. Created a issue for this #10828

Copy link
Contributor

@zhtaoxiang zhtaoxiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me now. thanks for enhancing it.

@mayankshriv mayankshriv merged commit 87379fc into apache:master Jun 2, 2023
@shenyu0127
Copy link
Contributor

This PR breaks the PinotTaskManagerStatelessTest.

@shenyu0127 shenyu0127 mentioned this pull request Jun 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants