Skip to content

Table rebalance status#9662

Closed
antwigambrah wants to merge 15 commits intoapache:masterfrom
antwigambrah:table_rebalance_status
Closed

Table rebalance status#9662
antwigambrah wants to merge 15 commits intoapache:masterfrom
antwigambrah:table_rebalance_status

Conversation

@antwigambrah
Copy link

Description:
This PR adds an api for checking table rebalance status

This PR will solve the following issue --#9558

cc @Jackie-Jiang

@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2022

Codecov Report

Merging #9662 (f8eb0e1) into master (7eba70c) will decrease coverage by 45.30%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master    #9662       +/-   ##
=============================================
- Coverage     70.42%   25.12%   -45.31%     
+ Complexity     5662       44     -5618     
=============================================
  Files          1991     1979       -12     
  Lines        107251   106924      -327     
  Branches      16302    16269       -33     
=============================================
- Hits          75533    26861    -48672     
- Misses        26444    77351    +50907     
+ Partials       5274     2712     -2562     
Flag Coverage Δ
integration1 25.12% <0.00%> (-0.05%) ⬇️
integration2 ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ler/api/resources/PinotSegmentRestletResource.java 32.13% <0.00%> (-7.59%) ⬇️
...ntroller/helix/core/PinotHelixResourceManager.java 39.56% <0.00%> (-31.60%) ⬇️
...ntroller/helix/core/rebalance/TableRebalancer.java 53.82% <0.00%> (-22.14%) ⬇️
...spi/utils/builder/ControllerRequestURLBuilder.java 0.00% <0.00%> (ø)
...ain/java/org/apache/pinot/spi/utils/LoopUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...in/java/org/apache/pinot/spi/utils/BytesUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/user/RoleType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1449 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

if (idealState == null) {
throw new IllegalStateException("ideal state is empty");
}
if (externalView != null && isExternalViewConverged(tableNameWithType, externalView.getRecord().getMapFields(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to consider the situation where external view never converges with what ongoing rebalancer expects or external view ends up having segments in ERROR state ?

Looks like we will continue returning IN_PROGRESS in such scenarios and that could be misleading ? Probably also depends on what is the call to action for the user/admin who uses this API...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @siddharthteotia, sorry I updated the above code before reading this comment.Currently, the external view returns false if it is not converged yet and logs an error if the external view ends up having segments in error .I believe it is best to return the status FAILED instead of logging. Also, would it be ok to return NO_OP if the external view is still null when the api is called .cc @Jackie-Jiang


if (isExternalViewConverged(tableNameWithType, externalView.getRecord().getMapFields(),
idealState.getRecord().getMapFields(), true)) {
return RebalanceResult.Status.DONE;
Copy link
Contributor

@siddharthteotia siddharthteotia Oct 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if it makes sense / helpful enough to sort of implement a stateless API to get status of rebalance task -- which was triggered separately through some other API and is being done asynchronously on potentially a different controller that getStatus() API call comes on

Relying on the fact that ExternalView is not the same as IdealState to indicate Rebalance is IN_PROGRESS could be misleading.

Segment assignment changes ideal state as soon as segment lands on controller and up until server successfully downloads the segment etc, ExternalView does not get updated. But this API will return rebalance is in IN_PROGRESS even though rebalance may have already finished or could have errored out etc.

So not sure if this is going to be super helpful / accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI, @siddharthteotia Yes it will be quite misleading. Checking for IN_PROGRESS seems tricky here.

The possible way I can think for now through testing I have done on TableRebalancerClusterStatelessTest is to check on the external view from being null to being allocated segments assignments.

For DONE status, I will run the rebalance in dry mode and retrieve the new assignments from the result and use is checking if the extervalViewConverged . Kindly let me know you thoughts on it . cc @Jackie-Jiang

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API added in this PR basically checks if the table's EV matches IS. This itself cannot represent the status of rebalance because rebalance can be done in multiple steps, and in the middle EV can match IS. But this is also a quite useful endpoint for status tracking purpose.
I'd suggest changing the API to return whether the table has EV converged, and if not, list down the segments that do not have the same state in EV and IS.

@antwigambrah
Copy link
Author

@Jackie-Jiang Please review with the new update

@ApiParam(value = "Name of the table", required = true) @PathParam("tableName") String tableName,
@ApiParam(value = "OFFLINE|REALTIME", required = true) @QueryParam("type") String tableTypeStr) {
try {
String tableNameWithType = constructTableNameWithType(tableName, tableTypeStr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(code format) Please reformat the changes with Pinot Style

}

@GET
@Path("/tables/{tableName}/rebalance/status")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the API more specific, and it might be better to put it under the segment rest API: /segments/{tableName}/externalViewMismatch and returns a map from the mismatch segment to its ideal state and external view


public Map<String, MapDifference.ValueDifference<Map<String, String>>> rebalanceTableStatus(String tableNameWithType)
throws TableNotFoundException {
TableConfig tableConfig = getTableConfig(tableNameWithType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to read the table config. We should check both IS and EV exist

}
}

public Map<String, MapDifference.ValueDifference<Map<String, String>>> rebalanceStatus(String tableNameWithType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put this into the TableRebalancer. We can directly have it in PinotHelixResourceManager

try {
tableType = TableType.valueOf(tableTypeStr.toUpperCase());
} catch (Exception e) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Please revert some unnecessary empty line changes in this PR

@antwigambrah
Copy link
Author

Hi @Jackie-Jiang , Please review with the new update

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall direction looks good

Comment on lines +1081 to +1082
@ApiOperation(value = "Table segment mismatch after rebalance", notes = "Get segments without the same state in EV "
+ "and IS ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Let's not mention rebalance here

Suggested change
@ApiOperation(value = "Table segment mismatch after rebalance", notes = "Get segments without the same state in EV "
+ "and IS ")
@ApiOperation(value = "Get segments with mismatched state in EV and IS",
notes = "Get segments without the same state in EV and IS ")

Comment on lines +1086 to +1097
try {
TableType tableType = Constants.validateTableType(tableTypeStr);

if (tableType == null) {
throw new ControllerApplicationException(LOGGER, "Table type must not be null", Status.BAD_REQUEST);
}
String tableNameWithType = TableNameBuilder.forType(TableType.valueOf(tableTypeStr)).tableNameWithType(tableName);

if (!_pinotHelixResourceManager.hasTable(tableNameWithType)) {
throw new TableNotFoundException(String.format("Table=%s not found", tableName));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have util to perform the same checks. You may refer to listSegmentLineage()

    TableType tableType = Constants.validateTableType(tableTypeStr);
    if (tableType == null) {
      throw new ControllerApplicationException(LOGGER, "Table type should either be offline or realtime",
          Response.Status.BAD_REQUEST);
    }
    String tableNameWithType =
        ResourceUtils.getExistingTableNamesWithType(_pinotHelixResourceManager, tableName, tableType, LOGGER).get(0);


_helixResourceManager.deleteOfflineTable(TIERED_TABLE_NAME);
}
@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not adding the rebalance status check test in this PR. This PR is focusing on the EV mismatch, and we can use it to build the rebalance status check in a separate PR and add the test there.

try {
tableType = TableType.valueOf(tableTypeStr.toUpperCase());
} catch (Exception e) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Revert this change

String rawTableName = TableNameBuilder.extractRawTableName(tableNameWithType);
boolean hasPreConfiguredInstancePartitions = TableConfigUtils.hasPreConfiguredInstancePartitions(tableConfig,
instancePartitionsType);
boolean hasPreConfiguredInstancePartitions =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Revert the changes in this file since it is not related

ControllerTest.sendPostRequest(createTableUrl, testTableConfig.toJsonString());

// Rebalance table
ControllerTest.sendPostRequest(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no segment in this table, so both rebalance and segment mismatch will be no-op. Let's try to generate a non-empty IS and EV with some diff, and verify the behavior

Map<String, MapDifference.ValueDifference<Map<String, String>>> view =
_pinotHelixResourceManager.getExternalViewSegementMismatch(tableNameWithType);
ObjectNode data = JsonUtils.newObjectNode();
data.put("segments", view.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we might want the response to be the following format (segmentName -> EV/IS -> different instanceState):

{
  "segment0": {
    "IdealState": {
      "server0": "ONLINE",
      "server1": "ONLINE"
    },
    "externalView": {
      "server1": "ERROR",
      "server2": "ONLINE"
    }
  },
  "segment1": {
    "IdealState": {
      "server1": "ONLINE"
    },
    "externalView": {
      "server1": "ERROR"
    }
  }
}

@antwigambrah
Copy link
Author

Hi @Jackie-Jiang , an update has been provided for this pr. Kindly review

@Jackie-Jiang
Copy link
Contributor

Decided to move forward with #10359

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants