
Make StickyRebalanceStrategy topology aware #2944

Open · wants to merge 4 commits into base: helix-gateway-service
Conversation

frankmu (Contributor) commented Oct 9, 2024

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

resolves #2822

Description

  • Here are some details about my PR, including screenshots of any UI changes:

The previous StickyRebalanceStrategy leveraged a greedy approach for assignment. This PR makes it topology aware: for each partition, we try to assign its replicas to different fault zones.
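As a rough illustration of the idea (hypothetical class and method names, not the actual PR code), topology-aware assignment can be sketched as preferring candidate nodes whose fault zone is not yet used by the partition's other replicas, falling back to any node with remaining capacity:

```java
import java.util.*;

public class ZoneAwareAssigner {
    // Pick nodes for a partition's replicas, preferring unused fault zones.
    // nodeToZone maps node id -> fault zone id; returns the chosen node ids.
    public static List<String> assignReplicas(int replicaCount,
                                              Map<String, String> nodeToZone) {
        List<String> chosen = new ArrayList<>();
        Set<String> usedZones = new HashSet<>();
        Set<String> usedNodes = new HashSet<>();
        for (int r = 0; r < replicaCount; r++) {
            String pick = null;
            // First pass: a node in a fault zone this partition has not used yet.
            for (Map.Entry<String, String> e : nodeToZone.entrySet()) {
                if (!usedNodes.contains(e.getKey())
                        && !usedZones.contains(e.getValue())) {
                    pick = e.getKey();
                    break;
                }
            }
            // Fallback: any unused node, even if its zone is already taken.
            if (pick == null) {
                for (String node : nodeToZone.keySet()) {
                    if (!usedNodes.contains(node)) {
                        pick = node;
                        break;
                    }
                }
            }
            if (pick == null) {
                break; // not enough nodes for all replicas
            }
            usedNodes.add(pick);
            usedZones.add(nodeToZone.get(pick));
            chosen.add(pick);
        }
        return chosen;
    }
}
```

The fallback pass keeps the strategy best-effort: when there are more replicas than fault zones, replicas still get placed rather than left unassigned.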

Tests

  • The following tests are written for this issue:

TestStickyRebalanceStrategy.testNoSameZoneAssignment()

mvn test -o -Dtest=TestStickyRebalanceStrategy -pl=helix-core
...
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.78 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 11, Failures: 0, Errors: 0, Skipped: 0
...
mvn test -o -Dtest=TestStickyRebalanceWithGlobalPerInstancePartitionLimit -pl=helix-core
...
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.209 s - in org.apache.helix.integration.rebalancer.TestStickyRebalanceWithGlobalPerInstancePartitionLimit
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
  • The following is the result of the "mvn test" command on the appropriate module:

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)


/**
* Get the fault zone of this node
* @return The ID of this node
Contributor: return the fault zone?

* For example, when
* the domain is "zone=2, instance=testInstance" and the fault zone type is "zone", this function
* returns "2".
* If cannot find the fault zone type, this function leaves the fault zone id as the instance name.
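The parsing behavior the javadoc above describes could be sketched as follows (a hypothetical standalone helper for illustration, not the actual Helix implementation):

```java
public class FaultZoneParser {
    // Parse a topology domain string such as "zone=2, instance=testInstance"
    // and return the value for the given fault zone type (e.g. "zone" -> "2").
    // Falls back to the instance name when the fault zone type is absent,
    // matching the behavior described in the javadoc above.
    public static String computeFaultZone(String domain, String faultZoneType,
                                          String instanceName) {
        for (String part : domain.split(",")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals(faultZoneType)) {
                return kv[1].trim();
            }
        }
        return instanceName;
    }
}
```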
Contributor: Should we use the logical ID as the default fault zone if we can't find the fault zone type?

Contributor (author):
This function is copied from the waged rebalancer: https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableNode.java#L363-L383

Any pointers on why logical ID is a better candidate here?

Contributor: In terms of swap, using the logical ID will put the swap-in/out instances in the same fault zone. But I also agree that this should align with the other rebalancers' behavior. What do you think about adding a TODO and considering changing both places later?

}
for (int j = 0; j < remainingReplica; j++) {
  while (index - startIndex < assignableNodeList.size()) {
    CapacityNode node = assignableNodeList.get(index++ % assignableNodeList.size());
    if (node.canAdd(_resourceName, _partitions.get(i))) {
      stateMap.computeIfAbsent(_partitions.get(i), m -> new HashSet<>()).add(node.getId());
      // Valid assignment when following conditions match:
Contributor: May I suggest moving this logic to a separate function for extensibility?
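One way to read the suggestion (a hypothetical shape, not the actual PR code): pull the per-node validity conditions out of the assignment loop into a single predicate, so future constraints can be added in one place:

```java
import java.util.Set;

public class AssignmentValidator {
    // Conditions a candidate node must satisfy for a given partition.
    // Extracted into one method so future constraints (capacity, tags,
    // more topology rules, ...) can be added without touching the loop.
    public static boolean isValidAssignment(Set<String> usedFaultZones,
                                            String nodeFaultZone,
                                            boolean nodeHasCapacity) {
        return nodeHasCapacity && !usedFaultZones.contains(nodeFaultZone);
    }
}
```

The assignment loop would then call this predicate instead of inlining the checks, which also makes the conditions unit-testable in isolation.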

* TODO: change the return value to logical id when no fault zone type found. Also do the same for
* waged rebalancer in helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/model/AssignableNode.java
*/
private String computeFaultZone(ClusterConfig clusterConfig, InstanceConfig instanceConfig) {
Contributor: Can we try to create a util class with this logic instead of copying it in two places and having to maintain both methods?

Contributor: It may be worthwhile to create a Node base class (different than the one under the topology package) that extracts the common logic from AssignableNode and CapacityNode.

Contributor (author): This is something I prefer to do in a later iteration: consolidate CapacityNode and AssignableNode and apply the changes @xyuanlu mentioned above.

@@ -573,11 +575,16 @@ public WagedInstanceCapacity getWagedInstanceCapacity() {
return _wagedInstanceCapacity;
}

private void buildSimpleCapacityMap(int globalMaxPartitionAllowedPerInstance) {
private void buildSimpleCapacityMap() {
Contributor: This is just a general comment that we don't need to address right away, but we may want to address it before merging this to master.

Should we be building the capacity nodes in the resource controller data provider? It seems like an implementation detail for one specific rebalance strategy. If no resources in the cluster use StickyRebalanceStrategy, then this is not needed. Why don't we move the creation of simpleCapacitySet to the rebalance strategy?

If this is here because StickyRebalanceStrategy relies on global node capacity, maybe we should implement the StatefulRebalancer interface instead of using single-resource rebalance. That interface has the method computeNewIdealStates for all resources and is intended to be implemented for a global rebalancer.

Contributor (author):

"If this is here because StickyRebalanceStrategy relies on global node capacity"

Yeah, this is exactly the reason: we need to populate the node usage globally before the rebalance stage. StatefulRebalancer seems intended for a rebalancer, but here we need a rebalance strategy. To me, the data provider is the glue between the rebalancer and the rebalance strategy.

for (String instance : getEnabledLiveInstances()) {
CapacityNode capacityNode = new CapacityNode(instance);
capacityNode.setCapacity(globalMaxPartitionAllowedPerInstance);
for (String instanceName : getEnabledLiveInstances()) {
Contributor:
Why don't disabled and down instances get included in the capacitySet? This will cause all the replicas on the down or disabled nodes to be reassigned somewhere else in the cluster if delayed rebalance is being used. We should have all assignable instances in the map and leave it to the rebalancer to pass down liveNodes to computePartitionAssignment.

Contributor (author):
This will cause all the replicas on the down or disabled nodes to be reassigned somewhere else in the cluster if delayed rebalance is being used

If the node is disabled or down, do we expect to re-assign the replica to other nodes?

Contributor: This behavior is determined by the rebalancer, not the rebalance strategy. If you use DelayedAutoRebalancer, it should not try to reassign the replicas to other nodes until after the delay window. If you use AutoRebalancer, it should.

If we only have enabled live instances in the capacitySet, we get the AutoRebalancer behavior I described even when DelayedAutoRebalancer is used, which is inconsistent with the assumptions of how each rebalancer should work.

Contributor (author): Got it, thanks for the detailed context; will make the changes.

@zpinto (Contributor) left a comment:
Overall LGTM for now. I think that in a later PR we need to revisit the following comments and address them more cleanly:
#2944 (comment)
#2944 (comment)

Also, have a concern on the test case. Please take a look.

@Test
public void testNoSameZoneAssignment() throws Exception {
setTopologyAwareAndGlobalMaxPartitionAllowedPerInstanceInCluster(CLUSTER_NAME, 1);
Map<String, ExternalView> externalViews = createTestDBs();
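The assertion the test name implies — no partition has two replicas in the same fault zone — could be checked with a helper along these lines (hypothetical names and a plain-map view of the external view, for illustration only):

```java
import java.util.*;

public class ZoneAssertionExample {
    // Given partition -> (instance -> state) and instance -> zone maps,
    // verify that no partition has two replicas in the same fault zone.
    public static boolean noSameZone(Map<String, Map<String, String>> stateMap,
                                     Map<String, String> instanceToZone) {
        for (Map<String, String> replicas : stateMap.values()) {
            Set<String> zones = new HashSet<>();
            for (String instance : replicas.keySet()) {
                // Set.add returns false on a duplicate: two replicas share a zone.
                if (!zones.add(instanceToZone.get(instance))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```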
Contributor:
Does createTestDBs use the verifier to make sure the controller thread is done computing the assignment before starting the assertions? If not, this should be added.

Contributor (author): Do you mean the following?

Assert.assertTrue(_clusterVerifier.verifyByPolling());

If yes, then yes and here is the code pointer: https://github.com/apache/helix/blob/helix-gateway-service/helix-core/src/test/java/org/apache/helix/integration/rebalancer/TestStickyRebalanceStrategy.java#L332

}

/**
* Constructor used for non-topology-aware use case
Contributor: I think this is for the topology-aware case? Also, can we have one constructor call the other?

Contributor (author): Yes, will make this change.

4 participants