Skip to content

Conversation

sarvekshayr
Copy link
Contributor

What changes were proposed in this pull request?

stopBalancer should be idempotent - meaning that if the balancer is already in a STOPPED state, invoking the stop command again should simply return successfully with exit code 0 and perform no action.

Currently, it instead throws the following error:

Caused by: org.apache.hadoop.hdds.scm.container.balancer.IllegalContainerBalancerStateException: Expect ContainerBalancer as running state, but running state is actually STOPPED

This behavior is fixed so that repeated stop calls are handled gracefully.

What is the link to the Apache JIRA

HDDS-13694

How was this patch tested?

Tested in TestContainerBalancer#testStartBalancerStop.
CI: https://github.com/sarvekshayr/ozone/actions/runs/17828304300

@sarvekshayr
Copy link
Contributor Author

@siddhantsangwan Please review this PR.

lock.lock();
try {
validateState(true);
validateEligibility();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we write this logic something like below

public void stopBalancer()
      throws IOException, IllegalContainerBalancerStateException {
    Thread balancingThread = null;
    lock.lock();
    try {
      if (!isBalancerRunning()) {
        LOG.info("ContainerBalancer already stopped;");
        return;
      }
      validateEligibility();
      saveConfiguration(config, false, 0);
      LOG.info("Trying to stop ContainerBalancer service.");
      task.stop();
      balancingThread = currentBalancingThread;
    } finally {
      lock.unlock();
    }
    if (balancingThread != null) {
      blockTillTaskStop(balancingThread);
    }
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan What's your take on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyeshkaratha Please check the JIRA description for the proposed change. The idea is to make API idempotent and also fix a race condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@priyeshkaratha I had created this jira to explain the race - https://issues.apache.org/jira/browse/HDDS-13698.

Copy link
Contributor

@priyeshkaratha priyeshkaratha Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with the changes but my question if its already stopped do we really need to check validateEligibility? Instead can we return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the in-memory state of being stopped may not be the correct state as I've described in the jira.

Copy link
Contributor

@siddhantsangwan siddhantsangwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments on the tests.

@siddhantsangwan
Copy link
Contributor

@sarvekshayr please check if the test failures are related

@sarvekshayr
Copy link
Contributor Author

@sarvekshayr please check if the test failures are related

No, the test failures are not related. Please retrigger the failed ones.

@siddhantsangwan siddhantsangwan merged commit 34abace into apache:master Sep 25, 2025
156 of 162 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants