Better cleanup during datanode preflight restart #25061

Open
todvora wants to merge 10 commits into master from fix/better-datanode-preflight-cleanup

todvora commented Feb 19, 2026

This PR properly handles datanode cleanup during a reset of the initial preflight setup and the migration to datanode. It removes the signed certificate from the datanode keystore and resets the state machine to waiting for configuration. This should prevent unexpected automatic startups and fix the integration tests that currently fail at random.
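
To illustrate the state machine reset described above, here is a minimal sketch, not the code from this PR. The states TERMINATED and WAITING_FOR_CONFIGURATION and the event PROCESS_CONFIGURATION_REMOVED are taken from the review summary; the class name, the AVAILABLE state, the other two events, and the map-based transition table are assumptions made only for this example.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for the datanode state machine.
class PreflightResetSketch {

    // AVAILABLE is an assumed placeholder for "the process is running".
    enum State { WAITING_FOR_CONFIGURATION, AVAILABLE, TERMINATED }

    // PROCESS_STARTED and PROCESS_TERMINATED are assumed names for this sketch.
    enum Event { PROCESS_STARTED, PROCESS_TERMINATED, PROCESS_CONFIGURATION_REMOVED }

    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    static {
        permit(State.WAITING_FOR_CONFIGURATION, Event.PROCESS_STARTED, State.AVAILABLE);
        permit(State.AVAILABLE, Event.PROCESS_TERMINATED, State.TERMINATED);
        // The transition the PR describes: a terminated node whose configuration
        // was removed goes back to waiting instead of being able to start on its own.
        permit(State.TERMINATED, Event.PROCESS_CONFIGURATION_REMOVED, State.WAITING_FOR_CONFIGURATION);
    }

    private static void permit(State from, Event on, State to) {
        TRANSITIONS.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    private State current = State.WAITING_FOR_CONFIGURATION;

    State current() {
        return current;
    }

    // Applies the event, or throws if no transition is registered for the current state.
    State fire(Event event) {
        State next = Optional.ofNullable(TRANSITIONS.get(current))
                .map(byEvent -> byEvent.get(event))
                .orElseThrow(() -> new IllegalStateException(
                        "No transition from " + current + " on " + event));
        current = next;
        return next;
    }

    public static void main(String[] args) {
        PreflightResetSketch machine = new PreflightResetSketch();
        machine.fire(Event.PROCESS_STARTED);
        machine.fire(Event.PROCESS_TERMINATED);
        System.out.println(machine.fire(Event.PROCESS_CONFIGURATION_REMOVED));
    }
}
```

In this sketch, firing PROCESS_CONFIGURATION_REMOVED from any state other than TERMINATED is rejected, which matches the single new transition listed in the review summary.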

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have requested a documentation update.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

@todvora todvora marked this pull request as draft February 19, 2026 13:50
@todvora todvora marked this pull request as ready for review February 20, 2026 10:14

Copilot AI left a comment

Pull request overview

This PR improves the cleanup process during datanode preflight restart by properly removing signed certificates from the datanode keystore and resetting the state machine to waiting for configuration. This prevents unexpected automatic startups and addresses random integration test failures.

Changes:

  • Added REMOVE_NODE_CONFIGURATION lifecycle trigger to properly clean up datanode state
  • Introduced initWithSelfSignedCertificate() method to reset keystore to self-signed certificate
  • Enhanced integration tests with proper datanode disconnection waiting to ensure clean state between tests
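
The flow behind these changes (preflight reset stops the node, then posts REMOVE_NODE_CONFIGURATION, and a handler clears the process configuration) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the Bus, Process, and ProcessService classes, the STOP trigger, and the resetPreflight() helper are hypothetical stand-ins, and the real project may dispatch lifecycle events differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical end-to-end sketch of the cleanup trigger flow.
class RemoveConfigurationFlowSketch {

    // REMOVE_NODE_CONFIGURATION comes from the PR summary; STOP is assumed here.
    enum LifecycleTrigger { STOP, REMOVE_NODE_CONFIGURATION }

    // Tiny in-memory stand-in for an event bus (assumption for this sketch).
    static class Bus {
        private final List<Consumer<LifecycleTrigger>> subscribers = new ArrayList<>();

        void subscribe(Consumer<LifecycleTrigger> subscriber) {
            subscribers.add(subscriber);
        }

        void post(LifecycleTrigger trigger) {
            subscribers.forEach(s -> s.accept(trigger));
        }
    }

    // Stand-in for the managed process; the configuration is just a string here.
    static class Process {
        private String configuration = "provisioned";
        private boolean running = true;

        void stop() { running = false; }
        void removeConfiguration() { configuration = null; }
        boolean isRunning() { return running; }
        boolean hasConfiguration() { return configuration != null; }
    }

    // Stand-in for the service that reacts to lifecycle triggers.
    static class ProcessService {
        ProcessService(Bus bus, Process process) {
            bus.subscribe(trigger -> {
                if (trigger == LifecycleTrigger.STOP) {
                    process.stop();
                } else if (trigger == LifecycleTrigger.REMOVE_NODE_CONFIGURATION) {
                    process.removeConfiguration();
                }
            });
        }
    }

    // Mirrors the described preflight reset: stop first, then remove the configuration.
    static Process resetPreflight() {
        Bus bus = new Bus();
        Process process = new Process();
        new ProcessService(bus, process);
        bus.post(LifecycleTrigger.STOP);
        bus.post(LifecycleTrigger.REMOVE_NODE_CONFIGURATION);
        return process;
    }

    public static void main(String[] args) {
        Process process = resetPreflight();
        System.out.println("running=" + process.isRunning()
                + ", configured=" + process.hasConfiguration());
    }
}
```

The point of the separate trigger is that stopping alone leaves the old configuration in place, so a later restart could pick it up again; removing it explicitly is what keeps the node from starting automatically.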

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| DataNodeLifecycleTrigger.java | Added new REMOVE_NODE_CONFIGURATION enum value for cleanup trigger |
| DataNodeCommandService.java | Added removeNodeConfiguration() interface method |
| DataNodeCommandServiceImpl.java | Implemented removeNodeConfiguration() to post lifecycle event |
| PreflightResource.java | Added call to removeNodeConfiguration() after stopping node during preflight reset |
| OpensearchEvent.java | Added PROCESS_CONFIGURATION_REMOVED event for state machine |
| OpensearchStateMachine.java | Added state transition from TERMINATED to WAITING_FOR_CONFIGURATION |
| OpensearchProcess.java | Added removeConfiguration() interface method |
| OpensearchProcessImpl.java | Implemented removeConfiguration() to clear configuration |
| OpensearchProcessService.java | Added event handler for REMOVE_NODE_CONFIGURATION trigger |
| DatanodeKeystore.java | Moved and implemented initWithSelfSignedCertificate() method to reset keystore |
| DatanodeKeystoreInitService.java | Refactored to use new keystore initialization method |
| DatanodeKeystoreTest.java | Added comprehensive tests for self-signed certificate initialization |
| DatanodeProvisioningIT.java | Added waitForDatanodesDisconnected() to ensure clean state between tests |
| pr-25061.toml | Added changelog entry |
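
For the waitForDatanodesDisconnected() helper mentioned for DatanodeProvisioningIT.java, one plausible shape is a poll-until-zero loop with a timeout. The sketch below is an assumption about how such a helper could look, not the code in this PR: the signature, the IntSupplier used to report the number of connected datanodes, and the class name are all hypothetical.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

// Hypothetical polling helper; the actual integration test helper may differ.
class WaitForDisconnectSketch {

    /**
     * Polls connectedCount until it reports zero or the timeout elapses.
     * Returns true if all datanodes disconnected in time, false otherwise.
     */
    static boolean waitForDatanodesDisconnected(IntSupplier connectedCount,
                                                Duration timeout,
                                                Duration pollInterval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (connectedCount.getAsInt() == 0) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster state: the count drops by one on every poll.
        int[] remaining = {3};
        boolean clean = waitForDatanodesDisconnected(
                () -> Math.max(0, remaining[0]--),
                Duration.ofSeconds(5),
                Duration.ofMillis(10));
        System.out.println("clean state reached: " + clean);
    }
}
```

Calling a helper like this between tests makes each test start only after the previous datanodes are gone, which is the kind of ordering issue that tends to show up as random failures.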
