[BUG] NPE in ReplicaShardBatchAllocator during node drops #13993

@SwethaGuptha

Describe the bug

[2024-05-28T14:34:01,627][ERROR][o.o.c.c.Coordinator      ] [66a99e3966c42f12f229f8b22da6d074] unexpected failure during [node-left]
java.lang.NullPointerException: Cannot invoke "org.opensearch.cluster.routing.ShardRouting.currentNodeId()" because "primaryShard" is null
        at org.opensearch.gateway.ReplicaShardAllocator.cancelExistingRecoveryForBetterMatch(ReplicaShardAllocator.java:108)
        at org.opensearch.gateway.ReplicaShardBatchAllocator.processExistingRecoveries(ReplicaShardBatchAllocator.java:73)
        at org.opensearch.gateway.ShardsBatchGatewayAllocator.afterPrimariesBeforeReplicas(ShardsBatchGatewayAllocator.java:198)
        at org.opensearch.cluster.routing.allocation.AllocationService.allocateAllUnassignedShards(AllocationService.java:639)
        at org.opensearch.cluster.routing.allocation.AllocationService.allocateExistingUnassignedShards(AllocationService.java:609)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:588)
        at org.opensearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:529)
        at org.opensearch.cluster.routing.allocation.AllocationService.disassociateDeadNodes(AllocationService.java:339)
        at org.opensearch.cluster.coordination.NodeRemovalClusterStateTaskExecutor.getTaskClusterTasksResult(NodeRemovalClusterStateTaskExecutor.java:123)
        at org.opensearch.cluster.coordination.NodeRemovalClusterStateTaskExecutor.execute(NodeRemovalClusterStateTaskExecutor.java:113)
        at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:882)
        at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:434)
        at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:301)
        at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:212)
        at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:209)
        at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:247)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)

Related component

Cluster Manager

To Reproduce

Drop all nodes from the cluster. The NPE above is logged while the resulting [node-left] task is processed.

Expected behavior

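The allocator should tolerate a missing primary ("primaryShard" is null) during [node-left] processing instead of throwing an NPE.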
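One possible shape for the guard, as a minimal self-contained sketch. It is not the OpenSearch code: `Shard`, `activePrimary`, and this `processExistingRecoveries` are hypothetical stand-ins that only model the situation in the stack trace, where a replica is inspected after its primary has gone away with the dropped node. The assumption is that skipping such a replica and leaving it to normal allocation is acceptable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; none of these types are OpenSearch classes.
class NullPrimaryGuardSketch {

    /** Minimal stand-in for a shard routing entry. */
    static final class Shard {
        final String shardId;
        final String currentNodeId;

        Shard(String shardId, String currentNodeId) {
            this.shardId = shardId;
            this.currentNodeId = currentNodeId;
        }
    }

    /** Returns the active primary for a shard id, or null when none is assigned. */
    static Shard activePrimary(Map<String, Shard> primaries, String shardId) {
        return primaries.get(shardId);
    }

    /**
     * Walks the initializing replicas and records which ones were compared
     * against their primary. Replicas without an active primary are skipped
     * instead of being dereferenced.
     */
    static List<String> processExistingRecoveries(List<Shard> replicas, Map<String, Shard> primaries) {
        List<String> inspected = new ArrayList<>();
        for (Shard replica : replicas) {
            Shard primaryShard = activePrimary(primaries, replica.shardId);
            if (primaryShard == null) {
                // The primary left with a dropped node: nothing to compare against,
                // so skip this replica rather than call currentNodeId() on null.
                continue;
            }
            inspected.add(replica.shardId + "@" + primaryShard.currentNodeId);
        }
        return inspected;
    }

    public static void main(String[] args) {
        Map<String, Shard> primaries = new HashMap<>();
        primaries.put("idx[0]", new Shard("idx[0]", "node-1"));
        // "idx[1]" deliberately has no primary, mimicking the state after node drops.
        List<Shard> replicas = Arrays.asList(
            new Shard("idx[0]", "node-2"),
            new Shard("idx[1]", "node-3")
        );
        System.out.println(processExistingRecoveries(replicas, primaries));
    }
}
```

Without the null check, the second replica would fail in the same way as the trace above. Whether skipping, logging, or failing the recovery is the right behavior here is a design decision for the maintainers.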

Status: In progress
