[Remote State] NullPointerException when remote publication enabled with remote state disabled #15182

Closed
@shiv0408

Description

Describe the bug

If remote state is disabled (the node has cluster.remote_store.state.enabled: false in opensearch.yml) but remote publication is enabled with the setting opensearch.experimental.feature.remote_store.publication.enabled=true, the cluster manager tries to publish the state through the remote path and hits a NullPointerException, because no remote persisted state exists when remote state is not enabled.

Related component

Cluster Manager

To Reproduce

  1. Add the following settings to the opensearch.yml file
node.attr.remote_store.segment.repository: my-fs-repository
node.attr.remote_store.translog.repository: my-fs-repository
node.attr.remote_store.routing_table.repository: my-fs-repository
node.attr.remote_store.repository.my-fs-repository.type: fs
node.attr.remote_store.repository.my-fs-repository.settings.location: ~/os_data/repos/repo-1
cluster.remote_store.state.enabled: true
node.attr.remote_store.state.repository: my-fs-repository
  2. Run the opensearch process with the following settings
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.remote_store.enabled=true -Dopensearch.experimental.feature.replication_type.enabled=true -Dopensearch.experimental.feature.remote_store.routing.enabled=true -Dopensearch.experimental.feature.remote_store.publication.enabled=true -Daws.region=us-east-1" ./build/distribution/local/opensearch-3.0.0-SNAPSHOT/bin/opensearch -E cluster.name=hishiv-cluster -E path.data=~/os_data/master1 -E path.repo=~/os_data/repos -E node.name=master1 -E node.master=true -E node.data=false -E node.ingest=false  -E cluster.initial_master_nodes=master1
  3. Publication fails with the following error
[2024-08-09T15:44:24,322][WARN ][o.o.c.c.PublicationTransportHandler] [master1] error sending remote cluster state to {master1}{ldGqZarCS3K9MBuR7idMlQ}{U9B1LEitTMKYs2hpP8S2vA}{127.0.0.1}{127.0.0.1:9300}{mr}{shard_indexing_pressure_enabled=true}
java.lang.NullPointerException: Cannot invoke "org.opensearch.gateway.GatewayMetaState$RemotePersistedState.getLastUploadedManifestFile()" because the return value of "org.opensearch.cluster.coordination.PersistedStateRegistry.getPersistedState(org.opensearch.cluster.coordination.PersistedStateRegistry$PersistedStateType)" is null
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendRemoteClusterState(PublicationTransportHandler.java:527) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendPublishRequest(PublicationTransportHandler.java:474) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator$CoordinatorPublication.sendPublishRequest(Coordinator.java:1843) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Publication$PublicationTarget.sendPublishRequest(Publication.java:287) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) [?:?]
	at org.opensearch.cluster.coordination.Publication.start(Publication.java:94) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1356) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:385) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:367) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:229) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-08-09T15:44:24,326][WARN ][o.o.c.s.MasterService    ] [master1] failing [Tasks batched with key: org.opensearch.cluster.coordination.JoinHelper and count: 3]: failed to commit cluster state version [1]
org.opensearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed
	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1360) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.publish(MasterService.java:385) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:367) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:229) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.ClassCastException: class java.lang.NullPointerException cannot be cast to class org.opensearch.transport.TransportException (java.lang.NullPointerException is in module java.base of loader 'bootstrap'; org.opensearch.transport.TransportException is in unnamed module of loader 'app')
	at org.opensearch.cluster.coordination.Publication$PublicationTarget$PublishResponseHandler.onFailure(Publication.java:410) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator$5.onFailure(Coordinator.java:1403) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext$1.onFailure(PublicationTransportHandler.java:465) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendRemoteClusterState(PublicationTransportHandler.java:571) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendPublishRequest(PublicationTransportHandler.java:474) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator$CoordinatorPublication.sendPublishRequest(Coordinator.java:1843) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Publication$PublicationTarget.sendPublishRequest(Publication.java:287) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) ~[?:?]
	at org.opensearch.cluster.coordination.Publication.start(Publication.java:94) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1356) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	... 11 more
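
The second trace suggests that the failure handler casts whatever exception it receives to TransportException, so the original NullPointerException surfaces as a ClassCastException instead. The following is a minimal, self-contained sketch of that failure mode and of one way to avoid it. The TransportException class here is a hypothetical stand-in, and the method names blindCast and asTransportException are invented for illustration; none of this is the actual OpenSearch code.

```java
final class FailureCastSketch {

    // Hypothetical stand-in for org.opensearch.transport.TransportException.
    static class TransportException extends RuntimeException {
        TransportException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Mirrors the failure mode in the trace: an unconditional cast throws
    // ClassCastException when the failure is not a TransportException.
    static TransportException blindCast(Exception e) {
        return (TransportException) e;
    }

    // Safer variant: pass a TransportException through unchanged and wrap
    // anything else, keeping the original exception as the cause.
    static TransportException asTransportException(Exception e) {
        if (e instanceof TransportException) {
            return (TransportException) e;
        }
        return new TransportException("publication failed", e);
    }

    public static void main(String[] args) {
        Exception npe = new NullPointerException("remote persisted state is null");

        try {
            blindCast(npe);
        } catch (ClassCastException cce) {
            System.out.println("blind cast failed: " + cce.getClass().getSimpleName());
        }

        TransportException wrapped = asTransportException(npe);
        System.out.println("wrapped cause: " + wrapped.getCause().getClass().getSimpleName());
    }
}
```

Wrapping rather than casting keeps the root cause visible in the logs, which would have made the NullPointerException the reported reason for the failed commit.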

Expected behavior

If we are in such an inconsistent state, where remote publication is enabled without remote state being enabled, we should fall back to publication over the transport call.
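
The fallback described above can be sketched as a null check on the remote persisted state before choosing the publication path. This is only an illustration under stated assumptions: the classes below are simplified, hypothetical stand-ins named after the types in the stack trace, and PublishPath and choosePublishPath are invented for this example. The actual fix in OpenSearch may look different.

```java
import java.util.EnumMap;
import java.util.Map;

final class PublicationFallbackSketch {

    // Hypothetical stand-in for PersistedStateRegistry.PersistedStateType.
    enum PersistedStateType { LOCAL, REMOTE }

    // Hypothetical result type: which path the publication should take.
    enum PublishPath { REMOTE, TRANSPORT }

    // Hypothetical stand-in for GatewayMetaState.RemotePersistedState;
    // only the accessor that appears in the trace is modeled.
    static final class RemotePersistedState {
        private final String lastUploadedManifestFile;

        RemotePersistedState(String lastUploadedManifestFile) {
            this.lastUploadedManifestFile = lastUploadedManifestFile;
        }

        String getLastUploadedManifestFile() {
            return lastUploadedManifestFile;
        }
    }

    // Hypothetical stand-in for PersistedStateRegistry: returns null when no
    // state was registered for a type, as the NPE message indicates.
    static final class PersistedStateRegistry {
        private final Map<PersistedStateType, RemotePersistedState> states =
            new EnumMap<>(PersistedStateType.class);

        void addPersistedState(PersistedStateType type, RemotePersistedState state) {
            states.put(type, state);
        }

        RemotePersistedState getPersistedState(PersistedStateType type) {
            return states.get(type);
        }
    }

    // Use the remote path only when the feature is enabled AND a remote
    // persisted state with an uploaded manifest exists; otherwise fall back.
    static PublishPath choosePublishPath(boolean remotePublicationEnabled, PersistedStateRegistry registry) {
        if (!remotePublicationEnabled) {
            return PublishPath.TRANSPORT;
        }
        RemotePersistedState remote = registry.getPersistedState(PersistedStateType.REMOTE);
        if (remote == null || remote.getLastUploadedManifestFile() == null) {
            return PublishPath.TRANSPORT;
        }
        return PublishPath.REMOTE;
    }

    public static void main(String[] args) {
        // Remote publication enabled, but no remote state registered.
        PersistedStateRegistry withoutRemoteState = new PersistedStateRegistry();
        System.out.println(choosePublishPath(true, withoutRemoteState));

        // Remote publication enabled and remote state present.
        PersistedStateRegistry withRemoteState = new PersistedStateRegistry();
        withRemoteState.addPersistedState(PersistedStateType.REMOTE, new RemotePersistedState("manifest-1"));
        System.out.println(choosePublishPath(true, withRemoteState));
    }
}
```

With the check in place, the inconsistent configuration from this report would take the transport path instead of dereferencing a null persisted state.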

Additional Details

No response
