Describe the bug
If remote state is disabled (the node has cluster.remote_store.state.enabled: false in opensearch.yml) but remote publication is enabled through the setting opensearch.experimental.feature.remote_store.publication.enabled=true, the cluster manager still tries to publish the cluster state remotely and hits a NullPointerException because no remote persisted state exists.
Related component
Cluster Manager
To Reproduce
- Add the following settings to the opensearch.yml file:
node.attr.remote_store.segment.repository: my-fs-repository
node.attr.remote_store.translog.repository: my-fs-repository
node.attr.remote_store.routing_table.repository: my-fs-repository
node.attr.remote_store.repository.my-fs-repository.type: fs
node.attr.remote_store.repository.my-fs-repository.settings.location: ~/os_data/repos/repo-1
cluster.remote_store.state.enabled: true
node.attr.remote_store.state.repository: my-fs-repository
- Run the OpenSearch process with the following settings:
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.remote_store.enabled=true -Dopensearch.experimental.feature.replication_type.enabled=true -Dopensearch.experimental.feature.remote_store.routing.enabled=true -Dopensearch.experimental.feature.remote_store.publication.enabled=true -Daws.region=us-east-1" ./build/distribution/local/opensearch-3.0.0-SNAPSHOT/bin/opensearch -E cluster.name=hishiv-cluster -E path.data=~/os_data/master1 -E path.repo=~/os_data/repos -E node.name=master1 -E node.master=true -E node.data=false -E node.ingest=false -E cluster.initial_master_nodes=master1
- Publication fails with the following error:
[2024-08-09T15:44:24,322][WARN ][o.o.c.c.PublicationTransportHandler] [master1] error sending remote cluster state to {master1}{ldGqZarCS3K9MBuR7idMlQ}{U9B1LEitTMKYs2hpP8S2vA}{127.0.0.1}{127.0.0.1:9300}{mr}{shard_indexing_pressure_enabled=true}
java.lang.NullPointerException: Cannot invoke "org.opensearch.gateway.GatewayMetaState$RemotePersistedState.getLastUploadedManifestFile()" because the return value of "org.opensearch.cluster.coordination.PersistedStateRegistry.getPersistedState(org.opensearch.cluster.coordination.PersistedStateRegistry$PersistedStateType)" is null
at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendRemoteClusterState(PublicationTransportHandler.java:527) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendPublishRequest(PublicationTransportHandler.java:474) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Coordinator$CoordinatorPublication.sendPublishRequest(Coordinator.java:1843) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Publication$PublicationTarget.sendPublishRequest(Publication.java:287) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) [?:?]
at org.opensearch.cluster.coordination.Publication.start(Publication.java:94) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1356) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.MasterService.publish(MasterService.java:385) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:367) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:229) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-08-09T15:44:24,326][WARN ][o.o.c.s.MasterService ] [master1] failing [Tasks batched with key: org.opensearch.cluster.coordination.JoinHelper and count: 3]: failed to commit cluster state version [1]
org.opensearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1360) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.MasterService.publish(MasterService.java:385) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:367) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:229) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:210) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:252) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.ClassCastException: class java.lang.NullPointerException cannot be cast to class org.opensearch.transport.TransportException (java.lang.NullPointerException is in module java.base of loader 'bootstrap'; org.opensearch.transport.TransportException is in unnamed module of loader 'app')
at org.opensearch.cluster.coordination.Publication$PublicationTarget$PublishResponseHandler.onFailure(Publication.java:410) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Coordinator$5.onFailure(Coordinator.java:1403) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext$1.onFailure(PublicationTransportHandler.java:465) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendRemoteClusterState(PublicationTransportHandler.java:571) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.PublicationTransportHandler$PublicationContext.sendPublishRequest(PublicationTransportHandler.java:474) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Coordinator$CoordinatorPublication.sendPublishRequest(Coordinator.java:1843) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Publication$PublicationTarget.sendPublishRequest(Publication.java:287) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596) ~[?:?]
at org.opensearch.cluster.coordination.Publication.start(Publication.java:94) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
at org.opensearch.cluster.coordination.Coordinator.publish(Coordinator.java:1356) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
... 11 more
Expected behavior
If the node is in this inconsistent state, where remote publication is enabled but remote state is not, we should fall back to publishing the cluster state over the transport call.
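The fallback described above could look roughly like the following. This is only a minimal, self-contained sketch and not the actual OpenSearch change: PublicationFallbackSketch, PublicationPath, RemoteState, and choosePath are hypothetical names invented for illustration. RemoteState stands in for the remote persisted state that the stack trace shows to be null.

```java
public class PublicationFallbackSketch {

    /** Hypothetical enum naming the two ways a cluster state could be published. */
    enum PublicationPath { REMOTE, TRANSPORT }

    /** Hypothetical stand-in for the remote persisted state; not an OpenSearch class. */
    static final class RemoteState {
        private final String lastUploadedManifestFile;

        RemoteState(String lastUploadedManifestFile) {
            this.lastUploadedManifestFile = lastUploadedManifestFile;
        }

        String getLastUploadedManifestFile() {
            return lastUploadedManifestFile;
        }
    }

    /**
     * Chooses the remote path only when the feature flag is on AND a remote
     * persisted state actually exists. Otherwise falls back to transport,
     * instead of dereferencing a null remote state.
     */
    static PublicationPath choosePath(boolean remotePublicationEnabled, RemoteState remoteState) {
        if (remotePublicationEnabled && remoteState != null) {
            return PublicationPath.REMOTE;
        }
        return PublicationPath.TRANSPORT;
    }

    public static void main(String[] args) {
        // Inconsistent configuration from this report: flag on, no remote state.
        System.out.println(choosePath(true, null));
        // Consistent configuration: flag on and a remote state is present.
        System.out.println(choosePath(true, new RemoteState("manifest-1")));
    }
}
```

The check happens before any method is called on the remote state, so the inconsistent configuration degrades to transport publication rather than failing the publish.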
Additional Details
No response