
A large number of duplicated update mapping tasks take down the master node. #66768

Closed
@howardhuanghua

Description

Issue

Elasticsearch version (bin/elasticsearch --version): 7.5.1

Plugins installed: []

JVM version (java -version): 1.8.0.181

OS version (uname -a if on a Unix-like system): CentOS Linux 3.1.0

ES supports automatic index creation and mapping updates while handling bulk requests. We created one index with 100 primary shards, and the cluster sends highly concurrent bulk requests (20k+/s) to that index, all adding the same new fields (which concurrently triggers mapping updates). The huge number of duplicated update mapping tasks then makes the master hang for a long time (CPU almost 100% consumed), and the other nodes end up disconnected from the current master.
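As a rough illustration (not a number measured from the logs): each primary-shard bulk operation that sees the new field submits its own put-mapping task, so with 100 primary shards and 20k+ bulk requests per second the master can accumulate hundreds or thousands of identical put-mapping[_doc] tasks within a few seconds of a new field appearing, and each bulk operation waiting on the mapping change also registers its own cluster-state observer listener (visible in the second log below).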

In the related logs below, you can see that lots of duplicated _doc mapping update tasks have been computed, and they also need to notify lots of listeners.

[2020-12-18T09:52:42,632][WARN ][o.e.c.s.MasterService    ] [1608186356000606732] took [20.2s], which is over [10s], to compute cluster state update for [put-mapping[_doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc, _doc...............


[2020-12-18T10:19:08,583][WARN ][o.e.c.s.ClusterApplierService] [1608186356000606732] cluster state applier task [Publication{term=99, version=706636}] took [9.6m] which is above the warn threshold of [30s]: [running task [Publication{term=99, version=706636}]] took [0ms], [connecting to new nodes] took [0ms], [applying settings] took [0ms], [running applier [org.elasticsearch.indices.cluster.IndicesClusterStateService@6a7b290d]] took [1ms], [running applier [org.elasticsearch.script.ScriptService@5030671a]] took [0ms], [running applier [org.elasticsearch.xpack.ilm.IndexLifecycleService@3f48459b]] took [10ms], [running applier [org.elasticsearch.repositories.RepositoriesService@6d1eb292]] took [0ms], [running applier [org.elasticsearch.snapshots.RestoreService@5e9ecd7d]] took [0ms], [running applier [org.elasticsearch.ingest.IngestService@d456903]] took [0ms], [running applier [org.elasticsearch.action.ingest.IngestActionForwarder@70dc91ad]] took [0ms], [running applier [org.elasticsearch.action.admin.cluster.repositories.cleanup.TransportCleanupRepositoryAction$$Lambda$3326/1448648448@68ce2706]] took [0ms], [running applier [org.elasticsearch.tasks.TaskManager@4013fa47]] took [0ms], [running applier [org.elasticsearch.snapshots.SnapshotsService@66388e37]] took [0ms], [notifying listener [org.elasticsearch.cluster.InternalClusterInfoService@348ccb0d]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.support.SecurityIndexManager@1667a29d]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.support.SecurityIndexManager@dd482ff]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.authc.TokenService$$Lambda$2005/551229951@70943baa]] took [0ms], [notifying listener [org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$$Lambda$2096/960510275@22924216]] took [0ms], [notifying listener [org.elasticsearch.xpack.watcher.support.WatcherIndexTemplateRegistry@9ca6f93]] took [2ms], [notifying listener [org.elasticsearch.xpack.watcher.WatcherLifeCycleService@2020c72f]] took [0ms], [notifying listener [org.elasticsearch.xpack.watcher.WatcherIndexingListener@71b8f1ea]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.job.process.autodetect.AutodetectProcessManager@410a8f39]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.datafeed.DatafeedManager$TaskRunner@5eece0c1]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.MlAssignmentNotifier@30d1516b]] took [3ms], [notifying listener [org.elasticsearch.xpack.ml.MlInitializationService@3a7e90a0]] took [2ms], [notifying listener [org.elasticsearch.xpack.ilm.IndexLifecycleService@3f48459b]] took [0ms], [notifying listener [org.elasticsearch.xpack.core.slm.history.SnapshotLifecycleTemplateRegistry@2a70f3a8]] took [0ms], [notifying listener [org.elasticsearch.xpack.slm.SnapshotLifecycleService@1323e4cf]] took [0ms], [notifying listener [org.elasticsearch.xpack.ccr.action.ShardFollowTaskCleaner@2943b5c9]] took [0ms], [notifying listener [org.elasticsearch.xpack.transform.TransformClusterStateListener@67f6dbd1]] took [0ms], [notifying listener [org.elasticsearch.plugin.opack.OPackActionFilter$$Lambda$2347/1715343684@4a368934]] took [2ms], [notifying listener [org.elasticsearch.cluster.metadata.TemplateUpgradeService@463f1921]] took [3ms], [notifying listener [org.elasticsearch.node.ResponseCollectorService@72e601d2]] took [0ms], [notifying listener 
[org.elasticsearch.xpack.ml.action.TransportOpenJobAction$OpenJobPersistentTasksExecutor$$Lambda$2844/814630491@231f740c]] took [0ms], [notifying listener [org.elasticsearch.xpack.ml.action.TransportStartDataFrameAnalyticsAction$TaskExecutor$$Lambda$2848/598978484@5d9e3672]] took [0ms], [notifying listener [org.elasticsearch.persistent.PersistentTasksClusterService@63d6daf1]] took [0ms], [notifying listener [org.elasticsearch.cluster.routing.DelayedAllocationService@217cbfc8]] took [2ms], [notifying listener [org.elasticsearch.gateway.DanglingIndicesState@799b00ff]] took [0ms], [notifying listener [org.elasticsearch.persistent.PersistentTasksNodeService@551a9418]] took [0ms], [notifying listener [org.elasticsearch.license.LicenseService@4ecd18]] took [0ms], [notifying listener [org.elasticsearch.xpack.ccr.action.AutoFollowCoordinator@75da727a]] took [0ms], [notifying listener [org.elasticsearch.gateway.GatewayService@26ce242c]] took [0ms], [notifying listener [org.elasticsearch.cluster.service.ClusterApplierService$LocalNodeMasterListeners@5ae44db4]] took [1ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1e801e81]] took [35ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1fc01fc1]] took [31ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@2a002a03]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@1a201a25]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@6fc06fc5]] took [32ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@70b070bb]] took [30ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@72e072f4]] took [29ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@2ec02ed5]] took [28ms], [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@77607776]] took [29ms], [notifying listener............


Steps to reproduce:
I have verified the duplicated update mapping tasks in my local cluster.

  1. Create a cluster with 2 nodes.
  2. Create an index with 2 primary shards and no replicas.
  3. Use three clients to send bulk requests concurrently; each bulk request contains only a single document to index, and they all add the same new field.
Client request 1:  // field1 is the new field
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
'
Client request 2:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "2" } }
{ "field1" : "value2" }
'
Client request 3:
curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
'

I could see three update mapping tasks for the _doc type on the master node:

[2020-12-22T00:35:03,553][DEBUG][o.e.c.s.MasterService    ] [node-1] executing cluster state update for [put-mapping[_doc, _doc, _doc]]

Proposal

Each shard bulk operation on a primary shard submits a mapping update task to the master if the mapping has changed (in TransportShardBulkAction):

performOnPrimary(request, primary, updateHelper, threadPool::absoluteTimeInMillis,
    (update, shardId, mappingListener) -> {
        // Every primary-shard bulk operation whose documents introduced a mapping
        // change sends its own put-mapping task to the master.
        assert update != null;
        assert shardId != null;
        mappingUpdatedAction.updateMappingOnMaster(shardId.getIndex(), update, mappingListener);
    },
    ...

If an index on a data node has already submitted an update mapping task to the master, we could hold incoming mapping update tasks that have the same mapping source in a queue on that node instead of sending duplicate tasks to the master. Once the in-flight update mapping task has been processed, we simply complete the queued tasks' listeners and let their bulk writes continue. With this optimization, update mapping tasks would be grouped at the node/index level instead of being submitted per shard bulk operation; a rough sketch follows below.
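To make the idea concrete, here is a minimal sketch of such a node-level deduplicator. This is an illustration only, not Elasticsearch code: MappingUpdateDeduplicator, Key, sendToMaster and updateMapping are hypothetical names, the callback stands in for the real call to MappingUpdatedAction#updateMappingOnMaster, and a plain Consumer<Exception> stands in for ActionListener.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical per-node deduplicator: only the first mapping update for a given
// (index, mapping source) pair is sent to the master; identical updates that
// arrive while it is in flight are parked and completed with the same result.
final class MappingUpdateDeduplicator {

    // Identifies one in-flight update: target index plus the mapping source being added.
    private static final class Key {
        final String index;
        final String mappingSource;
        Key(String index, String mappingSource) {
            this.index = index;
            this.mappingSource = mappingSource;
        }
        @Override public boolean equals(Object o) {
            if ((o instanceof Key) == false) return false;
            Key k = (Key) o;
            return index.equals(k.index) && mappingSource.equals(k.mappingSource);
        }
        @Override public int hashCode() {
            return Objects.hash(index, mappingSource);
        }
    }

    // Listeners parked behind the first (in-flight) update for each key.
    private final Map<Key, List<Consumer<Exception>>> pending = new HashMap<>();

    // Stands in for the real call to the master (MappingUpdatedAction#updateMappingOnMaster).
    private final BiConsumer<Key, Consumer<Exception>> sendToMaster;

    MappingUpdateDeduplicator(BiConsumer<Key, Consumer<Exception>> sendToMaster) {
        this.sendToMaster = sendToMaster;
    }

    // listener is called with null on success, or with the failure otherwise.
    void updateMapping(String index, String mappingSource, Consumer<Exception> listener) {
        Key key = new Key(index, mappingSource);
        synchronized (pending) {
            List<Consumer<Exception>> waiters = pending.get(key);
            if (waiters != null) {
                // An identical update is already on its way to the master: just wait for it.
                waiters.add(listener);
                return;
            }
            pending.put(key, new ArrayList<>());
        }
        // First caller for this (index, mapping) pair: submit a single task to the master.
        sendToMaster.accept(key, failure -> {
            List<Consumer<Exception>> waiters;
            synchronized (pending) {
                waiters = pending.remove(key);
            }
            listener.accept(failure);
            // Everyone who queued up while the task was in flight gets the same result
            // and can continue with their bulk write.
            for (Consumer<Exception> waiter : waiters) {
                waiter.accept(failure);
            }
        });
    }
}

With something along these lines, the master would see at most one put-mapping task per distinct mapping change per node, rather than one per shard bulk operation.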

If this idea is acceptable, I would like to provide a PR. Thanks.

    Labels

    :Distributed Indexing/Distributed, >bug, Team:Distributed (Obsolete), needs:triage
