Description
Investigating https://discuss.elastic.co/t/index-creation-slows-down-over-time/230775 I wrote a script that creates and deletes indices repeatedly and left it running for several hours against a 7.6.2 cluster. Each iteration had indeed slowed down as reported, from around `6s` to around `10s`.
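
For reference, a minimal sketch of the kind of create/delete loop I ran (not the exact script; the index name, settings and iteration count here are illustrative):

```python
# Repeatedly create and delete an index against a local node and print how
# long each iteration takes. Host, index name and settings are assumptions.
import time
import requests  # assumes the `requests` package is installed

ES = "http://localhost:9200"

for i in range(10_000):
    start = time.monotonic()
    requests.put(f"{ES}/test-index-{i}", json={
        "settings": {"number_of_shards": 1, "number_of_replicas": 0}
    }).raise_for_status()
    requests.delete(f"{ES}/test-index-{i}").raise_for_status()
    print(f"iteration {i}: {time.monotonic() - start:.2f}s")
```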
The hot threads API often reported a substantial amount of time spent manipulating loggers (mostly in `Loggers.setLevel`):
```
20.3% (101.5ms out of 500ms) cpu usage by thread 'elasticsearch[node-0][clusterApplierService#updateTask][T#1]'
2/10 snapshots sharing following 25 elements
java.base@14/java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1048)
app//org.elasticsearch.common.logging.Loggers.setLevel(Loggers.java:127)
app//org.elasticsearch.common.logging.Loggers.setLevel(Loggers.java:111)
app//org.elasticsearch.index.SearchSlowLog.setLevel(SearchSlowLog.java:127)
app//org.elasticsearch.index.SearchSlowLog.<init>(SearchSlowLog.java:123)
app//org.elasticsearch.index.IndexModule.<init>(IndexModule.java:151)
app//org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:595)
app//org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:542)
app//org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:173)
app//org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:484)
app//org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:246)
app//org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517)
app//org.elasticsearch.cluster.service.ClusterApplierService$$Lambda$4041/0x000000080175d440.accept(Unknown Source)
java.base@14/java.lang.Iterable.forEach(Iterable.java:75)
app//org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514)
app//org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485)
app//org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
app//org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73)
app//org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633)
app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
java.base@14/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.base@14/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
java.base@14/java.lang.Thread.run(Thread.java:832)
2/10 snapshots sharing following 24 elements
app//org.elasticsearch.common.logging.Loggers.setLevel(Loggers.java:127)
app//org.elasticsearch.common.logging.Loggers.setLevel(Loggers.java:111)
app//org.elasticsearch.index.SearchSlowLog.setLevel(SearchSlowLog.java:128)
app//org.elasticsearch.index.SearchSlowLog.<init>(SearchSlowLog.java:123)
app//org.elasticsearch.index.IndexModule.<init>(IndexModule.java:151)
app//org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:595)
app//org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:542)
app//org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:173)
app//org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:484)
app//org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:246)
app//org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517)
app//org.elasticsearch.cluster.service.ClusterApplierService$$Lambda$4041/0x000000080175d440.accept(Unknown Source)
java.base@14/java.lang.Iterable.forEach(Iterable.java:75)
app//org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514)
app//org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485)
app//org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
app//org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73)
app//org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633)
app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
java.base@14/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.base@14/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
java.base@14/java.lang.Thread.run(Thread.java:832)
3/10 snapshots sharing following 14 elements
app//org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517)
app//org.elasticsearch.cluster.service.ClusterApplierService$$Lambda$4041/0x000000080175d440.accept(Unknown Source)
java.base@14/java.lang.Iterable.forEach(Iterable.java:75)
app//org.elasticsearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:514)
app//org.elasticsearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:485)
app//org.elasticsearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:432)
app//org.elasticsearch.cluster.service.ClusterApplierService.access$100(ClusterApplierService.java:73)
app//org.elasticsearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:176)
app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633)
app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
app//org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
java.base@14/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
java.base@14/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
java.base@14/java.lang.Thread.run(Thread.java:832)
3/10 snapshots sharing following 10 elements
java.base@14/jdk.internal.misc.Unsafe.park(Native Method)
java.base@14/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
java.base@14/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505)
java.base@14/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137)
java.base@14/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614)
java.base@14/java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:548)
java.base@14/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1056)
java.base@14/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1116)
java.base@14/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
java.base@14/java.lang.Thread.run(Thread.java:832)
```
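
The output above comes from the standard `_nodes/hot_threads` API; a minimal way to capture it (the host and port are assumptions for a local test node):

```python
# Fetch hot threads from a local node via the _nodes/hot_threads API.
import requests

resp = requests.get("http://localhost:9200/_nodes/hot_threads")
resp.raise_for_status()
print(resp.text)
```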
I took a heap dump and observed almost 150,000 loggers with names of the forms `index.indexing.slowlog.<INDEX_UUID>`, `index.search.slowlog.fetch.<INDEX_UUID>` and `index.search.slowlog.query.<INDEX_UUID>` (in roughly equal numbers of each). There were only 1280 other instances of `org.apache.logging.log4j.core.Logger`.
I suspect this is a consequence of #47234, which introduced per-index loggers.
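
If that is the cause, the mechanism would be that each index creation registers three per-index slow-log loggers (named after the index UUID) that are apparently never removed when the index is deleted, while `Loggers.setLevel` appears to iterate over a collection of existing loggers (the `Collections` iteration in the trace above), so every subsequent index creation pays a cost that grows with the number of indices ever created. A toy model of that pattern (not Elasticsearch code; names are illustrative):

```python
# Toy model of the suspected pattern: a logger registry that only ever grows,
# plus a per-creation step that scans every registered logger, makes each
# successive index creation more expensive.
import time

registry = set()  # stands in for the Log4j logger context

def create_index(uuid: str) -> None:
    # three per-index slow-log loggers, as observed in the heap dump
    for prefix in ("index.indexing.slowlog.",
                   "index.search.slowlog.fetch.",
                   "index.search.slowlog.query."):
        registry.add(prefix + uuid)
    # stand-in for setting the slow-log level, scanning existing loggers
    for _ in registry:
        pass

def delete_index(uuid: str) -> None:
    pass  # the per-index loggers are never removed

for i in range(50_000):
    start = time.monotonic()
    create_index(f"uuid-{i}")
    delete_index(f"uuid-{i}")
    if i % 10_000 == 0:
        print(f"iteration {i}: {(time.monotonic() - start) * 1e6:.0f}us")
```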