[BUG] Master startup performance using consul cache #58325

Open
amuhametov opened this issue Aug 30, 2020 · 1 comment
Labels
Bug broken, incorrect, or confusing behavior Performance Salt-Syndic severity-high 2nd top severity, seen by most users, causes major problems
Milestone

Comments

@amuhametov

Description
Master worker processes load the cache too aggressively during startup: each worker issues at least one request to Consul per minion.

Restarting several (or all) masters at once makes Consul unresponsive for an hour or longer.

MemCache does not help at all.

Setup

  1. 10k+ minions
  2. 13 master hosts
  3. more than 100 workers per syndic host
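Taking the statement that each worker issues at least one Consul request per minion at face value, the setup numbers give a rough lower bound on startup traffic. The sketch below uses the floor of each figure (10,000 minions, 100 workers per host, 13 hosts); the function name is hypothetical and the real counts are higher, since the setup lists 10k+ minions and more than 100 workers.

```python
# Rough lower bound on Consul requests during master startup, assuming
# (per the report) at least one request per worker per minion.
MINIONS = 10_000          # "10k+ minions": use the floor
WORKERS_PER_HOST = 100    # "more than 100 workers": use the floor
MASTER_HOSTS = 13


def startup_requests(minions: int, workers: int, hosts: int = 1) -> int:
    """Minimum number of Consul KV requests issued while caches warm up."""
    return minions * workers * hosts


per_master = startup_requests(MINIONS, WORKERS_PER_HOST)
all_masters = startup_requests(MINIONS, WORKERS_PER_HOST, MASTER_HOSTS)
print(f"per master:  {per_master:,}")   # per master:  1,000,000
print(f"all masters: {all_masters:,}")  # all masters: 13,000,000
```

With `worker_threads : 144` from the master config, the per-master floor becomes 1,440,000 requests.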

Master config:

```yaml
zmq_backlog : 8192
consul.dc : dc1
master_sign_pubkey : False
consul.consistency : stale
state_output : terse
log_level : debug
consul.port : 8500
cache : consul
con_cache : False
ipv6 : False
master_id : master
order_masters : True
event_publisher_pub_hwm : 64000
consul.host : 127.0.0.1
syndic_wait : 30
worker_threads : 144
pub_hwm : 8192
user : salt
state_verbose : False
sock_pool_size : 4096
consul.token : xxx
zmq_filtering : False
keep_jobs : 4
consul.verify : True
salt_event_pub_hwm : 128000
max_event_size : 1572864
consul.scheme : http
memcache_expire_seconds: 300
memcache_max_items: 1000
memcache_debug: True
```
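The config caps MemCache at `memcache_max_items: 1000`, an order of magnitude below the 10k+ minions. Assuming MemCache evicts its least recently used entry once that cap is reached (an assumption, not checked against the 2017.7.8 source), a worker that reads every minion in turn evicts each entry before reading it again, so the hit rate stays near zero. The sketch below shows the effect with a generic `OrderedDict` LRU; `LRUCache` and `simulate` are hypothetical names, not Salt's `MemCache`, and expiry is not modeled.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal size-capped LRU cache (hypothetical stand-in, not Salt's MemCache)."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.items = OrderedDict()
        self.calls = 0
        self.hits = 0

    def fetch(self, key, loader):
        self.calls += 1
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        value = loader(key)  # cache miss: read from the backend (Consul)
        if len(self.items) >= self.max_items:
            self.items.popitem(last=False)  # evict least recently used entry
        self.items[key] = value
        return value


def simulate(max_items, minions, passes):
    """Read every minion key in order `passes` times; return (calls, hits, backend reads)."""
    cache = LRUCache(max_items)
    backend_reads = 0

    def loader(key):
        nonlocal backend_reads
        backend_reads += 1
        return f"mine data for {key}"

    for _ in range(passes):
        for i in range(minions):
            cache.fetch(f"minion{i}", loader)
    return cache.calls, cache.hits, backend_reads


print(simulate(max_items=1000, minions=10_000, passes=3))    # (30000, 0, 30000)
print(simulate(max_items=20_000, minions=10_000, passes=3))  # (30000, 20000, 10000)
```

Under the same assumption, raising `memcache_max_items` above the minion count would let repeat reads hit, but every worker would still warm its own copy, so the first pass would still cost at least one Consul request per minion per worker.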

Steps to Reproduce the behavior
Restarted the master at 17:20, then counted Consul KV lookups of one minion's mine data in each 10-minute window:

```
# grep 'GET /v1/kv/minions/minion1/mine'  /var/log/salt/master|grep -c '2020-08-30 17:2'
117
# grep 'GET /v1/kv/minions/minion1/mine'  /var/log/salt/master|grep -c '2020-08-30 17:3'
50
# grep 'GET /v1/kv/minions/minion1/mine'  /var/log/salt/master|grep -c '2020-08-30 17:4'
4
```
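The `grep -c` calls above count one minion's mine lookups per 10-minute window (`17:2` matches 17:20 through 17:29, and so on). To check whether every minion follows the same pattern, the same counts can be taken for all minions in a single pass. The sketch below assumes only what the grep patterns imply: each relevant log line starts with a `YYYY-MM-DD HH:MM` timestamp and contains `GET /v1/kv/minions/<id>/mine`. The function name is hypothetical.

```python
import re
from collections import Counter

# Captures the timestamp up to the tens-of-minutes digit, then the minion ID
# from a Consul KV mine lookup, mirroring the grep patterns above.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d)\d.*GET /v1/kv/minions/([^/\s]+)/mine"
)


def mine_requests_per_window(lines):
    """Count mine lookups keyed by (10-minute window start, minion ID)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            window, minion = match.groups()
            counts[(window + "0", minion)] += 1  # "17:2" -> "17:20"
    return counts
```

For example, `with open("/var/log/salt/master", errors="replace") as log: counts = mine_requests_per_window(log)` gives per-minion counts that can be compared with the 117/50/4 seen for `minion1`.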

```
# grep MemCach /var/log/salt/master
2020-08-30 17:21:27,815 [salt.cache       :329 ][DEBUG   ][8762] MemCache stats (call/hit/rate): 2/1/0.5
2020-08-30 17:35:09,099 [salt.cache       :329 ][DEBUG   ][8552] MemCache stats (call/hit/rate): 21893/1/4.56767003152e-05
2020-08-30 17:37:44,711 [salt.cache       :329 ][DEBUG   ][8550] MemCache stats (call/hit/rate): 32841/1/3.04497426997e-05
2020-08-30 17:37:44,711 [salt.cache       :329 ][DEBUG   ][8550] MemCache stats (call/hit/rate): 32842/2/6.08976310822e-05
2020-08-30 17:37:44,712 [salt.cache       :329 ][DEBUG   ][8550] MemCache stats (call/hit/rate): 32843/3/9.13436653168e-05
2020-08-30 17:37:44,712 [salt.cache       :329 ][DEBUG   ][8550] MemCache stats (call/hit/rate): 32844/4/0.000121787845573
2020-08-30 17:37:44,712 [salt.cache       :329 ][DEBUG   ][8550] MemCache stats (call/hit/rate): 32845/5/0.00015223017202
2020-08-30 17:39:25,837 [salt.cache       :329 ][DEBUG   ][8529] MemCache stats (call/hit/rate): 21900/1/4.56621004566e-05
```
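Each `MemCache stats` line reports a running call count, hit count, and their ratio (for example 1/21893 ≈ 4.57e-05), and the lines come from several PIDs (8762, 8552, 8550, 8529), each with its own counters. That is consistent with every worker process keeping a private MemCache. The sketch below (a hypothetical helper that parses only the fields visible above) keeps the latest stats per PID and recomputes the rate.

```python
import re

# "...][<pid>] MemCache stats (call/hit/rate): <calls>/<hits>/<rate>"
STATS_RE = re.compile(
    r"\]\[(\d+)\] MemCache stats \(call/hit/rate\): (\d+)/(\d+)/([0-9.e+-]+)"
)


def latest_stats_per_pid(lines):
    """Return {pid: (calls, hits, hit_rate)} from the last stats line seen per PID."""
    stats = {}
    for line in lines:
        match = STATS_RE.search(line)
        if match:
            pid, calls, hits = (int(g) for g in match.groups()[:3])
            stats[pid] = (calls, hits, hits / calls)
    return stats
```

Run over the log excerpt above, this yields one entry per worker PID, each with a hit rate well below 0.1%.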

Expected behavior
A separate process should query Consul only once and share the results, instead of every worker querying Consul on its own.
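One way to read the expected behavior is request coalescing: the first caller that asks for a key fetches it from Consul, and concurrent callers wait for and reuse that result. The sketch below shows the idea with threads standing in for master workers. Salt's workers are separate processes, so a real fix would need a shared process or IPC channel, which this sketch does not model; `CoalescingCache` and `fetch_from_consul` are hypothetical names, not Salt APIs, and cache expiry is ignored.

```python
import threading
import time


class CoalescingCache:
    """Fetch each key from the backend at most once, even under concurrent callers."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._values = {}
        self._pending = {}  # key -> Event set when the owning fetch finishes

    def get(self, key):
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]
                event = self._pending.get(key)
                if event is None:
                    event = self._pending[key] = threading.Event()
                    break  # this caller owns the fetch
            event.wait()  # another caller is fetching; wait, then re-check
        try:
            value = self._loader(key)  # only the owner reaches the backend
            with self._lock:
                self._values[key] = value
            return value
        finally:
            # On failure the key is released so a waiter can retry the fetch.
            with self._lock:
                del self._pending[key]
            event.set()


fetch_count = 0
count_lock = threading.Lock()


def fetch_from_consul(key):
    """Hypothetical stand-in for a Consul KV read; counts backend calls."""
    global fetch_count
    with count_lock:
        fetch_count += 1
    time.sleep(0.01)  # simulated latency so callers overlap
    return f"mine data for {key}"


cache = CoalescingCache(fetch_from_consul)
minions = [f"minion{i}" for i in range(50)]


def worker():
    for minion in minions:
        cache.get(minion)


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(fetch_count)  # 50: one backend read per minion, not 20 * 50
```

The per-key `Event` keeps the backend read outside the lock, so lookups of different minions can proceed in parallel while duplicate lookups of the same minion wait.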

Versions Report

```
Salt Version:
Salt: 2017.7.8

Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: Not Installed
libnacl: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 2.7.5 (default, Oct 30 2018, 23:45:53)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 16.0.4
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.2.5

System Versions:
dist: centos 7.8.2003 Core
locale: UTF-8
machine: x86_64
release: 3.10.0-1127.19.1.el7.x86_64
system: Linux
version: CentOS Linux 7.8.2003 Core

```
@amuhametov amuhametov added the Bug broken, incorrect, or confusing behavior label Aug 30, 2020
@cmcmarrow cmcmarrow added this to the Approved milestone Sep 11, 2020
@cmcmarrow
Contributor

@amuhametov thanks for the bug report, and sorry for the delayed response. Unfortunately I do not have 10k minions, so I can't confirm this, but this is a programmatic performance issue and should be worked on.

@sagetherage sagetherage added info-needed waiting for more info Performance Salt-Syndic Silicon v3004.0 Release code name and removed info-needed waiting for more info labels Jun 15, 2021
@sagetherage sagetherage modified the milestones: Approved, Silicon Jun 16, 2021
@sagetherage sagetherage removed the Silicon v3004.0 Release code name label Aug 19, 2021
@sagetherage sagetherage modified the milestones: Silicon, Approved Aug 19, 2021
@sagetherage sagetherage added the severity-high 2nd top severity, seen by most users, causes major problems label Aug 19, 2021
Projects
None yet
Development

No branches or pull requests

3 participants