
salt-api can't handle a lot of concurrency when using the tornado backend? #55953

Open
liu624862560 opened this issue Jan 24, 2020 · 5 comments
Labels: Question (The issue is more of a question rather than a bug or a feature request), Salt-API

Comments

@liu624862560

liu624862560 commented Jan 24, 2020

Description of Issue

Here is the question. We want to use salt-api to send shell commands to minions, using the rest_tornado backend. We have one master and about 800 minions, and we use requests together with multiprocessing/threading to make concurrent requests to salt-api. When we target about 100 minions with the test.ping function, we get about ten empty returns like [{}]; when we target about 400 minions, we get many [{}]. We make the requests like this:

import json

import requests

# salt_api_url, target, tgt_type and token are defined elsewhere in our script
res = requests.post(
    url=salt_api_url,
    data=json.dumps([{
        "client": "local",
        "tgt": target,
        "fun": "test.ping",
        "tgt_type": tgt_type,
        "timeout": 10
    }]),
    headers={
        "Accept": "application/json",
        "X-Auth-Token": token,
        "Content-Type": "application/json"
    },
    verify=False,
    timeout=300
)

We just want to get results synchronously; we don't want to use async jobs, returners, or events. Is there a config option I'm missing? Can salt-api do what we want? If anyone has ideas, please leave a message. Thanks to all of you.
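
Roughly, the concurrent driver looks like the minimal sketch below, one request per minion target. The URL, token, and minion IDs are placeholders, not our real values:

# Minimal sketch of the concurrent driver: a thread pool firing many
# synchronous salt-api requests at once, one per minion target.
# The URL, token and minion IDs below are placeholders.
import json
from multiprocessing.dummy import Pool as ThreadPool

import requests

salt_api_url = "https://master.example.com:8001/"    # placeholder API endpoint
token = "<token from /login>"                        # placeholder auth token
targets = ["minion-%03d" % i for i in range(100)]    # placeholder minion IDs


def run_test_ping(target, tgt_type="glob"):
    res = requests.post(
        url=salt_api_url,
        data=json.dumps([{
            "client": "local",
            "tgt": target,
            "fun": "test.ping",
            "tgt_type": tgt_type,
            "timeout": 10
        }]),
        headers={
            "Accept": "application/json",
            "X-Auth-Token": token,
            "Content-Type": "application/json"
        },
        verify=False,
        timeout=300
    )
    return res.json()


pool = ThreadPool(100)    # roughly 100 requests in flight at once
results = pool.map(run_test_ping, targets)
pool.close()
pool.join()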

Setup

salt-master config
default_include: master.d/*.conf
timeout: 20
worker_threads: 20
auto_accept: True
file_roots:
  base:
    - /datasalt/srv/salt/base
  dev:
    - /datasalt/srv/salt/dev
  test:
    - /datasalt/srv/salt/test
  prod:
    - /datasalt/srv/salt/prod
pillar_roots:
  base:
    - /datasalt/srv/pillar/base
  dev:
    - /datasalt/srv/pillar/dev
  test:
    - /datasalt/srv/pillar/test
  prod:
    - /datasalt/srv/pillar/prod
pillar_opts: True
log_level_logfile: debug

salt-api config

rest_tornado:
    port: 8001
    address: 0.0.0.0
    #backlog: 128
    ssl_crt: /etc/pki/tls/certs/saltcert.crt
    ssl_key: /etc/pki/tls/private/saltcert.key
    debug: True
    disable_ssl: False
    cors_origin: null
    webhook_url: /hook
    webhook_disable_auth: True
    #num_processes: 4

Steps to Reproduce Issue

Some of the logs look like this:

2020-01-24 14:24:12,337 [salt.transport.ipc:254 ][DEBUG   ][4850] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
2020-01-24 14:24:12,340 [salt.transport.zeromq:1084][DEBUG   ][4850] SaltReqTimeoutError, retrying. (1/3)
2020-01-24 14:24:12,380 [salt.transport.zeromq:1084][DEBUG   ][4850] SaltReqTimeoutError, retrying. (1/3)
2020-01-24 14:24:12,411 [salt.transport.zeromq:138 ][DEBUG   ][4850] Re-using AsyncZeroMQReqChannel for (u'/etc/salt/pki/master', u'sy-centos7.3.1611-xxxxx_master', u'tcp://127.0.0.1:4506', u'clear')
2020-01-24 14:24:12,412 [salt.transport.ipc:254 ][DEBUG   ][4850] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc

It looks like something is going wrong around the SaltReqTimeoutError. We don't see many errors like this with the rest_cherrypy salt-api; rest_cherrypy works fine, but it is not fast when we have many minions to control at once.

Versions Report

Salt Version:
Salt: 2018.3.3

Dependency Versions:
cffi: 1.6.0
cherrypy: unknown
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: 0.24.6
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: 1.2.5
pycparser: 2.14
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: 0.24.2
Python: 2.7.5 (default, Aug 4 2017, 00:39:18)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: centos 7.3.1611 Core
locale: UTF-8
machine: x86_64
release: 3.10.0-693.5.2.el7.x86_64
system: Linux
version: CentOS Linux 7.3.1611 Core

@liu624862560
Author

Another thing about the salt-master: we are only doing this in the development environment; the salt-master in the production environment has more CPUs and RAM than the dev one.

@stale

stale bot commented Feb 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Feb 27, 2020
@stale stale bot closed this as completed Mar 5, 2020
@sagetherage sagetherage reopened this Apr 7, 2020
@stale

stale bot commented Apr 7, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Apr 7, 2020
@DmitryKuzmenko
Contributor

@liu624862560 this sounds like your master is overloaded by a bunch of minion returns coming at once. There are a number of options in this case.

  • Check your master's performance and the network bandwidth. 400 minions does not sound like a hard limit.
  • Increase the timeout value and the allowed retry count so the system waits longer until the last minion has responded. This is the simplest thing to do (see the sketch after this list).
  • Use the splay executor, which spreads execution out in time so the minions do not overload the master, at the cost of a consistently longer execution time.
  • Change your Salt layout. You can use a number of Syndics to spread the main Master's load. Syndics collect minion responses and send them in a batch.
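
For the timeout option, a minimal sketch of what the change could look like on the API side, reusing the lowstate from the issue description (salt_api_url, target, tgt_type and token are the values from the original example; 60 is only an illustrative number):

# Minimal sketch of the "increase the timeout" option: the same lowstate as in
# the issue description, with a longer job timeout. salt_api_url, target,
# tgt_type and token are the values from the original example.
import json

import requests

lowstate = [{
    "client": "local",
    "tgt": target,
    "fun": "test.ping",
    "tgt_type": tgt_type,
    "timeout": 60          # was 10; give slow minions more time to return
}]

res = requests.post(
    url=salt_api_url,
    data=json.dumps(lowstate),
    headers={
        "Accept": "application/json",
        "X-Auth-Token": token,
        "Content-Type": "application/json"
    },
    verify=False,
    timeout=300
)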

Which way looks better to you?

I would also like to hear from someone else, @saltstack/team-core.

@DmitryKuzmenko DmitryKuzmenko added Question The issue is more of a question rather than a bug or a feature request and removed needs-triage labels Apr 14, 2020
@DmitryKuzmenko
Contributor

BTW, how does it work if you run the same request without salt-api?
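
For comparison, a minimal sketch of the same job published directly on the master with LocalClient, bypassing salt-api entirely (target and tgt_type are assumed to be the same values used in the API lowstate; run it as a user allowed to publish, e.g. root):

# Minimal sketch: the same test.ping job issued directly through LocalClient
# on the master, bypassing salt-api. target and tgt_type are the values used
# in the original API request.
import salt.client

local = salt.client.LocalClient()
ret = local.cmd(target, "test.ping", tgt_type=tgt_type, timeout=10)
print(ret)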

@Akm0d Akm0d added Salt-API team-netapi v2018.3 unsupported version labels Apr 14, 2020
@Akm0d Akm0d added this to the Approved milestone Apr 14, 2020
@sagetherage sagetherage removed the v2018.3 unsupported version label Apr 24, 2020