
salt-api can't handle a lot of concurrency when using the tornado backend? #55953

Open
liu624862560 opened this issue Jan 24, 2020 · 5 comments
Labels: Question (The issue is more of a question rather than a bug or a feature request), Salt-API

Comments

@liu624862560

liu624862560 commented Jan 24, 2020

Description of Issue

Here is the question. We want to use salt-api to send shell commands to minions, using the rest_tornado backend. We have one master and about 800 minions, and we use requests together with multiprocessing/threading to make concurrent requests to salt-api. When we target about 100 minions with the test.ping function, we get about ten empty returns like [{}]; when we target about 400 minions, we get many [{}]. We make the requests like this:

import json

import requests

# salt_api_url, target, tgt_type and token are defined elsewhere in our script
res = requests.post(
    url=salt_api_url,
    data=json.dumps([{
        "client": "local",
        "tgt": target,
        "fun": "test.ping",
        "tgt_type": tgt_type,
        "timeout": 10
    }]),
    headers={
        "Accept": "application/json",
        "X-Auth-Token": token,
        "Content-Type": "application/json"
    },
    verify=False,
    timeout=300
)

We just want to get results synchronously; we don't want to use async jobs, returners, or events. Is there a config option I'm missing? Can salt-api do what we want? If anyone has ideas, please leave a message. Thanks to all of you.
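
Roughly, the concurrent driver looks like the minimal sketch below, one request per minion target. The URL, token, and minion IDs are placeholders, not our real values:

# Minimal sketch of the concurrent driver: a thread pool firing many
# synchronous salt-api requests at once, one per minion target.
# The URL, token and minion IDs below are placeholders.
import json
from multiprocessing.dummy import Pool as ThreadPool

import requests

salt_api_url = "https://master.example.com:8001/"    # placeholder API endpoint
token = "<token from /login>"                        # placeholder auth token
targets = ["minion-%03d" % i for i in range(100)]    # placeholder minion IDs


def run_test_ping(target, tgt_type="glob"):
    res = requests.post(
        url=salt_api_url,
        data=json.dumps([{
            "client": "local",
            "tgt": target,
            "fun": "test.ping",
            "tgt_type": tgt_type,
            "timeout": 10
        }]),
        headers={
            "Accept": "application/json",
            "X-Auth-Token": token,
            "Content-Type": "application/json"
        },
        verify=False,
        timeout=300
    )
    return res.json()


pool = ThreadPool(100)    # roughly 100 requests in flight at once
results = pool.map(run_test_ping, targets)
pool.close()
pool.join()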

Setup

salt-master config
default_include: master.d/*.conf
timeout: 20
worker_threads: 20
auto_accept: True
file_roots:
  base:
    - /datasalt/srv/salt/base
  dev:
    - /datasalt/srv/salt/dev
  test:
    - /datasalt/srv/salt/test
  prod:
    - /datasalt/srv/salt/prod
pillar_roots:
  base:
    - /datasalt/srv/pillar/base
  dev:
    - /datasalt/srv/pillar/dev
  test:
    - /datasalt/srv/pillar/test
  prod:
    - /datasalt/srv/pillar/prod
pillar_opts: True
log_level_logfile: debug

salt-api config

rest_tornado:
    port: 8001
    address: 0.0.0.0
    #backlog: 128
    ssl_crt: /etc/pki/tls/certs/saltcert.crt
    ssl_key: /etc/pki/tls/private/saltcert.key
    debug: True
    disable_ssl: False
    cors_origin: null
    webhook_url: /hook
    webhook_disable_auth: True
    #num_processes: 4

Steps to Reproduce Issue

Some of the logs look like this:

2020-01-24 14:24:12,337 [salt.transport.ipc:254 ][DEBUG   ][4850] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc
2020-01-24 14:24:12,340 [salt.transport.zeromq:1084][DEBUG   ][4850] SaltReqTimeoutError, retrying. (1/3)
2020-01-24 14:24:12,380 [salt.transport.zeromq:1084][DEBUG   ][4850] SaltReqTimeoutError, retrying. (1/3)
2020-01-24 14:24:12,411 [salt.transport.zeromq:138 ][DEBUG   ][4850] Re-using AsyncZeroMQReqChannel for (u'/etc/salt/pki/master', u'sy-centos7.3.1611-xxxxx_master', u'tcp://127.0.0.1:4506', u'clear')
2020-01-24 14:24:12,412 [salt.transport.ipc:254 ][DEBUG   ][4850] Initializing new IPCClient for path: /var/run/salt/master/master_event_pub.ipc

It looks like something is going wrong around the SaltReqTimeoutError. We don't see many errors like this with the rest_cherrypy salt-api; rest_cherrypy works fine, but it is not fast when we have many minions to control at once.

Versions Report

Salt Version:
Salt: 2018.3.3

Dependency Versions:
cffi: 1.6.0
cherrypy: unknown
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
ioflo: Not Installed
Jinja2: 2.7.2
libgit2: 0.24.6
libnacl: Not Installed
M2Crypto: 0.21.1
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.4.6
mysql-python: 1.2.5
pycparser: 2.14
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: 0.24.2
Python: 2.7.5 (default, Aug 4 2017, 00:39:18)
python-gnupg: Not Installed
PyYAML: 3.10
PyZMQ: 15.3.0
RAET: Not Installed
smmap: Not Installed
timelib: Not Installed
Tornado: 4.2.1
ZMQ: 4.1.4

System Versions:
dist: centos 7.3.1611 Core
locale: UTF-8
machine: x86_64
release: 3.10.0-693.5.2.el7.x86_64
system: Linux
version: CentOS Linux 7.3.1611 Core

@liu624862560
Author

Another thing about the salt-master: we are only doing this in the development environment; the salt-master in the production environment has more CPUs and RAM than the dev one.

@stale

stale bot commented Feb 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Feb 27, 2020
@stale stale bot closed this as completed Mar 5, 2020
@sagetherage sagetherage reopened this Apr 7, 2020
@stale

stale bot commented Apr 7, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Apr 7, 2020
@DmitryKuzmenko
Contributor

@liu624862560 this sounds like your master is overloaded by a bunch of minion returns coming at once. There are a number of options in this case.

  • Check your master's performance and the network bandwidth. 400 minions does not sound like a hard limit.
  • Increase the timeout value and the allowed retry count so the system waits longer until the last minion has responded. This is the simplest thing to do (see the sketch after this list).
  • Use the splay executor, which spreads execution out in time so the minions do not overload the master, at the cost of a consistently longer execution time.
  • Change your Salt layout. You can use a number of Syndics to spread the main Master's load. Syndics collect minion responses and send them in a batch.
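
For the timeout option, a minimal sketch of what the change could look like on the API side, reusing the lowstate from the issue description (salt_api_url, target, tgt_type and token are the values from the original example; 60 is only an illustrative number):

# Minimal sketch of the "increase the timeout" option: the same lowstate as in
# the issue description, with a longer job timeout. salt_api_url, target,
# tgt_type and token are the values from the original example.
import json

import requests

lowstate = [{
    "client": "local",
    "tgt": target,
    "fun": "test.ping",
    "tgt_type": tgt_type,
    "timeout": 60          # was 10; give slow minions more time to return
}]

res = requests.post(
    url=salt_api_url,
    data=json.dumps(lowstate),
    headers={
        "Accept": "application/json",
        "X-Auth-Token": token,
        "Content-Type": "application/json"
    },
    verify=False,
    timeout=300
)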

Which way looks better to you?

I would also like to hear from someone else, @saltstack/team-core.

@DmitryKuzmenko DmitryKuzmenko added Question The issue is more of a question rather than a bug or a feature request and removed needs-triage labels Apr 14, 2020
@DmitryKuzmenko
Contributor

BTW, how does it work if you run the same request without salt-api?
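
For comparison, a minimal sketch of the same job published directly on the master with LocalClient, bypassing salt-api entirely (target and tgt_type are assumed to be the same values used in the API lowstate; run it as a user allowed to publish, e.g. root):

# Minimal sketch: the same test.ping job issued directly through LocalClient
# on the master, bypassing salt-api. target and tgt_type are the values used
# in the original API request.
import salt.client

local = salt.client.LocalClient()
ret = local.cmd(target, "test.ping", tgt_type=tgt_type, timeout=10)
print(ret)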

@Akm0d Akm0d added Salt-API team-netapi v2018.3 unsupported version labels Apr 14, 2020
@Akm0d Akm0d added this to the Approved milestone Apr 14, 2020
@sagetherage sagetherage removed the v2018.3 unsupported version label Apr 24, 2020