Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

salt-minion not thread safe [BUG] #61830

Open
4 tasks
tsaridas opened this issue Mar 22, 2022 · 5 comments
Open
4 tasks

salt-minion not thread safe [BUG] #61830

tsaridas opened this issue Mar 22, 2022 · 5 comments
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE severity-high 2nd top severity, seen by most users, causes major problems

Comments

@tsaridas
Copy link
Contributor

tsaridas commented Mar 22, 2022

Description
When configuration option multiprocessing is set to False the minion may return an error code which is wrong when multiple commands are being ran. I assume this is because __context__ is being overwritten by another running command.

The reason multiprocessing is useful in windows is due to the fact that the agent tries to authenticate every time it returns the result.

Setup
configure minion with multiprocessing: False

Please be as specific as possible and give set-up details.

  • [ x ] on-prem machine
  • [ x ] VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD

Steps to Reproduce the behavior
Run the below commands multiple times on a salt master (10 times (5-5) should be enough to reproduce quickly)

# The idea here is to send a command that will fail 100% and watch for the return code to be positive 
while true; do if salt -L ThreadedMinion cmd.run 'dir blah'; then echo "Reproduced"; break; fi; sleep 0.$RANDOM; done
# The idea here is to send a command that will always succeed and watch if the return code is negative
while true; do ! if salt -L ThreadedMinion cmd.run 'echo blah'; then echo "Reproduced"; break; fi; sleep 0.$RANDOM; done

The return of the looks should look like this when it fails

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.
ERROR: Minions returned with non-zero exit code

[ThreadedMinion](ThreadedMinion):
    The system cannot find the file specified.

Reproduced
[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

[ThreadedMinion](ThreadedMinion):
    abc

ERROR: Minions returned with non-zero exit code

Reproduced

The minion log looks normal and has the actual results of the commands being sent.

2022-03-22 17:20:56,902 [salt.minion      :2165][INFO    ][11840] Returning information for job: 20220322172056719786
2022-03-22 17:20:58,355 [salt.minion      :1686][INFO    ][11840] User root Executing command cmd.run with jid 20220322172058317569
2022-03-22 17:20:58,448 [salt.minion      :1882][INFO    ][11840] Starting a new job 20220322172058317569 with PID 11840
2022-03-22 17:20:58,466 [salt.loaded.int.module.cmdmod:415 ][INFO    ][11840] Executing command 'echo' in directory 'C:\Windows\system32\config\systemprofile'
2022-03-22 17:20:58,488 [salt.minion      :2165][INFO    ][11840] Returning information for job: 20220322172058317569
2022-03-22 17:21:00,632 [salt.minion      :1686][INFO    ][11840] User root Executing command cmd.run with jid 20220322172100597047
2022-03-22 17:21:00,742 [salt.minion      :1882][INFO    ][11840] Starting a new job 20220322172100597047 with PID 11840
2022-03-22 17:21:00,765 [salt.loaded.int.module.cmdmod:415 ][INFO    ][11840] Executing command 'dir' in directory 'C:\Windows\system32\config\systemprofile'
2022-03-22 17:21:00,794 [salt.loaded.int.module.cmdmod:856 ][ERROR   ][11840] Command 'dir' failed with return code: 2
2022-03-22 17:21:00,794 [salt.loaded.int.module.cmdmod:858 ][ERROR   ][11840] stdout: The system cannot find the file specified.
2022-03-22 17:21:00,794 [salt.loaded.int.module.cmdmod:862 ][ERROR   ][11840] retcode: 2
2022-03-22 17:21:00,795 [salt.loaded.int.module.cmdmod:1253][ERROR   ][11840] Command 'dir' failed with return code: 2
2022-03-22 17:21:00,795 [salt.loaded.int.module.cmdmod:1258][ERROR   ][11840] output: The system cannot find the file specified.

Expected behavior
The minion should return the same error code as the command that was ran.

Screenshots

Versions Report

Salt Version:
          Salt: 3003.3.1
Dependency Versions:
          cffi: 1.14.5
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.5
     gitpython: 3.1.13
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: 1.1.4
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
        Python: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
  python-gnupg: 0.4.6
        PyYAML: 5.4.1
         PyZMQ: 18.0.1
         smmap: 3.0.4
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.1
System Versions:
          dist:
        locale: cp1250
       machine: AMD64
       release: 2019Server
        system: Windows
       version: 2019Server

highest version tested is 3003.3
lowest version tested is 3000.9

Additional context
I believe the multiprocessing config option should be removed until this issue is fixed.

@tsaridas tsaridas added Bug broken, incorrect, or confusing behavior needs-triage labels Mar 22, 2022
@OrangeDog
Copy link
Contributor

Can you provide a full reproduction and version report as requested by the issue template?

@OrangeDog OrangeDog added the info-needed waiting for more info label Mar 22, 2022
@tsaridas
Copy link
Contributor Author

Can you provide a full reproduction and version report as requested by the issue template?

updated although not necessary imo.

@OrangeDog
Copy link
Contributor

What is the error it returns? Is there some logging from the minion?

@tsaridas
Copy link
Contributor Author

tsaridas commented Mar 22, 2022

What is the error it returns? Is there some logging from the minion?

There is no errors (exceptions) on the minion logs as far as I can see except the command that failed and returns an error code of 2.

I believe you might not have understood the issue here.

The command that is supposed to fail returns as successful (or via versa) on the salt master and its probably because the __context__ on the minion is being overwritten by one of the commands that was successful.

Please try to understand what the two while loops are doing and try to run then multiple times ( different windows ) on the same master.

@dwoz dwoz removed the needs-triage label Mar 22, 2022
@dwoz dwoz self-assigned this Mar 22, 2022
@waynew
Copy link
Contributor

waynew commented Mar 22, 2022

So... I just spent a bunch of time digging into this on today's Test Clinic and I have way more questions than answers 😅

The answers that I have:

  • the problem exhibits here:

    salt/salt/minion.py

    Lines 1964 to 1966 in 7382656

    retcode = minion_instance.functions.pack["__context__"].get(
    "retcode", salt.defaults.exitcodes.EX_OK
    )
    - for some reason, the minion_instance.functions.pack["__context__"] is missing. Well, everything in it is missing. Typically it will contain {"retcode": 2}.

  • this problem can be pretty easily duplicated:

    echo "multiprocessing: False" >> /etc/salt/minion
    salt-minion -d
    for i in {1..20}; do salt saltdev cmd.run "dir blah-$(date +%s)" 2>/dev/null >/dev/null && echo "oops" & done
    

    (may need to tweak 20 up or it might exhibit at lower numbers, depending on your system)

The questions I have:

There are probably some other ones that I've forgotten by now, but... yeah something is quite strange here 😬

@waynew waynew added this to the Sulphur v3006.0 milestone Mar 22, 2022
@waynew waynew added Confirmed Salt engineer has confirmed bug/feature - often including a MCVE and removed info-needed waiting for more info labels Mar 22, 2022
@OrangeDog OrangeDog added the severity-high 2nd top severity, seen by most users, causes major problems label Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE severity-high 2nd top severity, seen by most users, causes major problems
Projects
None yet
Development

No branches or pull requests

5 participants