
[BUG] 3004.1 broke multi-master #62318

Open
nicholasmhughes opened this issue Jul 11, 2022 · 9 comments
Labels: Feature, Multi-Master, severity-high, VMware

Comments

@nicholasmhughes (Collaborator)

Related to #61868, but opening a separate issue so the problems can be tracked individually, since the syndic issue may be fixed as of 3004.2.

Description
Starting with version 3004.1, minions in multi-master setups where the masters have distinct keys but the same signing key constantly throw this error in the logs:

2022-07-11 14:58:18,725 [salt.pillar      :267 ][ERROR   ][10018] Pillar payload signature failed to validate.
2022-07-11 14:58:18,725 [salt.minion      :1143][ERROR   ][10018] Error while bringing up minion for multi-master. Is master at d10-party-01 responding?

Setup

master01
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X -M git master

mkdir /etc/salt/master.d
cat << EOF > /etc/salt/master.d/multi.conf
auto_accept: True
master_sign_pubkey: True
master_sign_key_name: master_pubkey_signature
EOF

cat << EOF > /etc/salt/minion.d/multi.conf
master:
  - master01
  - master02
master_type: str
verify_master_pubkey_sign: True
EOF

systemctl start salt-master

cp /etc/salt/pki/master/master_pubkey_signature.pub /etc/salt/pki/minion/master_sign.pub

systemctl start salt-minion

echo -e "The stuff below is for copy/pasting into the other machines:\n"

echo "master_pubkey_signature.pem"
cat /etc/salt/pki/master/master_pubkey_signature.pem; echo

echo "master_pubkey_signature.pub"
cat /etc/salt/pki/master/master_pubkey_signature.pub; echo

master02
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X -M git master

mkdir /etc/salt/master.d
cat << EOF > /etc/salt/master.d/multi.conf
auto_accept: True
master_sign_pubkey: True
master_sign_key_name: master_pubkey_signature
EOF

cat << EOF > /etc/salt/minion.d/multi.conf
master:
  - master02
  - master01
master_type: str
verify_master_pubkey_sign: True
EOF

cat << EOF > /etc/salt/pki/master/master_pubkey_signature.pem
INSERT master_pubkey_signature.pem FROM ABOVE
EOF

cat << EOF > /etc/salt/pki/master/master_pubkey_signature.pub
INSERT master_pubkey_signature.pub FROM ABOVE
EOF

systemctl start salt-master

cp /etc/salt/pki/master/master_pubkey_signature.pub /etc/salt/pki/minion/master_sign.pub

systemctl start salt-minion

minion01
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X git master

cat << EOF > /etc/salt/minion.d/multi.conf
master:
  - master01
  - master02
master_type: str
verify_master_pubkey_sign: True
EOF

cat << EOF > /etc/salt/pki/minion/master_sign.pub
INSERT master_pubkey_signature.pub FROM ABOVE
EOF

systemctl start salt-minion

Steps to Reproduce the behavior
Just start tailing the minion logs and you should see the aforementioned errors.
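For example, on the minion (assuming the default file log location; adjust if your install logs to journald instead):

tail -f /var/log/salt/minion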

Expected behavior
I would expect a multi-master setup to use the signing key (pub) for verification of inbound signatures instead of an individual master public key.
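As a rough illustration of that expectation (not Salt's internal code; payload.bin, payload.sig, and the SHA-256 digest are assumptions for the example), verification should only require the shared signing public key that every minion already holds:

# verify a payload signature using only master_sign.pub
openssl dgst -sha256 -verify /etc/salt/pki/minion/master_sign.pub \
    -signature payload.sig payload.bin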

Versions Report

salt --versions-report
Salt Version:
          Salt: 3005+0na.e3929c5
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.15.0
        pygit2: Not Installed
        Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
  python-gnupg: Not Installed
        PyYAML: 6.0
         PyZMQ: 23.2.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-18-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster

Additional context
The versions report provided is for the current pre-3005 master branch as of issue entry, but the problem can also be reproduced on 3004.1 and 3004.2.

@nicholasmhughes added the Bug, Regression, severity-high, Multi-Master, and needs-triage labels on Jul 11, 2022
@Ch3LL (Contributor) commented Jul 14, 2022

@dwoz any ideas here?

@Sircular

We're also seeing this in a multi-master setup, even when not running a minion service (in this case we're running modules using salt-call).
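For anyone reproducing without the minion service: a salt-call run that still talks to the masters (i.e. without --local) exercises the same pillar fetch, so something along these lines should surface the same error (the function used here is just an example):

salt-call -l debug pillar.items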

@devonberta commented Oct 10, 2022

I just went through this issue after upgrading 20k+ minions to 3003.5. The core issue came down to how the master.pem and master.pub are configured on the masters participating in a multi-master setup. If you are in an active-active multi-master configuration, you MUST have the same master.pem and master.pub on every master in the minion's master list, regardless of whether signature checks are enabled. This is in line with what is stated in the multi-master tutorial and was not being properly validated until 3003.4, when checks were added to remediate signing attacks.

I believe the exact change is related to signing behavior with the "minion_master.pub" that is cached when a minion connects to a master. This file is used to validate signatures and is always the "master.pub" of the first master the minion connected to. So this issue results when requests coming from another master are not signed with the matching "master.pem" of that first master. Hence the requirement that both masters in an active-active multi-master deployment use the same key. I have asked VMware support and Salt resources at VMware to make note of this and provide clarification, as the release notes for 3003.4/3003.5 and 3004.1/3004.2 do not accurately provide insight into the impact of the security fixes that were made.
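As a quick sanity check of the caching behavior described above (paths assume a default install), the cached key can be compared against each master's own public key; in an active-active setup all of the hashes should match:

# on the minion
sha256sum /etc/salt/pki/minion/minion_master.pub
# on each master
sha256sum /etc/salt/pki/master/master.pub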

Reference configuration:
https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html#prepping-a-redundant-master

You can use the signature check options with this configuration, but the tutorial does not give that example. You can find examples of the signature check configuration in the failover tutorial on the Salt docs site. We ran into this issue because we migrated from the active-passive (failover) configuration to the multi-master (active-active) setup, and everything was working until version 3003.4. After exploring the issue and reproducing it in a lab, I found it occurs because of the key mismatch. Minions are in fact connecting to both masters and do respond to module commands. They do, however, fail to respond to state applies, whether executed remotely or locally, depending on whether the initial master whose "master.pub" they cached is available to them.

Knowing the above, we were able to implement the key changes by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem, and master_sign.pub matched the partner server in the multi-master setup and that the master_sign keypair matched what was configured on the minions. We found that stopping the first master after the second had come online with the changes, and then starting the first master again, was enough to force minions to re-establish their connections and allowed the minions to start behaving without any intervention on the minion side. Worst case, we stopped the service on both masters so that minions started to attempt retries and then started them again. Hope this helps others out there. If I find the bug that the code changes were made on, I'll link it here when I have a minute.
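A rough sketch of that procedure, assuming SSH access between the masters, default pki paths, and the hostnames from the setup above (adapt to your environment):

# on master02: stop the master, copy the key material from master01, restart
systemctl stop salt-master
scp master01:/etc/salt/pki/master/master{.pem,.pub} \
    master01:/etc/salt/pki/master/master_sign{.pem,.pub} \
    /etc/salt/pki/master/
systemctl start salt-master

# once master02 is back up, restart master01 so minions re-handshake and
# re-cache the now-shared master.pub
systemctl restart salt-master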

@antiphase

Knowing the above, we were able to implement the key changes by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem, and master_sign.pub matched the partner server in the multi-master setup and that the master_sign keypair matched what was configured on the minions.

https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html explicitly contradicts using the same key pair on all masters in its final sentence, unless I misunderstand what master.pub means in this context.

What is actually the correct procedure for operating multiple active masters, or is there no viable way of doing that at present as a result of this bug?

@devonberta

Knowing the above, we were able to implement the key changes by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem, and master_sign.pub matched the partner server in the multi-master setup and that the master_sign keypair matched what was configured on the minions.

https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html explicitly contradicts using the same key pair on all masters in its final sentence, unless I misunderstand what master.pub means in this context.

What is actually the correct procedure for operating multiple active masters, or is there no viable way of doing that at present as a result of this bug?

Yes, for failover multi-master, which is what you linked. Active-active multi-master is linked above in my post and explicitly requires that master.pem and master.pub be identical.

@Ch3LL added this to the Sulphur v3006.0 milestone on Oct 13, 2022
@Sircular

We already have all the masters using the same keys in their configuration, and the public key for the master matches the public key configured in all the minions. We are still seeing this output.

@devonberta

We already have all the masters using the same keys in their configuration, and the public key for the master matches the public key configured in all the minions. We are still seeing this output.

Can you post your Salt master config so I can review?

@devonberta

And a minion config example also.

@barbaricyawps (Contributor)

I talked with @doesitblend to get some more clarity on the action requested for this ticket. The engineering ask is to make it possible to use key signing with different key pairs on the masters. There was a related docs request, which Ken opened and linked as related to this ticket: #63094

@dwoz added the Feature label and removed the Regression and needs-triage labels on Mar 6, 2023
This was referenced Mar 6, 2023
@Ch3LL modified the milestones: Sulfur v3006.0, Chlorine v3007.0 on Mar 7, 2023
@barbaricyawps removed their assignment on May 23, 2023
@anilsil removed the Bug label on May 23, 2023