Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPIRE Server should enforce agent IDs reside in the agent namespace #2712

Open
azdagron opened this issue Jan 31, 2022 · 7 comments
Open

SPIRE Server should enforce agent IDs reside in the agent namespace #2712

azdagron opened this issue Jan 31, 2022 · 7 comments
Labels
help wanted Issues with this label are ready to start work but are in need of someone to do it priority/backlog Issue is approved and in the backlog

Comments

@azdagron
Copy link
Member

SPIRE has assumed that node attestors would produce agent IDs that conform to the following convention:

spiffe://<trustdomain>/spire/agent/<nodeattestor>/<unique per-agent suffix>

(e.g. spiffe://example.org/spire/agent/join_token/21B6D625-CCF3-49E1-8E7C-812B3F55B3CB)

Although this convention is not required for agent authorization to take place safely, it does aid in debuggability, auditability, and the ability to prevent misconfiguration (e.g. cannot accidentally assign a workload an agent ID since we can validate that the requested ID is not reserved). Being able to answer the question "was this ID produced by a node attestor" is also important for at-a-glance understanding of logs.

Unfortunately, SPIRE does not enforce this convention currently, and cannot due to backcompat concerns. However, a deprecation warning is introduced with #2694. This should give us the ability to enforce this beginning with SPIRE 1.4 at the earliest (since 1.3 will be the first minor to include the deprecation warning).

This issue tracks the enforcement of the agent ID convention.

@azdagron azdagron added this to the 1.4.0 milestone Jan 31, 2022
@boyanguber
Copy link
Contributor

Thank you for making this Andrew!

Our team is currently planning out the process of migrating our customers to use the new Agent ID path. Just wanted some clarifications on a few things if possible:

  • When this becomes enforced in 1.4, does that mean that old agent ID's that don't conform to the new convention will not be able to attest/sync with spire server anymore? Or is it only applied to newly created agent ID paths?
  • Will there be any auto-magic for nodes re-attesting under old formats to be auto-migrated to their new format?
  • Do you see any implementation specifics that may be of concern to us?

Thanks again!

@azdagron
Copy link
Member Author

azdagron commented Feb 7, 2022

Great questions @boyanguber!

When this becomes enforced in 1.4, does that mean that old agent ID's that don't conform to the new convention will not be able to attest/sync with spire server anymore? Or is it only applied to newly created agent ID paths?

I think all we plan to do as part of that release is ensure that newly attested nodes conform. We haven't discussed a timeline to enforce this for existing agents.

Will there be any auto-magic for nodes re-attesting under old formats to be auto-migrated to their new format?

This gets tricky. If we do decide to enforce this for existing agents, I can think of two mechanisms:

  1. Force operators to evicts and re-attest all agents. This would be operationally cumbersome, for sure.
  2. Allow agents to exchange an SVID with an old SPIFFE ID for a new one (at renewal time). This would require some sort of plugin integration that we currently don't have to allow the plugins to hook the renewal process.

This is further complicated by plugins needing to be careful that they do TOFU checks on the old ID shape until agents have migrated. This complication exists independent of the choice to enforce the ID shape on existing agents.

I need to think about this a bit more. I think it's probably safe to say that unless we come up with a really good migration story that we won't be enforcing this for existing agents any time soon, but I'll circle back with the maintainers and see what comes up.

In the meantime if you have any suggestions, we're all ears :)

@azdagron
Copy link
Member Author

So the maintainers huddled on this. I think this is the plan of action we'd like to follow:

  1. Warn on the undesired ID usage (this is already merged and will ship in 1.2.1. Since this was not in place for 1.2.0, we cannot change it through 1.3.x)
  2. Beginning with 1.4.0, disallow newly attested nodes which do not conform to the expected ID shape. Existing agents will still continue to operate successfully. As part of this change, we will also introduce a warning for existing IDs that are non-conformant.
  3. In 1.5.0 (or a later minor version), we will start denying agent authorization for agents with non-conformant IDs.

We will also provide a migration guide for operators who have found themselves impacted by this enforcement. This will cover topics like TOFU, which is the most security sensitive outcome of this enforcement (i.e. having to check both the old and new ID shape for TOFU during node attestation until all agents are migrated).

As far as timeline, we're looking at roughly 6 months until 1.4.0 and another 3 months after that for 1.5.0.

@zmt
Copy link
Contributor

zmt commented Oct 11, 2022

Notes from discussion today:

  • migration guide should be up as early as possible prior to enforcement release
  • we should add a feature flag to re-enable "log-only" mode in the version that enforces by default
    • ideally we can avoid mixed-mode attestation (skip step 2 above)
    • remove the "log-only" flag in the following minor release
    • if a consumer upgrades and encounters this unexpectedly with non-conforming agents, it will cause a major outage that the feature flag could help mitigate more quickly
  • expectation is that the transition on upgrade would happen quickly
    • explicit eviction of agents should not be necessary (should confirm)
    • agents would re-attest for the conforming SVID at next sync with server when rejected for non-conforming SVID

@zmt
Copy link
Contributor

zmt commented Sep 19, 2023

I need to work on this in conjunction with #3527. Even if I complete the PR in the 1.8.1 timeframe, we should not land it until 1.9.0.

@amartinezfayo amartinezfayo modified the milestones: 1.8.1, 1.8.2 Oct 10, 2023
@amartinezfayo amartinezfayo modified the milestones: 1.8.3, 1.8.4 Oct 11, 2023
@azdagron azdagron unassigned zmt Nov 2, 2023
@amartinezfayo amartinezfayo modified the milestones: 1.8.4, 1.8.5 Nov 8, 2023
@amartinezfayo amartinezfayo removed this from the 1.8.5 milestone Nov 16, 2023
@amartinezfayo amartinezfayo added the help wanted Issues with this label are ready to start work but are in need of someone to do it label Nov 16, 2023
@zmt
Copy link
Contributor

zmt commented Nov 21, 2023

It is clear to me now that these will never be high enough priority among my competing work priorities that I will ever accomplish them. Please re-label help wanted. I apologize to the community for holding these so long and not completing them.

@SilvaMatteus
Copy link
Contributor

Hello @azdagron,

I would be happy to work on this feature after I finish the retry bootstrap one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issues with this label are ready to start work but are in need of someone to do it priority/backlog Issue is approved and in the backlog
Projects
None yet
Development

No branches or pull requests

7 participants