[RFC] Federation Graduation #1619
Comments
There are two parts to the federation configuration:
1. The configuration for serving the server's own bundle endpoint.
2. The federation relationships with other trust domains (i.e. which bundle endpoints to consume).
I think number 1 is not likely to change (with any sort of frequency) over time. I think it is a great candidate to remain in the configuration file. Number 2, however, may require some debate. Federation relationships aren't static and may change over time. The configuration file may not be the right approach.

Regarding "cluster" configuration and mismatch detection, what would be the source of truth? How would servers interact with that source of truth? The federation-related configuration isn't the only thing that needs to match in HA; other examples include the datastore configuration. I kind of feel that solving this problem is out of scope for graduating federation.

Regarding an API-driven configuration, I think that has some promise. It does require that the cluster be up and running before you can make modifications to the federation relationships, but I think that would be ok. An outside process could "reconcile" the federation relationships using the API offered by the cluster.

Outside of how federation is configured, I think we have an additional problem that we might need solved before we can "graduate" federation. Namely, what happens when two or more servers race on updating the federated bundle? For example, consider the following:
The above race would cause the bundle to stay at revision 1 until the next poll period. This may be an acceptable race, since bundle revisions should be published well in advance, but nevertheless I think it's good to think through.
While I agree federation relationships are likely to change more frequently than the bundle endpoint itself, those changes are still infrequent on an absolute scale, i.e. I'd expect to need to update federated bundles a few times a month, not a few times a day (though correct me if there are use cases that could warrant daily federated bundle updates). Given this, I'd lean towards config that can be checked in to source control, as IMHO it's simpler to reason about and doesn't require another process for reconciliation.

Regarding the source of truth and races on writing the bundle, would an explicit revision number (or timestamp) solve this problem? Servers would always respect a higher revision number, and would use something like a
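The revision-number idea could be sketched as a compare-and-swap against the datastore. This is a minimal illustration with hypothetical types (`Bundle`, `Store`), not SPIRE's actual datastore API:

```go
package main

import (
	"fmt"
	"sync"
)

// Bundle pairs the federated bundle payload with a monotonically
// increasing revision number set by the publisher.
type Bundle struct {
	Revision int64
	Payload  string
}

// Store is an in-memory stand-in for the shared datastore.
type Store struct {
	mu     sync.Mutex
	bundle Bundle
}

// CompareAndSet installs the new bundle only if its revision is higher
// than what is already stored, so a stale writer cannot clobber a
// newer bundle that another server persisted first.
func (s *Store) CompareAndSet(b Bundle) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if b.Revision <= s.bundle.Revision {
		return false // lost the race; keep the newer bundle
	}
	s.bundle = b
	return true
}

func main() {
	s := &Store{}
	fmt.Println(s.CompareAndSet(Bundle{Revision: 2, Payload: "rev2"})) // true
	fmt.Println(s.CompareAndSet(Bundle{Revision: 1, Payload: "rev1"})) // false: stale write rejected
	fmt.Println(s.bundle.Payload)                                      // rev2
}
```

With this shape, the race described above resolves in favor of the higher revision regardless of which server writes last.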
We (maintainers) talked about this quite a bit. There is consensus that for many environments the federation relationships between trust domains would change infrequently, and that the config file is good enough for likely a majority of deployments. Deployments that manage frequently changing federation relationships can skip the config file for those relationships and instead build a sidecar or admin workload that fetches and updates the federated bundles via the SPIRE server APIs. I think we're ok moving forward with a configuration-file-based approach based on the current experimental configuration.

Regarding the race condition around updating bundles, there is certainly more we can do, in particular with a little more support from the datastore to detect and correct the race. However, the SPIFFE Federation specification (in progress) gives specific guidance that a new key for a trust domain be published in advance, between 3-5x the bundle refresh hint. This means that even if the race condition is hit and a bundle update is lost until the next poll period, SPIRE should be able to correct itself well before any new keys are put in use. This doesn't mean we won't try to mitigate the race condition, but I think it does mean that we can carry forward with "graduating" the federation support in SPIRE and make it more robust down the road.
Right now the experimental configuration looks something like this:
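(The snippet is missing here; recalled from memory, the experimental block looked roughly like the following. Field names are approximate and may not match a given SPIRE version — treat this as a sketch, not as authoritative documentation:)

```hcl
server {
  experimental {
    # Serve this server's own bundle endpoint.
    bundle_endpoint_enabled = true
    bundle_endpoint_address = "0.0.0.0"
    bundle_endpoint_port    = 8443

    # Consume another trust domain's bundle endpoint.
    federates_with "domain1.test" {
      bundle_endpoint_address = "1.2.3.4"
      bundle_endpoint_port    = 8443
    }
  }
}
```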
How do people feel about changing it to the following for configuring the server's bundle endpoint?
And for the configuration for consuming the bundle endpoint of another trust domain?
Looks good to me. I'll start working on refactoring the server configuration as specified, and document it as I go.
Per #1656 (comment) the configuration would now look like:

```hcl
server {
  federation {
    bundle_endpoint {
      address = "0.0.0.0"
      port = 8443
      acme {
        domain_name = "example.org"
      }
    }
    federate_with "spiffe://domain1.test" {
      bundle_endpoint {
        address = "1.2.3.4"
        use_web_pki = true
      }
    }
  }
}
```
Is there any issue in using `federates_with` instead of `federate_with`?
I think keeping it consistent with the `federates_with` naming used by registration entries makes sense.
Cool, then the configuration ends up looking like:

```hcl
server {
  federation {
    bundle_endpoint {
      address = "0.0.0.0"
      port = 8443
      acme {
        domain_name = "example.org"
      }
    }
    federates_with "spiffe://domain1.test" {
      bundle_endpoint {
        address = "1.2.3.4"
        use_web_pki = true
      }
    }
  }
}
```
Enabling SPIFFE Federation in SPIRE today is done via an experimental configuration flag. With the goal of promoting it to a stable feature, we need to discuss some issues with the current implementation that may be relevant to various members of the community.
SPIRE Server currently relies on a file-based configuration that is only applied during startup. There are consequences associated with this approach:
- Any change to the federation configuration requires restarting the server.
- In HA deployments, individual servers can end up running with mismatched federation configurations.
Possible fix for the configuration mismatch
Introducing some form of configuration validation could deal with the mismatch issue. For that purpose, we need a way of knowing the global cluster configuration so we can compare it with what we are trying to configure. This addresses the problem of adding a new server, or changing an existing one, with a configuration that doesn't align with the rest of the cluster.
To change the cluster configuration deliberately, we also need a way of expressing that the change should apply to the whole cluster, so it isn't rejected by the validation, plus a mechanism for propagating the new configuration to every server in the cluster. A flag passed when starting the server could suffice to express the intent to change the configuration.
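One way the mismatch check could work (a sketch of the idea, not an existing SPIRE mechanism): each server serializes its federation configuration, hashes it, and compares the hash against one recorded by the cluster (e.g. in the shared datastore). The `allowChange` parameter below models the hypothetical startup flag that authorizes a cluster-wide change:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// configHash produces a stable fingerprint of the serialized
// federation configuration (HCL text in this sketch).
func configHash(serialized string) string {
	sum := sha256.Sum256([]byte(serialized))
	return hex.EncodeToString(sum[:])
}

// validateAgainstCluster compares the local fingerprint with the one
// the cluster recorded. An empty stored hash means no configuration
// has been recorded yet, so the local one is accepted.
func validateAgainstCluster(local, stored string, allowChange bool) error {
	if stored == "" || local == stored || allowChange {
		return nil
	}
	return fmt.Errorf("federation config mismatch: local %s != cluster %s", local[:8], stored[:8])
}

func main() {
	clusterCfg := `federates_with "spiffe://domain1.test" { ... }`
	stored := configHash(clusterCfg)

	// A server starting with a drifted config is rejected unless the
	// operator explicitly opts in to changing the cluster config.
	local := configHash(clusterCfg + "\n# drift")
	fmt.Println(validateAgainstCluster(local, stored, false) != nil) // true: mismatch detected
	fmt.Println(validateAgainstCluster(local, stored, true) == nil)  // true: change authorized
}
```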
Possible fix for the restart requirement
Alternatively, we could move the federation configuration out of the configuration file altogether and define an API for this purpose. This has the benefit of not requiring a server restart, removes the possibility of servers running with different federation configurations, and reduces federation management for the cluster to a single entry point. The main difference is that at least one server must be running before federation can be configured, as opposed to configuring it in advance.
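An outside reconciler against such an API could be quite small. The interface and type names below are hypothetical, standing in for whatever admin API the server would expose:

```go
package main

import "fmt"

// Relationship describes a desired federation relationship.
type Relationship struct {
	TrustDomain string
	EndpointURL string
}

// FederationClient is the surface a reconciler would need from the
// server's admin API (hypothetical, not SPIRE's actual API).
type FederationClient interface {
	List() []Relationship
	Create(r Relationship)
	Delete(trustDomain string)
}

// reconcile drives the server's relationships toward the desired set,
// creating missing relationships and deleting extraneous ones.
func reconcile(c FederationClient, desired []Relationship) {
	want := map[string]Relationship{}
	for _, r := range desired {
		want[r.TrustDomain] = r
	}
	have := map[string]bool{}
	for _, r := range c.List() {
		have[r.TrustDomain] = true
		if _, ok := want[r.TrustDomain]; !ok {
			c.Delete(r.TrustDomain)
		}
	}
	for td, r := range want {
		if !have[td] {
			c.Create(r)
		}
	}
}

// fakeClient is an in-memory implementation used for illustration.
type fakeClient struct{ rels map[string]Relationship }

func (f *fakeClient) List() []Relationship {
	out := []Relationship{}
	for _, r := range f.rels {
		out = append(out, r)
	}
	return out
}
func (f *fakeClient) Create(r Relationship) { f.rels[r.TrustDomain] = r }
func (f *fakeClient) Delete(td string)      { delete(f.rels, td) }

func main() {
	c := &fakeClient{rels: map[string]Relationship{
		"spiffe://stale.test": {TrustDomain: "spiffe://stale.test"},
	}}
	reconcile(c, []Relationship{
		{TrustDomain: "spiffe://domain1.test", EndpointURL: "https://1.2.3.4:8443"},
	})
	fmt.Println(len(c.rels), c.rels["spiffe://domain1.test"].EndpointURL)
	// prints: 1 https://1.2.3.4:8443
}
```

The loop is idempotent, so running it on a schedule (e.g. from a sidecar) would keep the cluster converged on a source-controlled desired state.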
Request for Comments
Does this sound reasonable? Are there other approaches to consider? Should we only address one of the issues or both of them?