Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose Updated Protos for the SPIRE Server API #1057

Closed
evan2645 opened this issue Aug 7, 2019 · 39 comments
Closed

Propose Updated Protos for the SPIRE Server API #1057

evan2645 opened this issue Aug 7, 2019 · 39 comments
Milestone

Comments

@evan2645
Copy link
Member

evan2645 commented Aug 7, 2019

The SPIRE registration API was originally written to allow manipulation of SPIRE registration entries. It has since become the de-facto API for all things administrative. This includes decidedly un-registration-y things such as generating join tokens, minting arbitrary SVIDs, agent management, and bundle management.

One problem is simple naming - the "registration api" should not include things that are not related to registration. Another problem is the eventual access control that we want to provide... we currently have the "admin" bit you can set to give carte blanche access, however the scope is too wide. Finally, the registration API has grown organically over time to add features such as ListBy* etc... these features were added in a compatible manner, but we now have a chance to take a new approach to supporting this functionality in a more idiomatic way.

This issue is done when we have loose consensus on the proto(s) for the next iteration of SPIRE management APIs.

@azdagron
Copy link
Member

It would be super convenient to have APIs that force updating federated bundles instead of waiting for the refresh hint.

@evan2645
Copy link
Member Author

I put some thought into this last week. I took the "if we started over again, what would we like to see" approach. While futzing around with some different options, I noted that some proto types are shared between the registration api and the node api... Since we also know we want to refactor the node api, the scope of my exploration ended up widening significantly.

What I ended up with was a suite of APIs which are somewhat "resource" oriented. For instance, one API for managing agents, another for managing bundles, and so on. Below I'll describe the structure of the APIs, related protos, types etc.

Note: I stopped short of defining the actual protos. This is simply an organizational proposal, and I'm looking for as much feedback as possible. Once we decide on the organization, we can begin modeling the protos themselves.

The Top Level

The current location stays as is (proto/spire). There are three main categorizations within this level: API, plugin, and type.

api/

This directory is where all the service definitions live, along with service-specific message types such as Request* and Response* messages.

plugin/

This directory is similar to the api/ directory, except that it's where service definitions for plugin interfaces live. It is further subdivided into plugin/server and plugin/agent, and the rest of the structure is largely similar to what's in place today.

type/

This is where all shared types live. Any message/type that is not specific to a service (i.e. any non-Request* non-Response* type) lives here. The contents of this directory may be either multiple proto files (for a single type package) or may be broken up into multiple packages/directories based on the resource(s) they represent. This directory structure is meant to subsume the current common/ directory.

API organization

Within the api/ directory, there are several subdirectories, each of which represents a distinct API (which is a gRPC/Protobuf service definition in this context). For the most part, each API deals with a certain aspect of SPIRE management - I've identified five of these.

api/agent

The agent API is used for managing agents. This includes tasks such as eviction, listing attested agents, introspecting agent (e.g. looking at their selectors etc). It is also where token generation RPCs would live, since that is decidedly an agent-oriented operation.

api/bundle

This API is used for bundle management, both local and federated. Tasks like setting/updating bundle contents, listing bundles, and perhaps some federation-related stuff (such as asking SPIRE server to re-poke a bundle endpoint) would all live here.

api/entry

All things related to managing registration entries. Entry CRUD, entry lookup by selector, etc.

api/issuer

This API is a little different than the rest in that it is not a resource management API. Rather, this API is the moral equivalent of the current Node API. It is used for SVID issuance (applying the appropriate authz lookups etc). It is also where the attestation RPC would live (as the result of this is SVID issuance).

api/svid

All RPCs related to SVID management live here. In the future, this might include things like listing issued SVIDs (which would be necessary for revocation), revoking issued SVIDs, and so on. For now, I think the only thing that would live here currently are the Mint*SVID RPCs.

Shared Types

There are many types which are shared across the proposed APIs, including bundles, entries, etc. Since the APIs are mostly resource-oriented, I'd propose organizing the shared types along these boundaries as well. For instance, type/bundle.proto or type/bundle/bundle.proto for all bundle-related types.

Request for Comments

For now, I'd like to focus on the organization of these APIs and their boundaries. What do you like about this organization, and what sucks about it? We would like to future-proof it as much as possible... Some things to think about when considering the above:

  • How might future features fit (or not fit) into this model? Try to think of features you've had in mind and where the RPCs for that feature might land
  • Does the naming make sense? What would consuming protos with these names be like after their libraries are codegen'd?
  • What other options are there beyond resource-oriented API boundaries? Do those make more sense than this?
  • Is anything missing? Can you think of an RPC that doesn't fit neatly into any of the APIs listed above?

There will obviously be a ton of work to realize something like this. I also see it as an opportunity to fix all the poor naming choices and inconsistencies that are littered across the current APIs. If/when we gain consensus on an approach that makes sense, including agreement on the scope of this change (i.e. registration + node api), I will open a new issue for this that will supersede all existing/related issues.

Thanks in advance for your thoughts :)

@mcpherrinm
Copy link
Contributor

Have we ever considered representing configuration with protos?

Eg, Envoy accepts static json config (or dynamic config via grpc) which is defined by a proto. It's nice since it's easy to validate configuration, document, and work with.

@ZymoticB
Copy link
Contributor

ZymoticB commented Nov 1, 2019

+1 to removing common from the directory structure.

Some leading questions to poke at the API break down:
If I wanted to add an API that showed on the server side what registration entries an agent should have (to help an operator understand the effects of node registrations) would that live in the agent API or the entry API?

Should node entries be a part of the entry API, or should "agent" registrations be separated from "workload" registrations (this distinction doesn't explicitly exist in the datastore API, but there was been proposals to add it)?

Do you want to separate the APIs into categories? For example, the issuer API is (IMO) internal, the agent API is fairly operator focused / introspection focused (token generation and agent eviction are exceptions). The entry API and the svid API are the best examples IMO of "many 3rd party integrations will be built and it needs to be very stable". The issuer API probably doesn't need to maintain that guarantee since its internal. So long as there is a compatibiltiy story between the server and agent anything should be fair game. The entry and svid APIs probably deserve more detailed review as they are a bit more set in stone once released.

I agree there should be subdirectories within types.

One thing worth thinking about is putting together a proposal for the protos locally to see if there is anything that comes out to challenge this proposal. I agree its too early to publish but it would suck to come to an agreement and then find issues only once it gets translated to proto.

@dennisgove
Copy link
Contributor

Is it worth considering an administrative API?

For example, api/agent includes an eviction action which is obviously needed, but if agents themselves are going to connect to api/agent then might it be possible for a misbehaving agent to make it difficult for the agent to be evicted? Would it be beneficial to have an API accessible to a limited set of actors to perform system stability actions?

@APTy
Copy link
Contributor

APTy commented Nov 8, 2019

I definitely think that this would benefit from a lightweight RBAC system. Being able to say "okay this admin can evict agents" or "this admin can federate bundles" or "this admin can register workloads" will be really great to have.

...of course I personally would like even more fine-grained ACLs such as "only the admin that created a workload registration can modify it", but that may be asking too much at this stage. Even having coarse-grained ACLs on the api/* subdirectories would be a major win.

@azdagron
Copy link
Member

azdagron commented Dec 16, 2019

There has been a feature request for an API to manage federated domains which I think makes sense (be able to kick off federation w/o restarting a server). The difficulty there will be deciding which SPIRE server is in charge of updating the federated bundle. We haven't really solved that for HA anyway so it isn't a new problem.

@azdagron azdagron changed the title Propose Updated Protos for the "Registration API" Propose Updated Protos for the SPIRE Server API Jan 13, 2020
@evan2645
Copy link
Member Author

For example, api/agent includes an eviction action which is obviously needed, but if agents themselves are going to connect to api/agent then might it be possible for a misbehaving agent to make it difficult for the agent to be evicted? Would it be beneficial to have an API accessible to a limited set of actors to perform system stability actions?

My original intention for api/agent was to provide purely agent management functions, so I would expect an operator to be interacting with it rather than an agent... The api/issuer endpoint would instead be consumed by agents which are requesting SVIDs etc... but, I think this question applies to api/bundle, which would include administrative functions but also lookup functions which agents would need access to. I wonder if we should provide additional (perhaps redundant?) bundle functionality in api/issuer to avoid this

@azdagron
Copy link
Member

azdagron commented Jan 29, 2020

I've done a little design work on what the Issuer API might look like. It is based on the SPIFFE Identity Classification proposal (#1338). I've got a commit (4b79ec1) in the api-refactor-design branch if you want to check it out in its entirety.

** Updates **

  • Renamed identity_uuid to entry_id
  • Added parameter message for minting a download X509CA
  • Added an X509CA message instead of having multiple fields in the NewDownstreamX509CAResponse

Here is the basic service description:

service Issuer {
    rpc AttestForX509SVID(stream AttestForX509SVIDRequest) returns (stream AttestForX509SVIDResponse);
    rpc RenewX509SVID(stream RenewX509SVIDRequest) returns (stream RenewX509SVIDResponse);
    rpc NewX509SVID(NewX509SVIDRequest) returns (NewX509SVIDResponse);
    rpc NewJWTSVID(NewJWTSVIDRequest) returns (NewJWTSVIDResponse);
    rpc NewDownstreamX509CA(NewDownstreamX509CARequest) returns (NewDownstreamX509CAResponse);
}

AttestForX509SVID

rpc AttestForX509SVID(stream AttestForX509SVIDRequest) returns (stream AttestForX509SVIDResponse);

AttestForX509SVID attests the caller using node attestation and results in one or more X509-SVIDs being returned to the caller. The RPC uses a bidirectional stream to accomodate challenge/response-based node attestation methods.

Authorization

None

Request

The request initially contains the attestation parameters: namely the attestation data from and the parameters the issuer needs to mint the X509-SVID (i.e. public key).

Subsequent requests sent over the stream will convey responses to challenges issued by the issuer (if any).

message AttestForX509SVIDRequest {
    message Params {
        spire.type.AttestationData data = 1;
        X509SVIDParams params = 2;
    }

    oneof step {
        // Attestation parameters. These are only sent in the initial request.
        Params params = 1;

        // The response to a challenge issued by the attestor. Only sent in
        // response to a challenge received by the issuer.
        bytes challenge_response = 2;
    }
}

Response

The response will either have a challenge that needs to be answered, or an attesation result. The result is EITHER:

  1. A single canonical identity

  2. One or more attributed identities

message AttestForX509SVIDResponse {
    message Result {
        spire.type.svid.X509SVID canonical_identity = 1;
        repeated spire.type.svid.X509SVID attributed_identities = 2;
    }

    oneof step {
        // Attestation results. If set, attestation has completed.
        Result result = 1;

        // A challenge issued by the attestor. If set, the caller is expected
        // to send another request on the stream with the challenge response.
        bytes challenge = 2;
    }
}

RenewX509SVID

rpc RenewX509SVID(stream RenewX509SVIDRequest) returns (stream RenewX509SVIDResponse);

RenewX509SVID is used to renew a canonical identity previously obtained through AttestForX509SVID. It cannot be used to renew any other type of identity.

Canonical identities are by definition unique. SPIRE closely tracks SVIDs issued for canonical identites to ensure that only a single SVID is active for a given canonical identity. RenewX509SVID is the process by which a new SVID is made active (making the existing one stale). A bidirectional stream facilitates two-phase commit of this action for increased robustness, i.e. the caller needs to acknowledge that it has received the new SVID and done whatever is required to switch over to using the SVID (e.g. persist to disk). Only after receiving this acknowledgement does the server switch consider the new SVID active.

Authorization

Callers must present an active canonical identity.

Request

The first request on the stream contains the parameters the issuer needs to mint the X509-SVID (i.e. public key).

The second request contains the acknowledgement that the renewed SVID has been received and that the caller will use it for subsequent RPCs.

message RenewX509SVIDRequest {
    message Ack {};

    oneof step {
         // Parameters for the X509-SVID
        X509SVIDParams params = 1;

        // An acknowledgement by the receiving end that it has received the
        // renewed canonical identity SVID and will be using it in future
        // requests.
        Ack ack = 2;
    }
}

Response

The response contains the renewed SVID

message RenewX509SVIDResponse {
    // The renewed X509-SVID
    spire.type.svid.X509SVID svid = 1;
}

NewX509SVID

rpc NewX509SVID(NewX509SVIDRequest) returns (NewX509SVIDResponse);

NewX509SVID is used by delegatees (those to whom identities have been delegated) to obtain an X509-SVID for the delegated identity.

Authorization

Callers must present an SVID for an identity that is directly or indirectly delegated the identity being requested. For canonical identities, the SVID must be the active SVID.

Request

The request contains the UUID for the identity to issue an X509-SVID for. This UUID is used to look up the registration entry for the identity, which contains the SPIFFE ID and additional properties for the SVID (i.e. TTL, DNS names, etc)

The request also contains parameters the issuer needs to mint the X509-SVID (e.g. public key)

message NewX509SVIDRequest {
    // The ID of the registration entry describing the identity/SVID
    string entry_id = 1;

    // Parameters for the X509-SVID
    X509SVIDParams params = 2;
}

Response

The response contains the newly signed X509-SVID for the delegated identity.

message NewX509SVIDResponse {
    // The newly issued X509-SVID
    spire.type.svid.X509SVID svid = 1;
}

NewJWTSVID

rpc NewJWTSVID(NewJWTSVIDRequest) returns (NewJWTSVIDResponse);

NewJWTSVID is used by delegatees (those to whom identities have been delegated) to obtain a JWT-SVID for the delegated identity.

Authorization

Callers must present an SVID for an identity that is directly or indirectly delegated the identity being requested. For canonical identities, the SVID must be the active SVID.

Request

The request contains the UUID for the identity to issue an JWT-SVID for. This UUID is used to look up the registration entry for the identity, which contains the SPIFFE ID and additional properties for the SVID (i.e. TTL, etc)

The request also contains parameters the issuer needs to mint the JWT-SVID (e.g. audience)

message NewJWTSVIDRequest {
    // The ID of the registration entry describing the identity/SVID
    string entry_id = 1;

    // Parameters for the JWT-SVID
    JWTSVIDParams params = 2;
}

Response

The response contains the newly signed JWT-SVID for the delegated identity.

message NewJWTSVIDResponse {
    // The newly issued JWT-SVID
    spire.type.svid.JWTSVID svid = 1;
}

NewDowstreamX509CA

rpc NewDownstreamX509CA(NewDownstreamX509CARequest) returns (NewDownstreamX509CAResponse);

NewDownstreamX509CA mints a new X509 CA intermediate appropriate for use by a downstream X509 CA for minting X509-SVIDs.

Authorization

Callers must present an SVID for a downstream identity (i.e. one that has been marked downstream at time of registration).

Request

The request contains the public key for the downstream X509 CA. The issuer will determine other properties for the CA certificate.

message X509CAParams {
    // The public key of the downstream X509 CA to use in the new CA certificate.
    bytes public_key = 1;
}

message NewDownstreamX509CARequest {
    X509CAParams params = 1;
}

Response

The response contains the CA certificate chain and the root CAs. The certificate chain will include the CA certificate and any intermediates necessary to chain up to a certificate present in the roots.

message X509CA {
    // CA certificate and any intermediates part of the chain back to the root
    // (DER encoded). The CA certificate is the first. 
    repeated bytes ca_cert_chain = 1;

    // Root CA certificates (DER encoded).
    repeated bytes root_cas = 2;
}

message NewDownstreamX509CAResponse {
    X509CA x509_ca = 1;
}

@evan2645
Copy link
Member Author

The response will either have a challenge that needs to be answered, or an attestation result. The result is EITHER:
A single canonical identity
One or more attributed identities

I think the implication here is that attributed identities will share the same key. I don't know that it's actually a problem, but I do think it's different than what we do today.

If we had an agent that used a node attestor plugin that produced only attributed identities, would the agent need to maintain one connection per attributed identity in order to be authorized to fetch its full set of delegated entries?

The result message feels a little smelly too in that consumers may not know what to expect, need to check both fields... It makes me wonder if canonical identity should be mandatory even if the underlying attestation type doesn't support it well... just thinking out loud

NewX509SVID is used by delegatees (those to whom identities have been delegated) to obtain an X509-SVID for the delegated identity.

Since it will be common for consumers to need more than one identity at a time (agent boot comes to mind, as does bulk registration), and the authorization lookup is expensive, I think it would make sense to allow agents to request more than one X509-SVID at a time?

message NewX509SVIDRequest {
// The unique ID of the identity being requested
string identity_uuid = 1;

small nit: this is really the entry id and not the identity id, since a single identity can be represented by multiple entries. Is the intention to move away from entry-related language in this API?

The request contains the public key for the downstream X509 CA. The issuer will determine other properties for the CA certificate.

Seems prudent to follow the X509SVIDParams pattern by either re-using the params message or introducing a new (similar) one?

NewDownstreamX509CA mints a new X509 CA intermediate appropriate for use by a downstream X509 CA for minting X509-SVIDs.

This makes me wonder about #1190. Should we have a NewDownstreamJWKS RPC here? Or does that really belong in a Bundle API? Or, should we have an entirely separate API for "downstream" related needs?

Overall, I think this looks great and makes a lot of sense. The canonical/attributed bits that poke through on the Attest call feels like it could use some attention... I do like that this proposal doesn't include too much of the "new language" however the fact that the only bit where it does poke through is the part that feels a little if-y gives me pause.

@azdagron
Copy link
Member

azdagron commented Jan 30, 2020

If we had an agent that used a node attestor plugin that produced only attributed identities, would the agent need to maintain one connection per attributed identity in order to be authorized to fetch its full set of delegated entries?

Yeah, that is a little rough. Your musings about about returning a canonical identity even for attestors that don't support it might solve that. Attestors that don't support it could just generate a canonical identity with a UUID just to provide a single identity to use to authenticate with the server.

would make sense to allow agents to request more than one X509-SVID at a time?

Yes. Good idea.

small nit: this is really the entry id and not the identity id, since a single identity can be represented by multiple entries

Whoops! This was left over from some other exploratory work around registering "identities" instead of the overly generic term "entry". I'll edit above.

Seems prudent to follow the X509SVIDParams pattern

SGTM

Regarding Downstream stuff. I think I'd favor a separate API. That kind of an API does feel more user-centric than service-centric...

I wonder how the following would feel:

  1. Change "Attest" to return only a canonical identity. For attestors that don't support canonical identities, generate one based on a UUID.
  2. Split out NewX509SVID into NewDelegatedX509SVID and NewAttributedX509SVID. Both of which would use the canonical identity for authorization. The former would be useful for agent-like entities that are really only concerned about delegation. The latter would be useful for agentless entities. The drawback is that for an agentless entity to obtain a meaningful SVID, it would need to make an additional RPC over the current proposal.

@azdagron
Copy link
Member

azdagron commented Jan 30, 2020

Another advantage of splitting out NewX509SVID, is that it feels more natural to add a NewAttributedJWTSVID instead of somehow extending or adding an "Attest" RPC to obtain JWT-SVIDs for attributed entities. Although we'd have to do something similar if we wanted to obtain JWT-SVIDs for canonical identities....

@azdagron
Copy link
Member

I guess # 1 could be done independently of # 2 attributed identities are backed by an entry and therefore the shape of each RPC would be the same. Having them separate could however lead to opportunities to optimize the "authorization" crawl that ensures the caller is authorized for the entry.

@azdagron
Copy link
Member

After some discussion with @evan2645, I've removed the language from the Issuer API regarding canonical/attributed/delegated, etc (7c2c420). This allows us to move forward without locking into the new nomenclature. At the same time this new proposal should allow us to adopt the new nomenclature later through the addition of new RPCs without muddying the RPCs we want to use now for agents.

@azdagron
Copy link
Member

Ok, the api-refactor-design branch on SPIRE has been updated with the latest design. The proto/spire/next directory contains the new API descriptions:

https://github.com/spiffe/spire/tree/api-refactor-design/proto/spire/next

The new APIs are:
Agent - agent management, including attestation.
Bundle - basic bundle management CRUD
Entry - registration entries CRUD, including authorized entry fetching (e.g. by agents)
SVID - everything related to SVID minting. This includes direct minting of one-off SVIDs by administrators and minting via registration entry (e.g. by agents).

@mcpherrinm
Copy link
Contributor

I've gone through the most recent set of protos (and not prior iterations). some of these are notes about the APIs, but at least in two cases there are some api nits based on underlying concerns around SPIRE.

  • Creating join tokens seems nicer; though I still don't like the join token ending up in the agent svid and requiring an extra registration entry to the "real" agent ID.

  • GetAgent takes a spire.types.AgentMask but ListAgents does not. I think I'd be more likely to want a subset of data when listing. We should make sure to support masks on all List operations.

  • MintX509SVID would be nice if it supported issuing off the upstream CA directly (bool option?) -- that would be useful if the upstream CA is expensive to call but has longer lifetimes, or for applications that don't handle intermediates properly.

  • Should we just have one MintSVID RPC with a OneOf for the type argument? Seems like there's arguments either way. sharing common args like ttl, but maybe they won't be universally shared for new svid types.

  • Do we need both BatchFoo and Foo (for various services, like BatchCreateEntry). Can we just have the batch API?

  • Federation still seems difficult to handle in this API. Each entry has to maintain a separate set of federation entires, which is potentially difficult to use: If we want to add a new federated trust domain to a set of entries, we need to list entries and then loop and update, which seems like it could be racy, and requires a lot of churn. Maybe a "federation" should be a first-class type of some sort

  • There's a lot of stringly-typing going on in the current APIs. Should we have types for at least the most commonly repeated types like SPIFFE ID?

@mcpherrinm
Copy link
Contributor

mcpherrinm commented Apr 28, 2020

rpc DeleteAgent(DeleteAgentRequest) returns (google.protobuf.Empty);

If I call

	resp, err := client.CreateJoinToken(ctx, &agent.CreateJoinTokenRequest{
		AgentId: someAgent),
	})

will this delete it?

	_, err = agentClient.DeleteAgent(ctx, &agent.DeleteAgentRequest{
		Id: someAgent,
	})

Only mildly concerned because there's an extra entry created by the CreateJoinToken RPC. Would be nice if DeleteAgent returned a bit more info (how many entries were deleted, maybe?).

Edit: The comment on DeleteAgentRequest says it's a SPIFFE ID, but should that be an entry ID instead? This is what I mean by the API being too stringly-typed; easy to misuse this way.

@mcpherrinm
Copy link
Contributor

It would be nice if you could Delete with a filter, similar to how you List.
Otherwise you have to list and then delete in a loop -- not very efficient, and not very ergonomic either.

@azdagron
Copy link
Member

Creating join tokens seems nicer; though I still don't like the join token ending up in the agent svid and requiring an extra registration entry to the "real" agent ID.

That's a nice idea. Unless we somehow encoded the ID into the token itself (which may have some interesting metadata leak implications), it would require datastore schema and plugin changes, which means we wouldn't likely be able to implement that until 0.12.0. With the API as structured, that is something we could eventually introduce (and maybe provide a new bool field to opt into the new behavior so we stay compatible with older code).

GetAgent takes a spire.types.AgentMask but ListAgents does not. I think I'd be more likely to want a subset of data when listing. We should make sure to support masks on all List operations.

Good catch. I'll fix this.

MintX509SVID would be nice if it supported issuing off the upstream CA directly (bool option?) -- that would be useful if the upstream CA is expensive to call but has longer lifetimes, or for applications that don't handle intermediates properly.

I think this is a potentially useful use case, but introducing it also means adding a new RPC to UpstreamAuthority to support it as the UpstreamAuthority interface is opinionated about the type of X509-SVID it signs (CA intermediates only). Adding a new RPC to serve this purpose could be done a backwards compatible manner since the RPC would be optional. This feature in general could also something we could introduce to the MintX509SVID RPC at a later point without impacting compatibility.

Should we just have one MintSVID RPC with a OneOf for the type argument? Seems like there's arguments either way. sharing common args like ttl, but maybe they won't be universally shared for new svid types.

I think I'd prefer them separate. We've got into trouble in the past with RPCs that had many "modes", even just in terms of code complexity. I have lots of other little nitpicky reasons why I'd personally prefer to see them seperate. Like you said, arguments either way :) In general I think we're better off with multiple RPCs over an overloaded RPC.

Do we need both BatchFoo and Foo (for various services, like BatchCreateEntry). Can we just have the batch API?

This is a valid point. I don't have a strong opinion here. For the Entry API, since BatchCreateEntry naturally covers both the CreateEntry and CreateEntryIfNotExists, it could lead to some simplification for sure. If nobody else disagrees, I'd be for it.

Federation still seems difficult to handle in this API.

Yes.....I can see the difficultly managing the per-entry opt-in for federation. There are other federation related things that we've considered as well, including configuring federation relationships via the API instead of the config file. We ultimately punted because that brought up hard questions like 1) which server(s) are polling the federation bundle endpoint of the foreign trust domain, 2) if more than one server , how do they prevent racing on updates, 3) if one server, how do we choose. These questions are currently sidestepped, but not explicitly handled, by the current experimental configuration section that configures federation.

There's a lot of stringly-typing going on in the current APIs. Should we have types for at least the most commonly repeated types like SPIFFE ID?

That might be ok. It doesn't reduce the validation that has to happen, but maybe simplifies it a bit. A SPIFFEID type could line up quite nicely with the strong types we've added for SPIFFE IDs in the go-spiffe v2 library. Unless others disagree, I'd be for it.

@azdagron
Copy link
Member

Regarding DeleteAgentRequest, it is synonymous with the current EvictAgent RPC. It takes in the agent SPIFFE ID determined by attestation (and as reflected in the Agent type). It would not remove the novelty entry created by CreateJoinToken.

Maybe this points out an omission in the API. If we had a DeleteJoinToken request, would that clear things up? Maybe DeleteJoinToken could identify the novelty entry, if any, and delete it as well.

@amartinezfayo
Copy link
Member

Here are some thoughts after looking at this proposal:

  • I agree that for those RPCs that we have BatchX and X, it would be good to simplify things and have the RPCs that perform the batch operations only.
  • It would be good to have masks as part of all List* request messages.
  • There are some bundle operations that are only for federated bundles, but that's not quite clear from the name of the RPCs. I'm referring to CreateBundle, UpdateBundle, SetBundle, DeleteBundle and ListBundles. Maybe CreateFederatedBundle, UpdateFederatedBundle, SetFederatedBundle, DeleteFederatedBundle and ListFederatedBundles are better names?
  • Similarly, in the AppendBundle RPC, we may be clearer in the name that is only meant to be used with the bundle for the trust domain of the SPIRE server.
  • It's nice to have a Status message type. I'm wondering if it shouldn't be used more widely across the response messages. Also, I've been thinking if we could take the opportunity to provide the structure to make easier the debugging of errors and improve logging. Specifically, I'm thinking that we could include an additional field in the Status type that could be used to include more details. That could be something like repeated google.protobuf.Any details. Something along the lines of this: https://github.com/grpc/grpc/blob/master/src/proto/grpc/status/status.proto.

@azdagron
Copy link
Member

azdagron commented May 1, 2020

There are some bundle operations that are only for federated bundles, but that's not quite clear from the name of the RPCs

Hmm, that's a fair point and your suggested names mirror what is currently in the registration API. I'm somewhat torn. On one hand, like you said, I think it is a little confusing that most of those operations don't make sense when talking about the bundle for the trust domain of the SPIRE server. On the other hand, does it ever make sense to just want to list all of the bundles the server knows about, independent of the trust domain of the server? I'm curious what others thoughts are here...

It's nice to have a Status message type. I'm wondering if it shouldn't be used more widely across the response messages.

Can you give an example of an API where you'd like to see the status? In particular, using the status message to convey the results of an operation only works if the RPC returns success. This comes down to a discussion around whether or not the API should rely on transport mechanisms to convey application level status.

Also, I've been thinking if we could take the opportunity to provide the structure to make easier the debugging of errors and improve logging

That might be useful. The detail message type that any given RPC returns would probably need to become part of the API. Considering that I'd probably advocate for punting on adding the details field until a clear use case arose.

@azdagron
Copy link
Member

azdagron commented May 4, 2020

Regarding DeleteJoinToken, how do we see that being used? Without the ability to List, an admin is going to have to already know what the token value is (i.e. keep track of the join token value returned from the CreateJoinToken call). If we add ListJoinTokens, I think we're nearing the point where join tokens possibly belong in their own API.

@azdagron
Copy link
Member

azdagron commented May 4, 2020

Considering how expensive the entry authorization check is, and is likely to remain for some time, I think it's also probably worth having "batch" versions for the SVID API's NewX509SVID RPC. NewJWTSVID and NewDownstreamCA probably don't need a batch version since the former is on-demand based on workload API requests, so it would be impractical to batch and the latter is called with little frequency and only for one CA at a time.

@amartinezfayo
Copy link
Member

Can you give an example of an API where you'd like to see the status? In particular, using the status message to convey the results of an operation only works if the RPC returns success. This comes down to a discussion around whether or not the API should rely on transport mechanisms to convey application level status.

I see that the Status is included in all Batch*Response messages. I was speculating if there could be some other response messages that could benefit of including additional details in a successful response. After going over all the RPCs, it seems that all the responses that have a list of results (where each one has its own status) already have the Status included. My comment tended to know if it was thoroughly considered to be included in all relevant places, which looks like it was.

@azdagron
Copy link
Member

azdagron commented May 5, 2020

Here's the changes I plan on taking into the api-refactor-design branch:

  1. Where we have Batch* RPCs get rid of the non-batch RPCs. We can add the non-batch RPCs back in at a later point before releasing the APIs if we change our mind.
  2. Replace SPIFFE ID string fields with a strong type spire.types.SPIFFEID.
  3. Add DeleteJoinToken RPC to Agent API (see conversation immediately following this post)
  4. Rename bundle RPCs to make it clear which operations are intended for the primary bundle v.s. the federated bundles. Primary bundle has GetBundle and AppendBundle, federated Bundles have ListFederatedBundles, GetFederatedBundle, BatchCreateFederatedBundle, BatchUpdateFederatedBundle, BatchSetFederatedBundle, BatchDeleteFederatedBundle.

Here are things I don't think we should put in at this time:

  1. Federation as a first class type
  2. Changing the way join tokens work in regards to the agent SPIFFE ID and novelty entry creation.

Here are some things I'd still love opinions on:

  1. Should NewX509SVID be BatchNewX509SVID or NewX509SVIDs to amortize authorization cost when an agent wants to sign more than one SVID at a time (probably a pretty frequent request).
  2. Should we add a ListJoinTokens RPC to facilitate deletion? If so, should we move join token management out of the Agent API into its own? We had it in the Agent API because that's where the AttestAgent RPC is and it felt nice to be close, but if we grow the the JoinToken CRUD, it might be muddying that API.

@evan2645
Copy link
Member Author

evan2645 commented May 5, 2020

Should we add a ListJoinTokens RPC to facilitate deletion?

What is the use case for deleting a token? Token TTL is mandatory IIRC, so they could/should be self-cleaning. Maybe you'd want to somehow revoke a token once it has been created, but it's hard for me to see how that would be useful in the sorts of workflows I'm familiar with (e.g. passing into/pulled from a provisioning system, or generated by a human). I would imagine that in most cases tokens are used shortly after they're created - and if they're not, they expire.

Join token is weird in more than one way. It is a secret of sorts... and we store it in the db. It's the only "secret" that we really manage like that. Letting them leak back out feels a little weird to me - web services in which API keys can be generated but not retrieved afterwards come to mind.

@evan2645
Copy link
Member Author

evan2645 commented May 5, 2020

Should NewX509SVID be BatchNewX509SVID or NewX509SVIDs to amortize authorization cost when an agent wants to sign more than one SVID at a time (probably a pretty frequent request).

This sounds like a great idea. As for naming, I suppose I prefer the pluralized version as it feels more natural, but not at the cost of breaking consistency with other batch RPCs

@azdagron
Copy link
Member

azdagron commented May 5, 2020

@evan2645, regarding join tokens, that's a good point. We'd actually discussed this earlier IRL when we added CreateJoinToken. I think relying on pruning to remove the tokens is probably a good idea. I do think pruning could get a little smarter about also removing related novelty entries though. Right now, those dangle.

@azdagron
Copy link
Member

azdagron commented May 5, 2020

Re: BatchNewX509SVID v.s. NewX509SVIDs, I prefer that latter as well. I personally don't think it's necessarily departing from an established pattern since this API isn't really CRUDy. It already doesn't feel like the other APIs.

@amartinezfayo
Copy link
Member

Do we really need to keep the Batch prefix in those RPCs? E.g. I personally think that CreateFederatedBundles looks better than BatchCreateFederatedBundle. The plural name already clarifies that n entities will be created/processed and it doesn't look like prefixing with Batch adds a lot of value.

@azdagron
Copy link
Member

azdagron commented May 5, 2020

Do we really need to keep the Batch prefix in those RPCs?

The "Batch" prefix came from RPC naming conventions in the Google API Style Guideline, which provided some good guidance and prior art for some of the v2 design.

From what I can tell in their guidance for naming in a resource-oriented API, the resource is the Noun and MOST of the RPCs use a verb+noun naming style. So Bundle would be the Noun and the verbs would be Create, BatchCreate, etc. But their naming styles already have departure here for some things (e.g. ListBundles).

We've departed from them in some other minor conventions. Also, they have a different problem than us.... some of their decisions come from using a protobuf as a single source of truth for both a gRPC and REST-ful API.

I could go either way in regards to the Batch prefix. I'd love broad consensus before dropping it though.

@azdagron
Copy link
Member

azdagron commented May 5, 2020

In regards to using Google API Style Guidelines, it was a helpful place to look for established patterns but in no way did I intend for it to be law in terms of what we do. When we feel justified to go into a direction, I'm all about that as long as we are internally consistent.

@azdagron
Copy link
Member

azdagron commented May 5, 2020

If we do end up calling it BatchNewX509SVID, I'd expect it to follow the same request/response message patterns, i.e., take a repeated list of parameters for X509-SVIDs to create, and a repeated list of results, where each result had a status and (possibly) an X509-SVID. In other words, the other "batch" functions allow for, and communicate partial success. I'd want to stay consistent with this API if it was named similarly.

@azdagron
Copy link
Member

azdagron commented May 5, 2020

Branch is updated with BatchNewX509SVID, if you want to see how it looks:

https://github.com/spiffe/spire/blob/api-refactor-design/proto/spire-next/api/server/svid/v1/svid.proto#L26

@azdagron
Copy link
Member

azdagron commented May 5, 2020

One consequence of using a new spire.types.SPIFFEID message everywhere we expect a SPIFFE ID is that the spire.types.Entry message is no longer field compatible with the spire.common.RegistrationEntry message, which means less elegance when converting between the two message types (this matters because spire.common.RegistrationEntry is used in places outside of the API, e.g datastore, and has to be maintained for some time).

@mcpherrinm
Copy link
Contributor

I think I'm more worried about the external APIs, so having to do an internal conversion for the datastore plugin seems OK. We could provide a function to convert between the types, and validate that using a test that does reflection on fields.

@azdagron
Copy link
Member

azdagron commented May 6, 2020

The branch has been updated with all of the suggestions. Thanks for the feedback everyone. I think this is looking pretty good. I'll be cutting issues for the rest of the API soon so we can start on the implementation.

Since it will be some time before we "release" the new API, we still have time to make some changes. Hopefully any changes in the future are superficial.

@anvega
Copy link
Member

anvega commented Sep 1, 2020

Closing this out as consensus on the updated protos was gained and the new APIs have been implemented as of version 0.11.0

@anvega anvega closed this as completed Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants