Rethink how we disable challenge types #5913

Closed
aarongable opened this issue Jan 27, 2022 · 5 comments · Fixed by #7677

@aarongable
Contributor

Currently we list all available validation methods in the RA's config file. When we want to disable a challenge type, we remove it from the config file and do a config deploy. So far, so good.
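
To make that mechanism concrete, here is a minimal Go sketch of the kind of check involved. The struct and field names are illustrative, not Boulder's actual code; ChallengeTypeEnabled is the policy-authority method referenced in the snippets below.

    // Illustrative sketch: the policy authority holds the set of enabled
    // challenge types loaded from its JSON config, and ChallengeTypeEnabled
    // is a membership check against that set. Removing a type from the
    // config and redeploying is what "disabling" means today.
    type policyAuthority struct {
        enabledChallenges map[string]bool // e.g. {"http-01": true, "dns-01": true}
    }

    func (pa *policyAuthority) ChallengeTypeEnabled(challengeType string) bool {
        return pa.enabledChallenges[challengeType]
    }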

Disabling a challenge type today means two things happen:

  1. When a client tries to validate a pre-existing authorization (one that still includes the disabled challenge type, because it was created before the type was disabled), we return an error:

    boulder/ra/ra.go

    Lines 1563 to 1566 in fffd05e

    // This challenge type may have been disabled since the challenge was created.
    if !ra.PA.ChallengeTypeEnabled(ch.Type) {
        return nil, berrors.MalformedError("challenge type %q no longer allowed", ch.Type)
    }
  2. When a client creates a new order, and we create new authorizations for that order, those authorizations do not have the disabled challenge type associated with them:

    boulder/policy/pa.go

    Lines 563 to 574 in fffd05e

    // Otherwise we collect up challenges based on what is enabled.
    if pa.ChallengeTypeEnabled(core.ChallengeTypeHTTP01) {
        challenges = append(challenges, core.HTTPChallenge01(token))
    }
    if pa.ChallengeTypeEnabled(core.ChallengeTypeTLSALPN01) {
        challenges = append(challenges, core.TLSALPNChallenge01(token))
    }
    if pa.ChallengeTypeEnabled(core.ChallengeTypeDNS01) {
        challenges = append(challenges, core.DNSChallenge01(token))
    }

The first of these makes sense. When we disable a challenge type, it's probably because we think something is wrong with it, and so we don't want anyone to continue using the potentially-buggy validation path.

The latter of these makes less sense. It is nice not to include the disabled challenge type in the first place, so that clients don't mistakenly believe it is an option and then get an error when they try to fulfill it. But it also means that clients can never use that challenge type to validate the authorization, even after we re-enable the challenge type.

And the two effects above miss one critical case: if we believe that validations performed using a specific challenge type may have been buggy, then we should not allow certificates to be issued based on those validations. But today, the following sequence of events is possible:

  1. Client creates a new order and new authorizations for that order
  2. Client fulfills those authorizations using a given challenge type
  3. We disable that challenge type
  4. Client finalizes their order and we issue the certificate

We should rethink the challenge enable/disable mechanism so that, while a challenge type is disabled, we do not rely on authorizations validated via that type, and so that clients can smoothly begin using the challenge type again when it is re-enabled.

@jvanasco
Contributor

A few thoughts on the above.

  1. It reads like there may be some benefit in extending the current "enabled/disabled" toggle to a triplet of "enabled/disabled/temporarily-disabled", in which a "temporarily-disabled" challenge is expected to be re-enabled in the near future. Within that framework, "temporarily-disabled" challenges could still be added to authorizations, but they would not be rendered as an option on the order/authz/challenge objects, and their endpoints would return failures.

  2. A tangential issue is "what happens to pending authorizations when a new challenge type is introduced?"

  3. "But today, the following sequence of events is possible". Wasn't that solved by invalidating all the authorizations, or did I dream that happened? If that didn't happen, wouldn't it necessitate another round of revocations?

  4. I've handled situations similar to the "buggy validations" concept described above by tracking a version-id and enforcing a minimum version. For example, assuming the TLS-ALPN-01 validator was at v1, the hotfix would bump the required minimum validation version to v2, and the full fix/re-enable would bump the version in the validator to v2. Just an idea here, but "finalize" could ensure compliance before issuance, and if any non-compliant authorizations were detected, those authorizations could be immediately invalidated, causing the order to fail. Most clients should gracefully recover by requesting a new order, which would then only create the necessary new authorizations. (A rough sketch of this idea follows below.)
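
A purely hypothetical Go sketch of that versioning idea; none of these names exist in Boulder, and the version numbers are made up for illustration:

    // Each validation records the version of the validator that performed it.
    type validationRecord struct {
        ChallengeType    string
        ValidatorVersion int
    }

    // minValidatorVersion is the minimum trusted validator version per
    // challenge type; a hotfix for a buggy validator bumps the entry for
    // that type.
    var minValidatorVersion = map[string]int{
        "http-01":     1,
        "dns-01":      1,
        "tls-alpn-01": 2, // bumped after a hypothetical TLS-ALPN-01 bug
    }

    // validationStillTrusted is the check "finalize" could run: an
    // authorization validated by an older (buggy) validator fails it and can
    // be invalidated, causing the order to fail so the client requests a
    // fresh one.
    func validationStillTrusted(v validationRecord) bool {
        return v.ValidatorVersion >= minValidatorVersion[v.ChallengeType]
    }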

@aarongable
Contributor Author

  1. "But today, the following sequence of events is possible". Wasn't that solved by invalidating all the authorizations, or did I dream that happened? If that didn't happen, wouldn't it necessitate another round of revocations?

Correct, it was solved by revoking all affected authorizations. But that required manual intervention and additional steps during an incident. We shouldn't need to do that in the future.

@aarongable
Contributor Author

Since I've been thinking about this for other reasons (see bug and PR linked above), here's my proposal for how to approach this.

Today, we have one challenge-filtering mechanism: pa.ChallengesFor produces a list of challenges filtered both by which challenge types are enabled in the config and by which challenge types are appropriate for the identifier. Conflating these two kinds of filtering leads to the issues described above.

Instead, we should have two wholly separate challenge-filtering mechanisms:

  • The first, producing a list of challenges based on the identifier, should happen only when an Authorization is being created, before being written to the database for the first time. This means that all authzs will have the full suite of challenges theoretically available to them in the database at all times, making them resilient to challenge types being temporarily disabled.
  • The second filter, narrowing a set of challenges down to just those which are enabled, should happen only when reading Authorization objects from the database. This ensures that disabling a challenge type prevents all issuance based on validations conducted via that challenge type, by removing it from authorizations as they're checked. It requires just one special case and a new error message: an authorization stored in the database as "valid" may have to be returned to the user as "invalid" because its only validated challenge has been disabled. (A rough sketch of both filters follows this list.)
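
A rough Go sketch of the two filters; the function names are illustrative, not Boulder's actual API:

    // Filter 1: at authorization creation time, choose challenge types based
    // only on the identifier, and store the full set in the database.
    func challengeTypesForIdentifier(isWildcard bool) []string {
        if isWildcard {
            return []string{"dns-01"} // wildcard names can only be validated via DNS
        }
        return []string{"http-01", "dns-01", "tls-alpn-01"}
    }

    // Filter 2: whenever an authorization is read back out of the database,
    // drop any challenge types that are currently disabled. The one special
    // case: a stored "valid" authorization whose only validated challenge
    // type is disabled must be reported as unusable, with a new error message.
    func enabledChallengeTypes(stored []string, enabled func(string) bool) []string {
        var out []string
        for _, t := range stored {
            if enabled(t) {
                out = append(out, t)
            }
        }
        return out
    }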

@jvanasco
Contributor

jvanasco commented Jan 8, 2024

This seems like an elegant solution.

I think the second-filter proposal works perfectly for a client interacting with Order objects. My one small concern is with interacting with Authorization objects during the flow, specifically compatibility with existing clients.

Would the challenge not be in the payload due to the filter, or would it be in the payload but listed as invalid, or would it be in the payload with no status? I believe only non-pending authz show a status; the others just show a type/url/token in the spec.

Imagine there is a bug with HTTP-01 and it's disabled after an order is made but before challenges are completed. If a subscriber iterates the authz, not seeing the http-01 challenge may cause an error or lockup in their client as it does not know how to act. Seeing a disabled http-01 challenge may let them try to complete it, which should then trigger the error code about the challenge being disabled.

@aarongable
Contributor Author

The challenge would be omitted from the authz payload, just like the HTTP-01 and TLS-ALPN-01 challenges are omitted from wildcard authz payloads. I don't think this is likely to break clients in unexpected ways, since it's what happens when we disable a challenge type today, and unfortunately we've had to do that a couple times in LE's history.

aarongable self-assigned this Aug 6, 2024
aarongable added a commit that referenced this issue Aug 8, 2024
Within the NewOrderAndAuthzsRequest, replace the corepb.Authorization
field with a new sapb.NewAuthzRequest message. This message has all of
the same field types and numbers, and the RA still populates all of
these fields when constructing a request, for backwards compatibility.
But it also has new fields (an Identifier carrying both type and value,
a list of challenge types, and a challenge token) which the RA
preferentially consumes if present.

This causes the content of our NewOrderAndAuthzsRequest to more closely
match the content that will be created at the database layer. Although
this may seem like a step backwards in terms of abstraction, it is also
a step forwards in terms of both efficiency (not having to transmit
multiple nearly-identical challenge objects) and correctness (being
guaranteed that the token is actually identical across all challenges).

After this change is deployed, it will be followed by a change which
removes the old fields from the NewAuthzRequest message, to realize the
efficiency gains.

Part of #5913
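
To sketch the transition pattern described in that commit message: the real message is a protobuf, and every field name below is an assumption for illustration rather than the actual sapb.NewAuthzRequest schema.

    // The point is the dual-write pattern: during the transition the RA fills
    // both the legacy fields (mirroring the old corepb.Authorization) and the
    // new fields, so an SA that has not yet been updated keeps working; a
    // follow-up change then drops the legacy fields to realize the
    // efficiency gains.
    type newAuthzRequest struct {
        // Legacy fields, kept temporarily for backwards compatibility.
        IdentifierValue string
        Challenges      []challengeSketch
        // New fields, consumed preferentially by the SA when present.
        Identifier     identifierSketch // carries both type and value
        ChallengeTypes []string
        Token          string
    }

    type identifierSketch struct{ Type, Value string }
    type challengeSketch struct{ Type, Token string }
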
aarongable added a commit that referenced this issue Aug 12, 2024
Add a new "GetAuthorization" method to the RA. This method is very
similar to the SA's existing "GetAuthorization2" method, except that it
also uses the RA's built-in Policy Authority to filter out any
challenges which are currently disabled.

In a follow-up change, the WFE will be updated to use this method when
retrieving authorizations and challenges for display, so that we can
ensure disabled challenges are not presented to ACME clients.

Part of #5913
aarongable added a commit that referenced this issue Aug 15, 2024
Remove the id, identifierValue, status, and challenges fields from
sapb.NewAuthzRequest. These fields were left behind from the previous
corepb.Authorization request type, and are now being ignored by the SA.

Since the RA is no longer constructing full challenge objects to include
in the request, remove pa.ChallengesFor and replace it with the much
simpler pa.ChallengeTypesFor.

Part of #5913
aarongable added a commit that referenced this issue Aug 22, 2024
Have the WFE ask the RA for authorizations, rather than asking the SA
directly. This extra layer of indirection allows us to filter out
challenges which have been disabled, so that clients don't think they
can attempt challenges that we have disabled.

Also shuffle the order of challenges within the authz objects rendered
by the API. We used to have code which did this at authz creation time,
but of course that was completely ineffectual once we stored the
challenges as just a bitmap in the database.

Update the WFE unit tests to mock RA.GetAuthorization instead of
SA.GetAuthorization2. This includes making the mock more accurate, so
that (e.g.) valid authorizations contain valid challenges, and the
challenges have their correct types (e.g. "http-01" instead of just
"http"). Also update the OTel tracing test to account for the new RPC.

Part of #5913
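
As a small illustration of the shuffling mentioned above (a sketch only, not Boulder's actual implementation; the type name is made up):

    import "math/rand"

    // challengeView is a minimal stand-in for a challenge as rendered by the API.
    type challengeView struct {
        Type  string
        URL   string
        Token string
    }

    // shuffleChallenges randomizes the order in which challenges appear in a
    // rendered authz object, so clients don't come to rely on a fixed ordering.
    func shuffleChallenges(chals []challengeView) {
        rand.Shuffle(len(chals), func(i, j int) {
            chals[i], chals[j] = chals[j], chals[i]
        })
    }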