Rethink how we disable challenge types #5913

Closed
aarongable opened this issue Jan 27, 2022 · 5 comments · Fixed by #7677

@aarongable
Contributor

Currently we list all available validation methods in the RA's config file. When we want to disable a challenge type, we remove it from the config file and do a config deploy. So far, so good.
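
To make that mechanism concrete, here is a minimal Go sketch of the kind of check involved. The struct and field names are illustrative, not Boulder's actual code; ChallengeTypeEnabled is the policy-authority method referenced in the snippets below.

    // Illustrative sketch: the policy authority holds the set of enabled
    // challenge types loaded from its JSON config, and ChallengeTypeEnabled
    // is a membership check against that set. Removing a type from the
    // config and redeploying is what "disabling" means today.
    type policyAuthority struct {
        enabledChallenges map[string]bool // e.g. {"http-01": true, "dns-01": true}
    }

    func (pa *policyAuthority) ChallengeTypeEnabled(challengeType string) bool {
        return pa.enabledChallenges[challengeType]
    }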

Disabling a challenge type today means two things happen:

  1. When a client tries to validate a pre-existing authorization (one that still includes the disabled challenge type, because it was created before the type was disabled), we return an error:

    boulder/ra/ra.go

    Lines 1563 to 1566 in fffd05e

    // This challenge type may have been disabled since the challenge was created.
    if !ra.PA.ChallengeTypeEnabled(ch.Type) {
        return nil, berrors.MalformedError("challenge type %q no longer allowed", ch.Type)
    }
  2. When a client creates a new order, and we create new authorizations for that order, those authorizations do not have the disabled challenge type associated with them:

    boulder/policy/pa.go

    Lines 563 to 574 in fffd05e

    // Otherwise we collect up challenges based on what is enabled.
    if pa.ChallengeTypeEnabled(core.ChallengeTypeHTTP01) {
        challenges = append(challenges, core.HTTPChallenge01(token))
    }
    if pa.ChallengeTypeEnabled(core.ChallengeTypeTLSALPN01) {
        challenges = append(challenges, core.TLSALPNChallenge01(token))
    }
    if pa.ChallengeTypeEnabled(core.ChallengeTypeDNS01) {
        challenges = append(challenges, core.DNSChallenge01(token))
    }

The first of these makes sense. When we disable a challenge type, it's probably because we think something is wrong with it, and so we don't want anyone to continue using the potentially-buggy validation path.

The latter of these makes less sense. It is nice not to include the disabled challenge type in the first place, so that clients don't mistakenly believe it is an option and then get an error when they try to fulfill it. But it also means that clients can never use that challenge type to validate the authorization, even after we re-enable the challenge type.

And the two effects above miss one critical case: if we believe that validations performed using a specific challenge type may have been buggy, then we should not allow certificates to be issued based on those validations. But today, the following sequence of events is possible:

  1. Client creates a new order and new authorizations for that order
  2. Client fulfills those authorizations using a given challenge type
  3. We disable that challenge type
  4. Client finalizes their order and we issue the certificate

We should rethink the challenge enable/disable mechanism so that, while a challenge type is disabled, we do not rely on authorizations validated via that type, and so that clients can smoothly begin using the challenge type again when it is re-enabled.

@jvanasco
Contributor

A few thoughts on the above.

  1. It reads like there may be some benefit in extending the current "enabled/disabled" toggle to a triplet of "enabled/disabled/temporarily-disabled", in which a "temporarily-disabled" challenge is expected to be re-enabled in the near future. Within that framework, "temporarily-disabled" challenges could still be added to authorizations, but they would not be rendered as an option on the order/authz/challenge objects, and their endpoints would return failures.

  2. A tangential issue is "what happens to pending authorizations when a new challenge type is introduced?"

  3. "But today, the following sequence of events is possible". Wasn't that solved by invalidating all the authorizations, or did I dream that happened? If that didn't happen, wouldn't it necessitate another round of revocations?

  4. I've handled situations similar to the "buggy validations" concept described above by tracking a version-id and enforcing a minimum version. For example, assuming the TLS-ALPN-01 validator was at v1, the hotfix would bump the required minimum validation version to v2, and the full fix/re-enable would bump the version in the validator to v2. Just an idea here, but "finalize" could ensure compliance before issuance, and if any non-compliant authorizations were detected, those authorizations could be immediately invalidated, causing the order to fail. Most clients should gracefully recover by requesting a new order, which would then only create the necessary new authorizations. (A rough sketch of this idea follows below.)
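
A purely hypothetical Go sketch of that versioning idea; none of these names exist in Boulder, and the version numbers are made up for illustration:

    // Each validation records the version of the validator that performed it.
    type validationRecord struct {
        ChallengeType    string
        ValidatorVersion int
    }

    // minValidatorVersion is the minimum trusted validator version per
    // challenge type; a hotfix for a buggy validator bumps the entry for
    // that type.
    var minValidatorVersion = map[string]int{
        "http-01":     1,
        "dns-01":      1,
        "tls-alpn-01": 2, // bumped after a hypothetical TLS-ALPN-01 bug
    }

    // validationStillTrusted is the check "finalize" could run: an
    // authorization validated by an older (buggy) validator fails it and can
    // be invalidated, causing the order to fail so the client requests a
    // fresh one.
    func validationStillTrusted(v validationRecord) bool {
        return v.ValidatorVersion >= minValidatorVersion[v.ChallengeType]
    }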

@aarongable
Contributor Author

  1. "But today, the following sequence of events is possible". Wasn't that solved by invalidating all the authorizations, or did I dream that happened? If that didn't happen, wouldn't it necessitate another round of revocations?

Correct, it was solved by revoking all affected authorizations. But that required manual intervention and additional steps during an incident. We shouldn't need to do that in the future.

@aarongable
Contributor Author

Since I've been thinking about this for other reasons (see bug and PR linked above), here's my proposal for how to approach this.

Today, we have one challenge-filtering mechanism: pa.ChallengesFor produces a list of challenges filtered both by which challenge types are enabled in the config and by which challenge types are appropriate for the identifier. Conflating these two kinds of filtering leads to the issues described above.

Instead, we should have two wholly separate challenge-filtering mechanisms:

  • The first, producing a list of challenges based on the identifier, should happen only when an Authorization is being created, before being written to the database for the first time. This means that all authzs will have the full suite of challenges theoretically available to them in the database at all times, making them resilient to challenge types being temporarily disabled.
  • The second filter, narrowing a set of challenges down to just those which are enabled, should happen only when reading Authorization objects from the database. This ensures that disabling a challenge type prevents all issuance based on validations conducted via that challenge type, by removing it from authorizations as they're checked. It requires just one special case and a new error message: an authorization stored in the database as "valid" may have to be returned to the user as "invalid" because its only validated challenge has been disabled. (A rough sketch of both filters follows this list.)
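
A rough Go sketch of the two filters; the function names are illustrative, not Boulder's actual API:

    // Filter 1: at authorization creation time, choose challenge types based
    // only on the identifier, and store the full set in the database.
    func challengeTypesForIdentifier(isWildcard bool) []string {
        if isWildcard {
            return []string{"dns-01"} // wildcard names can only be validated via DNS
        }
        return []string{"http-01", "dns-01", "tls-alpn-01"}
    }

    // Filter 2: whenever an authorization is read back out of the database,
    // drop any challenge types that are currently disabled. The one special
    // case: a stored "valid" authorization whose only validated challenge
    // type is disabled must be reported as unusable, with a new error message.
    func enabledChallengeTypes(stored []string, enabled func(string) bool) []string {
        var out []string
        for _, t := range stored {
            if enabled(t) {
                out = append(out, t)
            }
        }
        return out
    }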

@jvanasco
Contributor

jvanasco commented Jan 8, 2024

This seems like an elegant solution.

I think the second-filter proposal works perfectly for a client interacting with Order objects. My one small concern is with interacting with Authorization objects during the flow, specifically compatibility with existing clients.

Would the challenge not be in the payload due to the filter, or would it be in the payload but listed as invalid, or would it be in the payload with no status? I believe only non-pending authz show a status; the others just show a type/url/token in the spec.

Imagine there is a bug with HTTP-01 and it's disabled after an order is made but before challenges are completed. If a subscriber iterates the authz, not seeing the http-01 challenge may cause an error or lockup in their client as it does not know how to act. Seeing a disabled http-01 challenge may let them try to complete it, which should then trigger the error code about the challenge being disabled.

@aarongable
Contributor Author

The challenge would be omitted from the authz payload, just like the HTTP-01 and TLS-ALPN-01 challenges are omitted from wildcard authz payloads. I don't think this is likely to break clients in unexpected ways, since it's what happens when we disable a challenge type today, and unfortunately we've had to do that a couple times in LE's history.

aarongable self-assigned this Aug 6, 2024
aarongable added a commit that referenced this issue Aug 8, 2024
Within the NewOrderAndAuthzsRequest, replace the corepb.Authorization
field with a new sapb.NewAuthzRequest message. This message has all of
the same field types and numbers, and the RA still populates all of
these fields when constructing a request, for backwards compatibility.
But it also has new fields (an Identifier carrying both type and value,
a list of challenge types, and a challenge token) which the RA
preferentially consumes if present.

This causes the content of our NewOrderAndAuthzsRequest to more closely
match the content that will be created at the database layer. Although
this may seem like a step backwards in terms of abstraction, it is also
a step forwards in terms of both efficiency (not having to transmit
multiple nearly-identical challenge objects) and correctness (being
guaranteed that the token is actually identical across all challenges).

After this change is deployed, it will be followed by a change which
removes the old fields from the NewAuthzRequest message, to realize the
efficiency gains.

Part of #5913
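
To sketch the transition pattern described in that commit message: the real message is a protobuf, and every field name below is an assumption for illustration rather than the actual sapb.NewAuthzRequest schema.

    // The point is the dual-write pattern: during the transition the RA fills
    // both the legacy fields (mirroring the old corepb.Authorization) and the
    // new fields, so an SA that has not yet been updated keeps working; a
    // follow-up change then drops the legacy fields to realize the
    // efficiency gains.
    type newAuthzRequest struct {
        // Legacy fields, kept temporarily for backwards compatibility.
        IdentifierValue string
        Challenges      []challengeSketch
        // New fields, consumed preferentially by the SA when present.
        Identifier     identifierSketch // carries both type and value
        ChallengeTypes []string
        Token          string
    }

    type identifierSketch struct{ Type, Value string }
    type challengeSketch struct{ Type, Token string }
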
aarongable added a commit that referenced this issue Aug 12, 2024
Add a new "GetAuthorization" method to the RA. This method is very
similar to the SA's existing "GetAuthorization2" method, except that it
also uses the RA's built-in Policy Authority to filter out any
challenges which are currently disabled.

In a follow-up change, the WFE will be updated to use this method when
retrieving authorizations and challenges for display, so that we can
ensure disabled challenges are not presented to ACME clients.

Part of #5913
aarongable added a commit that referenced this issue Aug 15, 2024
Remove the id, identifierValue, status, and challenges fields from
sapb.NewAuthzRequest. These fields were left behind from the previous
corepb.Authorization request type, and are now being ignored by the SA.

Since the RA is no longer constructing full challenge objects to include
in the request, remove pa.ChallengesFor and replace it with the much
simpler pa.ChallengeTypesFor.

Part of #5913
aarongable added a commit that referenced this issue Aug 22, 2024
Have the WFE ask the RA for authorizations, rather than asking the SA
directly. This extra layer of indirection allows us to filter out
challenges which have been disabled, so that clients don't think they
can attempt challenges that we have disabled.

Also shuffle the order of challenges within the authz objects rendered
by the API. We used to have code which did this at authz creation time,
but of course that was completely ineffectual once we stored the
challenges as just a bitmap in the database.

Update the WFE unit tests to mock RA.GetAuthorization instead of
SA.GetAuthorization2. This includes making the mock more accurate, so
that (e.g.) valid authorizations contain valid challenges, and the
challenges have their correct types (e.g. "http-01" instead of just
"http"). Also update the OTel tracing test to account for the new RPC.

Part of #5913
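
As a small illustration of the shuffling mentioned above (a sketch only, not Boulder's actual implementation; the type name is made up):

    import "math/rand"

    // challengeView is a minimal stand-in for a challenge as rendered by the API.
    type challengeView struct {
        Type  string
        URL   string
        Token string
    }

    // shuffleChallenges randomizes the order in which challenges appear in a
    // rendered authz object, so clients don't come to rely on a fixed ordering.
    func shuffleChallenges(chals []challengeView) {
        rand.Shuffle(len(chals), func(i, j int) {
            chals[i], chals[j] = chals[j], chals[i]
        })
    }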