
ignore invalid value validating dns name list #69

Merged · 4 commits · Jun 1, 2023

Conversation

patricio78
Contributor

@patricio78 patricio78 commented May 23, 2023

This fixes #68

@codecov

codecov bot commented May 23, 2023

Codecov Report

Merging #69 (c1f0e19) into main (731817a) will not change coverage.
The diff coverage is 100.00%.

❗ Current head c1f0e19 differs from pull request most recent head 73b749e. Consider uploading reports for the commit 73b749e to get more accurate results

@@           Coverage Diff           @@
##             main      #69   +/-   ##
=======================================
  Coverage   95.34%   95.34%           
=======================================
  Files          13       13           
  Lines        2662     2662           
=======================================
  Hits         2538     2538           
  Misses        124      124           
Impacted Files Coverage Δ
src/subject_name/verify.rs 96.18% <100.00%> (ø)


Member

@djc djc left a comment


This is clearly different from how Brian intended it, yet this seems like an improvement to me -- I'm struggling to come up with reasons not to allow this kind of thing. Curious to hear what others think.

@cpu
Member

cpu commented May 23, 2023

If we're going to accept this change I think it should also have unit test coverage.

This is clearly different from how Brian intended it, yet this seems like an improvement to me

I'm more skeptical (but could be convinced). My base position is that security sensitive code shouldn't be in the habit of ignoring malformed data when making a trust decision. In general before making a change like this I'd like to see evidence that someone has attempted to address the malformed certificates and failed, and that the malformed certificates are common enough to merit change on our side. I don't think this case has met the bar on either of those fronts and it feels like externalizing bugs from one environment to the world.

@patricio78
Contributor Author

If we're going to accept this change I think it should also have unit test coverage.

This is clearly different from how Brian intended it, yet this seems like an improvement to me

I'm more skeptical (but could be convinced). My base position is that security sensitive code shouldn't be in the habit of ignoring malformed data when making a trust decision. In general before making a change like this I'd like to see evidence that someone has attempted to address the malformed certificates and failed, and that the malformed certificates are common enough to merit change on our side. I don't think this case has met the bar on either of those fronts and it feels like externalizing bugs from one environment to the world.

The certificate is correctly validated by TLS implementations in other languages, and it's also accepted by browsers (I've tried Firefox, Safari, and Chrome).

@djc
Member

djc commented May 24, 2023

I'm more skeptical (but could be convinced). My base position is that security sensitive code shouldn't be in the habit of ignoring malformed data when making a trust decision.

But is it here? We've found a certificate that we were able to parse, and, by virtue of having found a presented identity that matches the expected identity, we're ignoring the other identities. That seems pretty reasonable to me.
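That reading can be sketched as a small loop, with toy types standing in for webpki's internals (the `Error` variants and function names here are illustrative, not the crate's actual API):

```rust
#[derive(Debug, PartialEq)]
enum Error {
    MalformedDnsIdentifier,
    CertNotValidForName,
}

// Toy stand-in for real dNSName parsing: anything outside letters,
// digits, hyphens, and dots is treated as a malformed identifier.
fn dns_name_matches(san: &str, reference: &str) -> Result<bool, Error> {
    if !san.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.') {
        return Err(Error::MalformedDnsIdentifier);
    }
    Ok(san.eq_ignore_ascii_case(reference))
}

fn verify_dns_names(sans: &[&str], reference: &str) -> Result<(), Error> {
    for &san in sans {
        match dns_name_matches(san, reference) {
            Ok(true) => return Ok(()),
            Ok(false) => (),
            // The behavior under discussion: skip a malformed entry and
            // keep looking, rather than failing the whole certificate.
            Err(Error::MalformedDnsIdentifier) => (),
            // Fail closed on anything else.
            Err(e) => return Err(e),
        }
    }
    Err(Error::CertNotValidForName)
}
```

With this shape, a certificate carrying `{invalid}` alongside a well-formed matching name verifies, while one carrying only malformed or non-matching names still fails.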

In general before making a change like this I'd like to see evidence that someone has attempted to address the malformed certificates and failed, and that the malformed certificates are common enough to merit change on our side. I don't think this case has met the bar on either of those fronts and it feels like externalizing bugs from one environment to the world.

From the original issue (#68):

Have you contacted the service to report their certificate is invalid? What was their response?

They didn't share much, they need it for some specific clients.

In a large distributed system like the web PKI, it's unclear that we can always get issuers to care about these issues -- the OP tested against Firefox (NSS), Safari (probably SecureTransport) and Chrome (presumably BoringSSL) as well as Java, Go and curl (probably defaults to OpenSSL?). I think Postel's law should apply here?

I also looked at the spec for the Subject Alternative Name extension, but unfortunately it is silent on this issue. I browsed all the open issues in Brian's original webpki repo but I couldn't find any issues that talk about instances of this, so presumably it is reasonably rare?

(Some of these issues might have flown under the radar due to the mostly opaque errors used prior to the fork.)

@cpu
Member

cpu commented May 24, 2023

I just want to preface my reply by making sure passers-by know this kind of discussion is what I hope will happen when we're looking at relaxing webpki's existing logic to allow for malformed data in the real world. I don't want it to be read as an argumentative battle, but rather as a careful weighing of the pros/cons for a type of issue that requires balancing contradictory needs (that's certainly the spirit I'm trying to bring at least!).

We've found a certificate that we were able to parse, and, by virtue of having found a presented identity that matches the expected identity, we're ignoring the other identities.

I'm coming around to the idea. I was able to verify Go takes this approach. I find the mozpkix and chromium code hard to reason about but I'm willing to take the claim they accept these certs at face value.

In a large distributed system like the web PKI, it's unclear that we can always get issuers to care about these issues

I think there are two points to dig into here:

  1. Is this the web PKI? I take that phrase to mean the PKI that involves trusted root certificate authorities participating in browser root programs and the CA/Browser forum. I believe the certificate in question can't have been issued by one of these CAs as it would be a mis-issuance event and demand revocation. I'm a little fuzzy on what our ambitions are for this crate in the post-fork world and maybe we should have a separate issue to discuss that. In general I've tried to emulate Brian's very deliberate and careful consideration of standards tuned to the web PKI in particular when working with this crate. It seems like we might be drifting from that original vision to better support other PKIs and while I'm not opposed to that direction it might be helpful to think ahead of time about how far we're willing to bend.

  2. I agree it's challenging to get issuers to care about these issues and that's why I think you need a demonstrable problem like "this piece of standards compliant software rejects the certificate" to use as a stick. If all software bends to allow each deviation from the specification then there's no incentive for issuers to care. At the minimum I'm arguing we should always try to get the issuers to care before caving. Wearing my optimist hat I think the world is more malleable than it can appear if you have perseverance.

the OP tested against Firefox (NSS), Safari (probably SecureTransport) and Chrome (presumably BoringSSL) as well as Java, Go and curl (probably defaults to OpenSSL?).

It would be nice to have more detail (perhaps reproduction steps and an affected certificate chain?) for this testing. As mentioned above this certificate presumably isn't issued by a well known trust anchor so I think the OP had to either add the issuer to the trust store for each of these pieces of software, or was clicking through/ignoring an unknown issuer warning. In the latter case can we be confident doing that didn't also skip some validation processes relevant to the discussion?

I think Postel's law should apply here?

I subscribe to the line of thinking outlined by M. Thomson in his informational draft on the subject: Postel's law gets you short-term interoperability gains that are outweighed by the long-term damage to the ecosystem and its ability to make forward progress with sensible specification. I think it's been a net-negative for the internet of today.

I also looked at the spec for the Subject Alternative Name extension, but unfortunately it is silent on this issue

I agree 5280 and 6125 are unfortunately silent on how to perform subject name matching in the presence of malformed SANs, but 5280 does say:

When the subjectAltName extension contains a domain name system
label, the domain name MUST be stored in the dNSName (an IA5String).
The name MUST be in the "preferred name syntax", as specified by
Section 3.5 of RFC 1034 (https://datatracker.ietf.org/doc/html/rfc1034#section-3.5)

The "preferred name syntax" does not allow the "{" or "}" characters, so I don't think it's controversial to call this a malformed certificate with the root cause being an invalid dNSName SAN.
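For illustration, a rough approximation of that syntax check (an assumption-laden sketch, not webpki's parser; RFC 1034's original rule is even stricter, e.g. about labels beginning with a digit):

```rust
// Rough approximation of RFC 1034 "preferred name syntax": dot-separated
// labels of at most 63 characters, using only letters, digits, and
// interior hyphens.
fn is_preferred_name_syntax(name: &str) -> bool {
    !name.is_empty()
        && name.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
}
```

Under this check, a name like `{foo}.example.com` is rejected outright, which matches the reading that the SAN in question is simply malformed.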

I browsed all the open issues in Brian's original webpki repo but I couldn't find any issues that talk about instances of this, so presumably it is reasonably rare?

Indeed. So if it's rare, and we have only one case from an unnamed issuer of unknown scale does it merit changing webpki for the masses, or should the OP find a solution with the issuer or consider using a fork of webpki to meet their need?

(Some of these issues might have flown under the radar due to the mostly opaque errors used prior to the fork.)

Agreed, and I think that's one of the risks of accepting a patch like this. It will lead to more malformed certificates persisting in the world unnoticed until the next piece of software tries to raise the bar on standards compliance and has to weigh the same pros/cons again, but with the addition of "and Rustls/webpki accept it too" :-)

@cpu
Member

cpu commented May 24, 2023

Commenting separately with some immediately actionable thoughts since I know I've just spat out a wall-of-text 😅

If folks aren't swayed by my reluctance and we do want to change webpki then I would like to see:

  1. Test coverage added to the PR. @patricio78 are you willing to implement that?
  2. More detail on how testing was done with other software to demonstrate that the certificate was accepted. Can you share your testing steps, the versions of the software tested, and the certificate chain you used in testing?
  3. The webpki diff changed to only continue iteration for the specific malformed name error that's happening here. I took a look at the code in question and I don't think there are any other errors that could be emitted here but I think we should fail closed and only proceed for the specific case we're litigating: malformed subject names.

Having those things done would help me feel better about this direction.

@djc
Member

djc commented May 24, 2023

I just want to preface my reply by making sure passers by know this kind of discussion is what I hope will happen when we're looking at relaxing webpki's existing logic to allow for malformed data in the real world. I don't want it to be read as an argumentative battle, but rather as a careful weighing of the pros/cons for a type of issue that requires balancing contradictory needs (that's certainly the spirit I'm trying to bring at least!).

Good to make that explicit -- I hope you (and others) also took my comments in that spirit.

I think there are two points to dig into here:

1. Is this the web PKI? I take that phrase to mean the PKI that involves trusted root certificate authorities participating in browser root programs and the CA/Browser forum. I believe the certificate in question can't have been issued by one of these CAs as it would be a mis-issuance event and demand revocation.

Why do you think so, solely because this violates the letter of 5280, as you outlined? If the browsers accept it, doesn't that indicate that the likes of the CA/Browser forum doesn't consider this a (big enough) issue?

I'm a little fuzzy on what our ambitions are for this crate in the post-fork world and maybe we should have a separate issue to discuss that. In general I've tried to emulate Brian's very deliberate and careful consideration of standards tuned to the web PKI in particular when working with this crate. It seems like we might be drifting from that original vision to better support other PKIs and while I'm not opposed to that direction it might be helpful to think ahead of time about how far we're willing to bend.

There is clearly pressure from an audience of Rust users doing non-web stuff who would like to be able to use rustls rather than a non-memory-safe TLS stack. As such, I would like ways to accommodate that. On the other hand, we should definitely restrict (to the extent possible) the default uses to be tuned to what I expect is the vast majority of downstream use, HTTP (and other protocols) with a relatively small, stable set of large certificate authorities. But you're right that this probably warrants a separate discussion.

2. I agree it's challenging to get issuers to care about these issues and that's why I think you need a demonstrable problem like "this piece of standards compliant software rejects the certificate" to use as a stick. If all software bends to allow each deviation from the specification then there's no incentive for issuers to care. At the minimum I'm arguing we should always _try_ to get the issuers to care before caving. Wearing my optimist hat I think the world is more malleable than it can appear if you have perseverance.

To be clear, I fully agree. To make the other side of this argument explicit: in the grand scheme of things, rustls is a much less popular TLS stack and a lack of support of edge cases that other stacks do support is likely to have a detrimental effect on the uptake of our work. (See also Firefox vs Chrome.)

I think Postel's law should apply here?

I subscribe to the line of thinking outlined by M. Thomson in his informational draft on the subject: Postel's law gets you short-term interoperability gains that are outweighed by the long-term damage to the ecosystem and its ability to make forward progress with sensible specification. I think it's been a net-negative for the internet of today.

To be clear, I'm usually more aligned with Thomson's side. But, with rustls starting out late and competing with much larger resources, we might have to grow our audience before we get to influence issuers.

I browsed all the open issues in Brian's original webpki repo but I couldn't find any issues that talk about instances of this, so presumably it is reasonably rare?

Indeed. So if it's rare, and we have only one case from an unnamed issuer of unknown scale does it merit changing webpki for the masses, or should the OP find a solution with the issuer or consider using a fork of webpki to meet their need?

If we decide not to take this or a similar change, we should make sure that we bubble up a very specific error so that we might get a better understanding of the incidence of this kind of problem. I think forking webpki is generally too big of an ask -- I've previously argued (although that was on the rustls side of the fence, which maybe makes a difference?) in favor of a more modular API so that we can support a larger audience.

  3. The webpki diff changed to only continue iteration for the specific malformed name error that's happening here. I took a look at the code in question and I don't think there are any other errors that could be emitted here but I think we should fail closed and only proceed for the specific case we're litigating: malformed subject names.

Agreed that this would be an improvement.

@cpu
Member

cpu commented May 24, 2023

Good to make that explicit -- I hope you (and others) also took my comments in that spirit.

For sure :-) I definitely did and just wanted to make sure it was explicit since I know as response length increases it can become fuzzy.

  1. Is this the web PKI? ...
    If the browsers accept it, doesn't that indicate that the likes of the CA/Browser forum doesn't consider this a (big enough) issue?

I'm confident that if a trusted CA in the Mozilla/Microsoft/Apple root programs issued such a certificate it would be considered a mis-issuance and subject to the rules that govern such. I think that's separate from whether the browser will accept the certificate. The first example I could find was https://bugzilla.mozilla.org/show_bug.cgi?id=1535869 but I'm sure there are many more. We can also look at incidents like https://bugzilla.mozilla.org/show_bug.cgi?id=1715455 to see there are certificates accepted by browsers that are nonetheless forbidden by the same browser's root program rules and deemed mis-issued.

In short I consider the "web PKI" to be the set of certificates that chain to a trust anchor in-scope for the root program for the browser in use. If you're adding a trust anchor to the browser then you're relying on an independent PKI separate from the web PKI, even if you're still using a "web" browser.

There is clearly pressure from an audience of Rust users doing non-web stuff who would like to be able to use rustls rather than a non-memory-safe TLS stack. As such, I would like ways to accommodate that. On the other hand, we should definitely restrict (to the extent possible) the default uses to be tuned to what I expect is the vast majority of downstream use, HTTP (and other protocols) with a relatively small, stable set of large certificate authorities. But you're right that this probably warrants a separate discussion.

Agreed, I don't want to lose the forest for the trees. Leaving users on memory-unsafe software simply because the alternative is too tightly tuned to standards compliance to be usable for their needs isn't a great outcome for anyone. Let's try to get a discussion started on how we might have webpki enforce strong standards compliance for certificate chains leading to a trust anchor we know is supposed to operate according to that compliance vs. a trust anchor the user supplied (presumably knowing it's subject to considerably less oversight).

If we decide not to take this or a similar change, we should make sure that we bubble up a very specific error so that we might get a better understanding of the incidence of this kind of problem.

I think we're doing an OK job of that for this case as of 9f0f95a but it's not yet in a released version.

I've previously argued (although that was on the rustls side of the fence, which maybe makes a difference?) in favor of a more modular API so that we can support a larger audience.

That sounds like an idea that would help here. We're sort of looking at two worlds: one that is supposed to be operating according to published standards and is punished by the ecosystem for divergence and one that is basically a free for all with minimal to no incentives to change. We're straining to accommodate both with the same API.

Agreed that this would be an improvement.

👍

If the changes I suggested in my other comment were implemented I'd be willing to +1.

I appreciate you giving counter-balance here & arguing for the user impact. Coming from a background where detecting and avoiding mis-issuance based on standards compliance was critical to maintaining trusted status leaves me with a bias that could probably use fine tuning for the Rustls context.

@djc
Member

djc commented May 24, 2023

That sounds like an idea that would help here. We're sort of looking at two worlds: one that is supposed to be operating according to published standards and is punished by the ecosystem for divergence and one that is basically a free for all with minimal to no incentives to change. We're straining to accommodate both with the same API.

Right. Ideally we'd put up some roadblocks (like Cargo features that mention dangerous or hazmat or whatever) that introduce some inconvenience to keep users on the beaten/correct/popular path but still make it possible to use the 90% that they want without the 10% that is problematic for them.

I appreciate you giving counter-balance here & arguing for the user impact. Coming from a background where detecting and avoiding mis-issuance based on standards compliance was critical to maintaining trusted status leaves me with a bias that could probably use fine tuning for the Rustls context.

It's certainly useful to have both perspectives -- adversarial thinking is not my strong suit even though it's highly valuable when working on things like rustls and webpki.

@patricio78
Contributor Author

patricio78 commented May 24, 2023

Commenting separately with some immediately actionable thoughts since I know I've just spat out a wall-of-text 😅

If folks aren't swayed by my reluctance and we do want to change webpki then I would like to see:

  1. Test coverage added to the PR. @patricio78 are you willing to implement that?

I can take care of adding tests.

  2. More detail on how testing was done with other software to demonstrate that the certificate was accepted. Can you share your testing steps, the versions of the software tested, and the certificate chain you used in testing?

You won't be able to try the server because it requires mTLS, so I reproduced it by creating a certificate with an invalid SAN using OpenSSL 3.1.0 (which allowed me to do so).
I've built a simple client using Go 1.20.1 and crypto/tls, providing the CA, and it works fine.
However, an important correction: curl is failing:

curl -v --cacert ./ca.pem https://localhost:4443
* [CONN-0-0][CF-SSL] (304) (OUT), TLS handshake, Client hello (1):
* [CONN-0-0][CF-SSL] (304) (IN), TLS handshake, Server hello (2):
* [CONN-0-0][CF-SSL] (304) (IN), TLS handshake, Unknown (8):
* [CONN-0-0][CF-SSL] (304) (IN), TLS handshake, Certificate (11):
* SSL certificate problem: unsupported or invalid name syntax
* Closing connection 0
curl: (60) SSL certificate problem: unsupported or invalid name syntax
More details here: https://curl.se/docs/sslcerts.html

The error clearly points to invalid name syntax.

I've attached some resources to reproduce it:
resources.zip

One thing to note: if I invert the DNS names so that the matching one comes before the invalid one, validation succeeds for rustls. However, curl fails in either case, so it seems to validate all names before matching.
I'm using curl 7.87.0 compiled with LibreSSL/3.3.6

  3. The webpki diff changed to only continue iteration for the specific malformed name error that's happening here. I took a look at the code in question and I don't think there are any other errors that could be emitted here but I think we should fail closed and only proceed for the specific case we're litigating: malformed subject names.

Having those things done would help me feel better about this direction.

@djc
Member

djc commented May 25, 2023

One thing to notice is that if I invert the DNS name and the matching one comes before the invalid name, the validation succeeds for rustls.

That's a fun little nugget! So we could also be like curl and fail even harder than we currently do...

src/subject_name/verify.rs (outdated review thread, resolved)
@patricio78 patricio78 requested a review from cpu May 31, 2023 00:06
Member

@djc djc left a comment


LGTM, thanks!

Comment on lines 45 to 46
Ok(false) => (),
Err(Error::MalformedDnsIdentifier) => (),
Member


Style nit: could consider merging the Ok(false) | Err(Error::MalformedDnsIdentifier) arms.

@djc
Member

djc commented May 31, 2023

@cpu do you think we should make a follow-up change to make sure that our result for validation becomes independent of the order in which SANs appear?

@@ -513,7 +513,14 @@ def name_constraints() -> None:
)
],
)


generate_name_constraints_test(
Member


There are some lint issues in the Python code, please check the CI results.

src/subject_name/verify.rs (outdated review thread, resolved)
tests/generate.py (review thread, resolved)
@cpu
Member

cpu commented May 31, 2023

@cpu do you think we should make a follow-up change to make sure that our result for validation becomes independent of the order in which SANs appear?

@djc I'm not sure I understand what you have in mind 😅 Can you elaborate a bit?

@djc
Member

djc commented May 31, 2023

@cpu do you think we should make a follow-up change to make sure that our result for validation becomes independent of the order in which SANs appear?

@djc I'm not sure I understand what you have in mind 😅 Can you elaborate a bit?

Currently we yield Ok(true) as soon as we find a matching SAN. Given your earlier reasoning about failing closed on arguably invalid certificates, should we make sure to always iterate all SANs before yielding either a positive or negative result?

Downsides for iterating all: performance, certificates with lots of SANs would take a bit longer to verify. Upsides: more consistent verification results, no way to "work around" validation failures by reordering SANs.
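The order-independent variant djc describes might look like this (toy types again, not the actual webpki code): every SAN is examined before answering, so a malformed entry fails the certificate wherever it appears in the list.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    MalformedDnsIdentifier,
    CertNotValidForName,
}

// Toy malformedness check standing in for real dNSName parsing.
fn check_san(san: &str, reference: &str) -> Result<bool, Error> {
    if san.contains(|c: char| c == '{' || c == '}') {
        return Err(Error::MalformedDnsIdentifier);
    }
    Ok(san.eq_ignore_ascii_case(reference))
}

// Iterate every SAN before yielding a result: a malformed entry fails
// the certificate even if an earlier SAN already matched, so the
// outcome cannot depend on SAN ordering.
fn verify_all_sans(sans: &[&str], reference: &str) -> Result<(), Error> {
    let mut matched = false;
    for &san in sans {
        matched |= check_san(san, reference)?;
    }
    if matched {
        Ok(())
    } else {
        Err(Error::CertNotValidForName)
    }
}
```

This is the "curl-like" behavior observed in the reproduction above, at the cost of always walking the full SAN list.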

@ctz
Member

ctz commented May 31, 2023

I find it surprising that this sort of SAN validation is happening during name matching. Seems like that is quite late. Shouldn't it be impossible to end up with an EndEntityCert that has syntactically invalid fields?

@cpu
Member

cpu commented May 31, 2023

should we make sure to always iterate all SANs before yielding either a positive or negative result?

Aha, I understand now. I'm open to that.

I find it surprising that this sort of SAN validation is happening during name matching. Seems like that is quite late. Shouldn't it be impossible to end up with an EndEntityCert that has syntactically invalid fields?

The existing webpki code defers this sort of thing in a few places. The cert.rs processing digs out a handful of fields that are left as untrusted::Input and not processed further until use.

This is also something I noticed in my WIP CRL work: I chose to deliberately iterate the revoked certs in the CRL at CRL-parse time and to reject the CRL if any entries were invalid. We could do the same thing here at the cost of some up-front processing, but it seems like based on discussion in this ticket we want to ignore the malformed entries anyway, whether we find them up-front or at time of validation.
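The contrast between the two strategies can be sketched in miniature (illustrative types, not the actual webpki or CRL code):

```rust
#[derive(Debug, PartialEq)]
struct MalformedEntry;

// Toy parser: an entry containing '{' or '}' is malformed.
fn parse_entry(raw: &str) -> Result<String, MalformedEntry> {
    if raw.contains(|c: char| c == '{' || c == '}') {
        Err(MalformedEntry)
    } else {
        Ok(raw.to_string())
    }
}

// Eager (the CRL approach described above): reject the whole structure
// at parse time if any entry is invalid.
fn parse_all_eager(raw: &[&str]) -> Result<Vec<String>, MalformedEntry> {
    raw.iter().map(|&r| parse_entry(r)).collect()
}

// Lazy (the approach this PR takes for dNSName SANs): keep the raw
// input around and drop malformed entries only at the point of use.
fn usable_entries(raw: &[&str]) -> Vec<String> {
    raw.iter().filter_map(|&r| parse_entry(r).ok()).collect()
}
```

The eager form fails closed up front; the lazy form tolerates malformed entries for as long as they are never needed, which is exactly the trade-off being weighed in this thread.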

Member

@cpu cpu left a comment


If you rebase this to resolve conflicts (and to fix the Python lint errs) I think my outstanding feedback can be resolved separately. Thanks!

@cpu
Member

cpu commented Jun 1, 2023

@patricio78 Thanks for applying some changes. It looks like your last push is still failing the Python linting. Can you run your updated generate.py through Black? That should resolve the problem.

@djc djc merged commit cdebb3c into rustls:main Jun 1, 2023
@cpu
Member

cpu commented Jun 1, 2023

Thanks for seeing this one through the discussion + iteration @patricio78. Appreciate the effort :-)

Development

Successfully merging this pull request may close these issues.

Fail to validate a server cert with invalid DNS SAN names, but having a matching one
4 participants