feat: support long CIDs in subdomains by splitting at 63rd char #7358
This adds subdomain gateway support for CIDs longer than 63 characters. The CID is split into labels of at most 63 characters, counting from right to left. Requests made with arbitrary (non-canonical) splits are redirected to the canonical split, to ensure every CID gets exactly one Origin.

Ref.
- https://tools.ietf.org/html/rfc1034#page-7
- #7318

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
Overall looks good to me!
I'd strongly suggest renaming a method as per my first comment, but it's not a blocker.
core/corehttp/hostname.go
```go
// Check if rootID is a valid CID
if rootCID, err := cid.Decode(rootID); err == nil {
	// Do we need to redirect CID to a canonical DNS representation?
	canonicalPrefix := toDNSSafePrefix(rootID)
```
`Safe` is... misleading in this context. I'd call the function itself `normalizedPrefix` or something. I'm not suggesting `canonicalPrefix`, as it implies "both are fine but one is neater"; no, only the normalized one is fine.
Simplified names in a74dada
Addresses PR review + adds explicit tests for what happens when a short CID gets split.
License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@Stebalien ready for you to take a look

```go
for i := len(parts) - 1; i > 0; i-- {
	parts[i] = s[(i-1)*dnsLabelMaxLength+firstPartLen : i*dnsLabelMaxLength+firstPartLen]
}
parts[0] = s[:firstPartLen]
```
Wouldn't you want to maximize the left-most labels? That is, make the right-most labels as small as possible?
We're only over by two base-32 numerals, so we could probably get wildcard certificates for all 32**2 = 1024 two-character prefixes. It looks like we can register 100 names per cert and 50 certs, which gives us 5000 names.
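To make the two orderings concrete, here is a minimal Go sketch of the left-to-right alternative suggested above: full 63-character labels first, so only the right-most label (the one next to the gateway domain) is short. `splitLeftToRight` is a hypothetical name, not code from this PR; the input is the base32 ED25519 CID used as an example elsewhere in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

const dnsLabelMaxLength = 63

// splitLeftToRight (hypothetical) fills labels from the left, so the
// right-most label, the one closest to the gateway domain, is the short one.
func splitLeftToRight(s string) string {
	var labels []string
	for len(s) > dnsLabelMaxLength {
		labels = append(labels, s[:dnsLabelMaxLength])
		s = s[dnsLabelMaxLength:]
	}
	labels = append(labels, s) // remainder, possibly only a couple of chars
	return strings.Join(labels, ".")
}

func main() {
	// 65-character base32 ED25519 libp2p-key CID.
	ed := "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"
	fmt.Println(splitLeftToRight(ed)) // prints the 63-char label, then ".gk"
}
```

With this ordering the short label sits directly under the gateway domain, which is what would make wildcard certs such as `*.gk.ipns.dweb.link` usable.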
We want to maximize the first part after the public suffix, giving the most bits to parent labels, following the spirit of the DNS hierarchy.
You have a point: it may be possible to get TLS certs if we hardcode the first split to be two characters from the right.
My main concern: while technically possible, I am not sure if it is practical / better than using Base36 for ED25519.
Assuming we have all the certs in place, this fix would work only for ED25519 represented in base32; everything else (e.g. sha2-512) will still be over the wildcard length limit:
- 👍 ED25519 libp2p-key is only 2 characters over the limit, so it fits in two labels: https://bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5.gk.ipns.dweb.link (`bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5` + `.gk`)
- 💢 A CID created with `--hash sha2-512` will be too long and needs to be split into more than two chunks anyway: https://bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsv.nvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433a.eg.ipfs.dweb.link (`bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsv` + `.nvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433a` + `.eg`)
Let's Encrypt limits are (src):
- You can only fit 100 domains onto one certificate
- Each Registered Domain may only appear on 50 certificates per week.
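Plugging the limits above into the two-character case is simple arithmetic. A minimal Go sketch, assuming one wildcard name (`*.xx.<gateway>`) per possible base32 prefix and ignoring how those names would be spread across registered domains; `wildcardCertsNeeded` is a hypothetical helper.

```go
package main

import "fmt"

const (
	namesPerCert   = 100 // Let's Encrypt: max domains on one certificate
	base32Alphabet = 32  // symbols available per base32 character
)

// wildcardCertsNeeded returns how many wildcard names are needed to cover
// every base32 prefix of n characters, and how many certificates that takes.
func wildcardCertsNeeded(n int) (names, certs int) {
	names = 1
	for i := 0; i < n; i++ {
		names *= base32Alphabet
	}
	certs = (names + namesPerCert - 1) / namesPerCert // ceiling division
	return names, certs
}

func main() {
	names, certs := wildcardCertsNeeded(2)
	fmt.Println(names, certs) // 1024 11
}
```

Each additional character in the short label multiplies both numbers by roughly 32, which is why this only looks manageable for the two-character ED25519 case.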
The interesting part is that, according to the rate limits documentation, they use the Public Suffix List to calculate the registered domain. `ipfs.dweb.link` and `ipns.dweb.link` are both on the Public Suffix List, which means they are "excluded" from the limit calculation and the first CID chunk becomes the "Registered Domain". That effectively removes the weekly limit, which makes things easier.
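A toy Go sketch of that calculation, assuming a simplified "longest matching suffix plus one label" rule and a hand-picked three-entry suffix set (the real Public Suffix List is far larger and has wildcard and exception rules). It shows why, under these assumptions, each CID chunk next to `ipns.dweb.link` counts as its own registered domain. `registeredDomain` and `publicSuffixes` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// publicSuffixes is a tiny, hand-picked stand-in for the Public Suffix List
// (hypothetical; the real list is far larger and has wildcard/exception rules).
var publicSuffixes = map[string]bool{
	"link":           true,
	"ipfs.dweb.link": true,
	"ipns.dweb.link": true,
}

// registeredDomain returns the longest matching public suffix plus the one
// label to its left (a simplified "eTLD+1"), or false if nothing matches.
func registeredDomain(host string) (string, bool) {
	labels := strings.Split(host, ".")
	// i = 1 tries the longest proper suffix first, then progressively shorter ones.
	for i := 1; i < len(labels); i++ {
		if publicSuffixes[strings.Join(labels[i:], ".")] {
			return strings.Join(labels[i-1:], "."), true
		}
	}
	return "", false
}

func main() {
	d, _ := registeredDomain("bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5.gk.ipns.dweb.link")
	fmt.Println(d) // gk.ipns.dweb.link
}
```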
With that context, I'd appreciate input from @mburns and @MichaelMure on how feasible it is for us, or an operator like Infura, to get and manage >100 certs, each for 100 wildcard names, to support TLS on `*.aa.dweb.link`, `*.ab.dweb.link`, etc., and whether it's better than using a single wildcard cert and Base36-encoded IPNS ED25519 names that do not require splitting.
TL;DR: ED25519 libp2p-key is two characters over the limit, and we can either do the bulk-certificate hack or switch those keys to Base36 so they fit in a single label, removing the need for the cert hack.
I worry the infra overhead of the cert hack is significant and may artificially slow down adoption of subdomain gateways 😞
Thoughts?
I found out about this public suffix "loophole" earlier today as well. It could make things easier, but it's also somewhat abusive and risky (do they even update their list?). There is also the possibility to apply for a raised rate limit, but I don't know how feasible that is. Not to mention the vendor lock-in with Let's Encrypt.
Another problem with this solution is that it pretty much means doing TLS termination manually. AWS, for example, limits a load balancer to 25 attached certificates. The complexity compounds quickly.
Sadly, we might need to go that way anyway, because we need to have a user ID as a subdomain (we need to link requests to a user somehow, basic auth breaks some use cases, and we can't touch the path part of the URL).
In any case, base36 is way way easier for us.
I agree with using base36. And you're right, this feature is just to support sha512 etc., so it doesn't really matter.
> We want to maximize the first part after public suffix, giving the most bits to parent labels, following the spirit of DNS hierarchy.
I'm not sure I follow. Why would we want to maximize that part? Are we worried about cookies/origins?
Managing that many certs is doable, but feels a bit icky, technically speaking.
All else being equal, it sounds like base36 is preferable.
> Why would we want to maximize that part? Are we worried about cookies/origins?
AFAIK in practice, it does not matter, mostly aesthetics.
But if we can pick, I'd maximize "parent" labels on the right, as a precaution.
My rationale: the surface for bugs always exists.
To illustrate: a mild version of "Origin sharing" is a thing. Two sibling subdomains can mutually agree to use the parent Origin for cookies (both setting `document.domain = aa.ipfs.example`, assuming `example.com` is not on the Public Suffix List).
Let's say the future brings a bug/vulnerability in the Origin-separation code of one of the browser vendors. Maximizing the parent label makes it harder/infeasible to pull this class of attacks off, as generating a matching parent label is way more difficult if it needs to match 63 instead of 2 chars.
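The size of that gap can be put in numbers with a small Go sketch, under the simplifying assumption that every base32 character of the parent label is free (32 choices) and that an attacker has to brute-force a CID whose label matches exactly. `labelSpace` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/big"
)

// labelSpace returns 32^n = 2^(5n): the number of distinct base32 labels of
// length n, i.e. roughly how many candidates a brute-force search for one
// specific parent label has to cover.
func labelSpace(n int) *big.Int {
	return new(big.Int).Exp(big.NewInt(32), big.NewInt(int64(n)), nil)
}

func main() {
	fmt.Println(labelSpace(2))           // 1024
	fmt.Println(labelSpace(63).BitLen()) // 316, because 32^63 = 2^315
}
```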
TLDR: seconding what @lidel said
In fact, this very thread contains something that walks and talks like a precursor to a phishing attack: the ability to blanket-register a star-TLS cert for all possible N-character prefixes when N is short enough at the top.
Before this thread started, this was also one of the first questions I asked @lidel when I was doing the review, because I was dumb and misread the code: "@lidel why do you let the rightmost side be short and thus brute-forceable?"
I do not have a good feeling for how to "productize" this into an outright vulnerability, but leaving a "mid-to-top-level" part of DNS trivially forgeable doesn't... smell right at all.
Let's park this for now (closing to reduce confusion).
This PR adds subdomain gateway support for CIDs longer than 63 characters:
"Splitting logic" is in `toDNSSafePrefix`; would love some eyes on it. I am unsure if it is the most efficient way to split into chunks of 63 starting from right to left, so suggestions are welcome.
Ref.