Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support long CIDs in subdomains by splitting at 63rd char #7358

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 63 additions & 7 deletions core/corehttp/hostname.go
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ var defaultKnownGateways = map[string]config.GatewaySpec{
"dweb.link": subdomainGatewaySpec,
}

// Label's max length in DNS (https://tools.ietf.org/html/rfc1034#page-7)
const dnsLabelMaxLength int = 63

// HostnameOption rewrites an incoming request based on the Host header.
func HostnameOption() ServeOption {
return func(n *core.IpfsNode, _ net.Listener, mux *http.ServeMux) (*http.ServeMux, error) {
Expand Down Expand Up @@ -151,16 +154,29 @@ func HostnameOption() ServeOption {
return
}

// Do we need to fix multicodec in PeerID represented as CIDv1?
if isPeerIDNamespace(ns) {
keyCid, err := cid.Decode(rootID)
if err == nil && keyCid.Type() != cid.Libp2pKey {
// Check if rootID is a valid CID
if rootCID, err := cid.Decode(rootID); err == nil {
// Do we need to redirect CID to a canonical DNS representation?
hostPrefix := toDNSPrefix(rootID)
if !strings.HasPrefix(r.Host, hostPrefix) {
if newURL, ok := toSubdomainURL(hostname, pathPrefix+r.URL.Path, r); ok {
// Redirect to CID fixed inside of toSubdomainURL()
// Redirect to CID split split at deterministic places
// to ensure CID always gets the same Origin on the web
http.Redirect(w, r, newURL, http.StatusMovedPermanently)
return
}
}

// Do we need to fix multicodec in PeerID represented as CIDv1?
if isPeerIDNamespace(ns) {
if rootCID.Type() != cid.Libp2pKey {
if newURL, ok := toSubdomainURL(hostname, pathPrefix+r.URL.Path, r); ok {
// Redirect to CID fixed inside of toSubdomainURL()
http.Redirect(w, r, newURL, http.StatusMovedPermanently)
return
}
}
}
}

// Rewrite the path to not use subdomains
Expand Down Expand Up @@ -226,8 +242,16 @@ func knownSubdomainDetails(hostname string, knownGateways map[string]config.Gate
break
}

// Merge remaining labels (could be a FQDN with DNSLink)
rootID := strings.Join(labels[:i-1], ".")
idLabels := labels[:i-1]
// Merge remaining DNS labels and see if it is a CID or something else
// (DNS-friendly text representation splits CID to fit each chunk in 63 characters)
rootID := strings.Join(idLabels, "")
if _, err := cid.Decode(rootID); err != nil {
// Not a CID:
// Return rootID in original form, separated with '.'
// (mostly used by FQDNs with DNSLink)
rootID = strings.Join(idLabels, ".")
}
return gw, fqdn, ns, rootID, true
}
// not a known subdomain gateway
Expand Down Expand Up @@ -266,6 +290,37 @@ func isPeerIDNamespace(ns string) bool {
}
}

// Converts an identifier to DNS-safe representation
func toDNSPrefix(id string) (prefix string) {
s := strings.Replace(id, ".", "", -1) // remove separators if present

// Return if things fit after dot removal
if len(s) <= dnsLabelMaxLength {
return s
}

parts := make(
[]string,
// same as ceil( len(s) / dnsLabelMaxLength )
(len(s)+dnsLabelMaxLength-1)/dnsLabelMaxLength,
)

firstPartLen := len(s) % dnsLabelMaxLength

// if it divides by 63 perfectly - full part
if firstPartLen == 0 {
firstPartLen = dnsLabelMaxLength
}

// Iterate from right to left to maximize length of right-most labels
for i := len(parts) - 1; i > 0; i-- {
parts[i] = s[(i-1)*dnsLabelMaxLength+firstPartLen : i*dnsLabelMaxLength+firstPartLen]
}
parts[0] = s[:firstPartLen]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't you want to maximize the left-most labels? That is, make the right-most labels as small as possible?

We're only over by two base-32 numerals so we can probably get 32**2 = 1024 wildcard certificates. It looks like we can register 100 names per cert, and 50 certs which gives us 5000 certs.

Copy link
Member Author

@lidel lidel May 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to maximize the first part after public suffix, giving the most bits to parent labels, following the spirit of DNS hierarchy.


You have a point, it may be possible to get TLS certs if we hardcode the first split to be after two characters from the right.

My main concern: while technically possible, I am not sure if it is practical / better than using Base36 for ED25519.

Assuming we have all the certs in place, this fix would work only for ED25519 represented in base32, everything else (eg. sha2-512) will still be over the wildcard length limit:

Let's Encrypt limits are (src):

  • You can only fit 100 domains onto one certificate
  • Each Registered Domain may only appear on 50 certificates per week.

Interesting part is that according to the rate limits they

use the Public Suffix List to calculate the registered domain.

ipfs.dweb.link and ipns.dweb.link are both on Public Suffix List, which means they are "excluded" from limit calculation, and the first CID chunk becomes "Registered Domain", effectively removing the weekly limit, which makes things easier.


With that context, I'd appreciate input from @mburns and @MichaelMure on how feasible it is for us, or operator like Infura to get manage >100 certs, each for 100 wildcard names to support TLS on *.aa.dweb.link, *.ab.dweb.link ... etc, and if it's better than using a single wildcard cert and Base36 encoded IPNS ED25519 names that do not require splitting.

TL;DR ED25519 libp2p-key is two characters over the limit, and we can either do the bulk-certificate hack, or switch those keys to Base36 so they fit in a single label, removing the need to cert hack.

I worry Infra overhead of cert hack is significant, and may artificially slow down adoption of subdomain gateways 😞

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out about this public suffix "loophole" earlier today as well. It could make things easier, but it's also somewhat abusive and risky (do they even update their list?). There is also the possibility to apply for a raised rate limit but I don't know of feasible that is. Not even talking about the vendor locking with LetsEncrypt.

One problem coming from this solution is also that it pretty much means doing TLS termination manually. AWS for example limit to 25 certificates attached to a load balancer. It compound quickly in complexity.

Sadly we might need to go that way anyway because we need to have a user ID as a subdomain (we need to link requests to a user somehow, basic auth break some usecases, we can't touch the path part of the URL).

In any case, base36 is way way easier for us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with using base36. And you're right, this feature is just to support sha512 etc. so it doesn't really matter.

We want to maximize the first part after public suffix, giving the most bits to parent labels, following the spirit of DNS hierarchy.

I'm not sure I follow. Why would we want to maximize that part? Are we worried about cookies/origins?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managing that many certs is doable, but feels a bit icky, technically speaking.

all else being equal, sounds like base36 is preferable.

Copy link
Member Author

@lidel lidel May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to maximize that part? Are we worried about cookies/origins?

AFAIK in practice, it does not matter, mostly aesthetics.
But if we can pick, I'd maximize "parent" labels on the right, as a precaution.

My rationale: the surface for bugs always exists.
To illustrate: a mild version of "Origin sharing" is a thing. Two sibling subdomains can mutually agree to use parent Origin for cookies (both setting document.domain = aa.ipfs.example, assuming example.com is not on Public Suffix List).

Let's say the future brings a bug/vulnerability in Origin-separation code in one of browser vendors. Maximizing parent label makes it harder/infeasible to pull this class of attacks off, as generating parent label if way more difficult if it needs to match 63 instead of 2 chars.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: seconding what @lidel said

In fact this very thread contains something that walks and talks like a precursor to a phishing attack: the ability to blanket register a star-TLS cert to all possible N-character prefixes when N is short enough at the top.

Before this thread started this was also one of the 1st questions I asked @lidel when I was doing review, because I was dumb and misread the code: "@lidel why do you let the rightmost side be short and thus brute-forceable?"

I do not have a good feeling how to "productize" this into an outright vulnerability, but leaving a "mid-to-top-level" part of DNS trivially forgeable doesn't... smell right at all.


return strings.Join(parts, ".")
}

// Converts a hostname/path to a subdomain-based URL, if applicable.
func toSubdomainURL(hostname, path string, r *http.Request) (redirURL string, ok bool) {
var scheme, ns, rootID, rest string
Expand Down Expand Up @@ -340,6 +395,7 @@ func toSubdomainURL(hostname, path string, r *http.Request) (redirURL string, ok
// produce a subdomain URL
return "", false
}
rootID = toDNSPrefix(rootID)
}

return safeRedirectURL(fmt.Sprintf(
Expand Down
37 changes: 36 additions & 1 deletion core/corehttp/hostname_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,13 @@ func TestToSubdomainURL(t *testing.T) {
{"localhost:8080", "/ipns/dnslink.io", "http://dnslink.io.ipns.localhost:8080/", true},
// CIDv0 → CIDv1base32
{"localhost", "/ipfs/QmbCMUZw6JFeZ7Wp9jkzbye3Fzp2GGcPgC3nmeUjfVF87n", "http://bafybeif7a7gdklt6hodwdrmwmxnhksctcuav6lfxlcyfz4khzl3qfmvcgu.ipfs.localhost/", true},
// CIDv1 with long sha512 (requires DNS label length workaround)
{"localhost", "/ipfs/bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg", "http://bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvy.w764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg.ipfs.localhost/", true},
// PeerID as CIDv1 needs to have libp2p-key multicodec
{"localhost", "/ipns/QmY3hE8xgFCjGcz6PHgnvJz5HZi1BaKRfPkn1ghZUcYMjD", "http://bafzbeieqhtl2l3mrszjnhv6hf2iloiitsx7mexiolcnywnbcrzkqxwslja.ipns.localhost/", true},
{"localhost", "/ipns/bafybeickencdqw37dpz3ha36ewrh4undfjt2do52chtcky4rxkj447qhdm", "http://bafzbeickencdqw37dpz3ha36ewrh4undfjt2do52chtcky4rxkj447qhdm.ipns.localhost/", true},
// PeerID: ed25519+identity multihash
{"localhost", "/ipns/12D3KooWFB51PRY9BxcXSH6khFXw1BZeszeLDy7C8GciskqCTZn5", "http://bafzaajaiaejcat4yhiwnr2qz73mtu6vrnj2krxlpfoa3wo2pllfi37quorgwh2jw.ipns.localhost/", true},
{"localhost", "/ipns/12D3KooWFB51PRY9BxcXSH6khFXw1BZeszeLDy7C8GciskqCTZn5", "http://ba.fzaajaiaejcat4yhiwnr2qz73mtu6vrnj2krxlpfoa3wo2pllfi37quorgwh2jw.ipns.localhost/", true},
} {
url, ok := toSubdomainURL(test.hostname, test.path, r)
if ok != test.ok || url != test.url {
Expand Down Expand Up @@ -75,6 +77,28 @@ func TestPortStripping(t *testing.T) {

}

func TestDNSPrefix(t *testing.T) {
for _, test := range []struct {
in string
out string
}{
// <= 63
{"bafybeickencdqw37dpz3ha36ewrh4undfjt2do52chtcky4rxkj447qhdm", "bafybeickencdqw37dpz3ha36ewrh4undfjt2do52chtcky4rxkj447qhdm"},
{"bafy.beickencdqw37dpz3ha36ewrh4undfjt2do52chtcky4rxkj447qhdm", "bafybeickencdqw37dpz3ha36ewrh4undfjt2do52chtcky4rxkj447qhdm"},
// > 63
{"bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk", "ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"},
{"bafzaajaiaejca4syrpdu6g.dx4wsdnokxkprgzxf4wrs.tuc34gxw5k5jrag2so5gk", "ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk"},
{"bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg", "bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvy.w764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg"},
{"bafkrgqe3ohjcjplc6n4f3fw.unlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4.ffyyxnayrtdi5oc4xb2332g645433aeg", "bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvy.w764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg"},
} {
out := toDNSPrefix(test.in)
if out != test.out {
t.Errorf("(%s): returned '%s', expected '%s'", test.in, out, test.out)
}
}

}

func TestKnownSubdomainDetails(t *testing.T) {
gwSpec := config.GatewaySpec{
UseSubdomains: true,
Expand Down Expand Up @@ -127,6 +151,17 @@ func TestKnownSubdomainDetails(t *testing.T) {
{"foo.dweb.ipfs.pvt.k12.ma.us", "", "", "", false},
{"bafkreicysg23kiwv34eg2d7qweipxwosdo2py4ldv42nbauguluen5v6am.ipfs.dweb.ipfs.pvt.k12.ma.us", "dweb.ipfs.pvt.k12.ma.us", "ipfs", "bafkreicysg23kiwv34eg2d7qweipxwosdo2py4ldv42nbauguluen5v6am", true},
{"bafzbeihe35nmjqar22thmxsnlsgxppd66pseq6tscs4mo25y55juhh6bju.ipns.dweb.ipfs.pvt.k12.ma.us", "dweb.ipfs.pvt.k12.ma.us", "ipns", "bafzbeihe35nmjqar22thmxsnlsgxppd66pseq6tscs4mo25y55juhh6bju", true},
// edge case check: understand split CIDs (workaround for 63 character limit of a single DNS label https://github.com/ipfs/go-ipfs/issues/7318)
// Note: canonical split is at 63, but we support arbitrary splits for improved UX
// Short CID (eg. unnecessarily split by user)
{"baf.kreicysg23kiwv34eg2d7.qweipxwosdo2py4ldv4.2nbauguluen5v6am.ipfs.dweb.link", "dweb.link", "ipfs", "bafkreicysg23kiwv34eg2d7qweipxwosdo2py4ldv42nbauguluen5v6am", true},
// ED25519 libp2p-key
{"ba.fzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipfs.dweb.link", "dweb.link", "ipfs", "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk", true},
{"bafzaajaiaejca4syrpdu6gdx4wsdnok.xkprgzxf4wrstuc34gxw5k5jrag2so5gk.ipfs.dweb.link", "dweb.link", "ipfs", "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk", true},
{"bafzaajaiaejca4sy.rpdu6gdx4wsdnok.xkprgzxf4wrstuc34g.xw5k5jrag2so5gk.ipfs.dweb.link", "dweb.link", "ipfs", "bafzaajaiaejca4syrpdu6gdx4wsdnokxkprgzxf4wrstuc34gxw5k5jrag2so5gk", true},
// CID created with --hash sha2-512
{"bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq.6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg.ipfs.dweb.link", "dweb.link", "ipfs", "bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg", true},
{"bafkrgqe3ohjcjplc6n4f3fwunlj6upltg.gn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4f.fyyxnayrtdi5oc4xb2332g645433aeg.ipfs.dweb.link", "dweb.link", "ipfs", "bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg", true},
// other namespaces
{"api.localhost", "", "", "", false},
{"peerid.p2p.localhost", "localhost", "p2p", "peerid", true},
Expand Down
58 changes: 57 additions & 1 deletion test/sharness/t0114-gateway-subdomains.sh
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,10 @@ test_launch_ipfs_daemon --offline
test_expect_success "Add test text file" '
CID_VAL="hello"
CIDv1=$(echo $CID_VAL | ipfs add --cid-version 1 -Q)
CIDv1_LONG=$(echo $CID_VAL | ipfs add --cid-version 1 --hash sha2-512 -Q)
CID_DNS_SPLIT_CANONICAL="bafkrgqhhyivzstcz3hhswshfjgy6ertgmnqeleynhwt4dl.fsthi4hn7zgh4uvlsb5xncykzapi3ocd4lzogukir6ksdy6wzrnz6ohnv4aglcs"
CID_DNS_SPLIT_CUSTOM="baf.krgqhhyivzstcz3hhswshfjgy6ertgmnqeleynhwt4dl.fsthi4hn7zgh4uvlsb5xncykzapi3ocd4lzogukir.6ksdy6wzrnz6ohnv4aglcs"
CID_DNS_SPLIT_CUSTOM2=$(echo $CIDv1 | sed -r -e "s/^./&./")
CIDv0=$(echo $CID_VAL | ipfs add --cid-version 0 -Q)
CIDv0to1=$(echo "$CIDv0" | ipfs cid base32)
'
Expand Down Expand Up @@ -119,7 +123,6 @@ test_expect_success "Publish test text file to IPNS" '
test_cmp expected2 output
'


# ensure we start with empty Gateway.PublicGateways
test_expect_success 'start daemon with empty config for Gateway.PublicGateways' '
test_kill_ipfs_daemon &&
Expand Down Expand Up @@ -262,6 +265,7 @@ test_expect_success "request for deep path resource at {cid}.ipfs.localhost/sub/
test_should_contain "subdir2-bar" list_response
'


# *.ipns.localhost

# <libp2p-key>.ipns.localhost
Expand Down Expand Up @@ -501,6 +505,58 @@ test_hostname_gateway_response_should_contain \
"http://127.0.0.1:$GWAY_PORT" \
"404 Not Found"

## ============================================================================
## Special handling of CIDs that do not fit in a single DNS Label (>63chars)
## https://github.com/ipfs/go-ipfs/issues/7318
## ============================================================================


# local: *.localhost
test_localhost_gateway_response_should_contain \
"request for a long CID at localhost/ipfs/{CIDv1} returns Location HTTP header for DNS-safe subdomain redirect in browsers" \
"http://localhost:$GWAY_PORT/ipfs/$CIDv1_LONG" \
"Location: http://${CID_DNS_SPLIT_CANONICAL}.ipfs.localhost:$GWAY_PORT/"

test_localhost_gateway_response_should_contain \
"request for {long.CID}.ipfs.localhost should return expected payload" \
"http://${CID_DNS_SPLIT_CANONICAL}.ipfs.localhost:$GWAY_PORT" \
"$CID_VAL"

test_localhost_gateway_response_should_contain \
"request for {custom.split.of.long.CID}.ipfs.localhost should return redirect to a canonical Origin" \
"http://${CID_DNS_SPLIT_CUSTOM}.ipfs.localhost:$GWAY_PORT/ipfs/$CIDv1" \
"Location: http://${CID_DNS_SPLIT_CANONICAL}.ipfs.localhost:$GWAY_PORT/"

test_localhost_gateway_response_should_contain \
"request for {unnecessary.split.of.short.CID}.ipfs.localhost should return redirect to a canonical Origin" \
"http://${CID_DNS_SPLIT_CUSTOM2}.ipfs.localhost:$GWAY_PORT/ipfs/$CIDv1" \
"Location: http://${CIDv1}.ipfs.localhost:$GWAY_PORT/"

# public gateway: *.example.com

test_hostname_gateway_response_should_contain \
"request for a long CID at example.com/ipfs/{CIDv1} returns Location HTTP header for DNS-safe subdomain redirect in browsers" \
"example.com" \
"http://127.0.0.1:$GWAY_PORT/ipfs/$CIDv1_LONG" \
"Location: http://${CID_DNS_SPLIT_CANONICAL}.ipfs.example.com"

test_hostname_gateway_response_should_contain \
"request for {long.CID}.ipfs.example.com should return expected payload" \
"${CID_DNS_SPLIT_CANONICAL}.ipfs.example.com" \
"http://127.0.0.1:$GWAY_PORT" \
"$CID_VAL"

test_hostname_gateway_response_should_contain \
"request for {custom.split.of.long.CID}.ipfs.example.com should return redirect to a canonical Origin" \
"${CID_DNS_SPLIT_CUSTOM}.ipfs.example.com" \
"http://127.0.0.1:$GWAY_PORT" \
"Location: http://${CID_DNS_SPLIT_CANONICAL}.ipfs.example.com"

test_hostname_gateway_response_should_contain \
"request for {unnecessary.split.of.short.CID}.ipfs.example.com should return redirect to a canonical Origin" \
"${CID_DNS_SPLIT_CUSTOM2}.ipfs.example.com" \
"http://127.0.0.1:$GWAY_PORT" \
"Location: http://${CIDv1}.ipfs.example.com"

## ============================================================================
## Test path-based requests with a custom hostname config
Expand Down