Unreachable Providers (client): Decreased Re-Provide Delay/Record TTL #9984

dennis-tra · 2023-06-21T09:57:44Z

Checklist

My issue is specific & actionable.
I am not suggesting a protocol enhancement.
I have searched on the issue tracker for my issue.

Description

This is the corresponding issue for a client-side change to mitigate the impact of large numbers of unreachable providers.

Context

The context is explained in #9982.

Proposal

We identified two ways forward to address the impact of unreachable providers.

1. A prioritization logic for provider records on the server side. The peers that serve provider records sort them so that, e.g., the first record in the list likely points to a peer that is actually reachable.
2. A delayed provider record publication, e.g., only announce blocks once a peer has been online for some time. The assumption is that this filters out short-lived peers.

This GH issue is for proposal 2).

The discussion around 1) happens in #9982

1. Delay Provider Record Publication

The idea here is that we change the default configuration to only reprovide blocks after a Kubo node has had a consecutive uptime of X minutes/hours/days. The assumption is that nodes which have been online continuously for a long time are stable and will likely stay online.

There are some nuances to consider here (copied from probe-lab/network-measurements#49 (comment)):

Today, block reproviding is a global flag in Kubo (IPFS Desktop, Brave): we do not distinguish between blocks fetched while browsing websites (temporarily stored in the cache) and blocks imported by the user by adding their own data to the local node (either pinned, in MFS, or just in cache). Both types of data are stored and reprovided by the same code paths, and we can't rely on pinning and MFS to identify user data, because ipfs block put and ipfs dag put do not pin by default.

That is to say, disabling reproviding only for third-party content is not trivial: to only stop reproviding third-party website data, we would have to introduce separate datastores with different reproviding settings for first-party and third-party blocks in Kubo.
Content explicitly imported by the user (ipfs add, ipfs dag put, ipfs block put, ipfs dag import), or pinned by user, would be added/moved to first-party datastore.

A different, somewhat simpler approach would be to keep a single datastore, but instead introduce a new default "auto" Reprovider.Strategy that:

always announces pinned content (+implicitly pinned MFS) → ensures user content is always reachable asap
announces the remaining blocks in cache (incl. ones that come from browsed websites) ONLY if a node was online for some time (we would add optionalDuration Reprovider.UnpinnedDelay to allow users to adjust the implicit default)
TBD how we solve ipfs dag put and ipfs block put or other user content that is not pinned, but expected to "work instantly":
1. we could flip --pin in them to true → breaking change (may surprise users who expect these to not keep garbage around, may lead to services running out of disk space)
2. we could say that the ability for users to set Reprovider.Strategy to all and/or adjust Reprovider.UnpinnedDelay is enough here. ipfs routing provide exists, and we could add --all to allow apps/users to manually trigger a provide before Reprovider.UnpinnedDelay hits. (feels safer than option 1: no DoS, worst case a delay in announce on a cold boot)

A personal remark: It would be great if the user content that is expected to "work instantly" could make use of the fast provide operation. I think these commands are not blocking right now, correct? Using optimistic provide could justify making them blocking. But again, the provide strategy is a global switch. It would be great if the application layer could have more control over the publication process based on its specific needs.

2. Decreased Provider Record TTL

The idea here is to keep everything as is and just transmit the desired provider record TTL. The TTL would be calculated based on the node's uptime and would only become a high number once the node has been up for X minutes/hours/days.

At first glance this is a breaking protocol change, but protobuf allows adding new fields without breaking old implementations (see comment from @guillaumemichel probe-lab/network-measurements#49 (comment)). This means we could add the new field, and nodes that understand it could adhere to the TTL the provider wants to set. Everyone else would just continue as before.

Some things to consider:

@aschmahmann remarked that we should be careful not to open a DoS vector
The TTL that the client wants to set should have an upper bound that the server enforces. This value should be set to the current TTL.
This strategy increases load on DHT servers because reprovides will happen more frequently. On the other hand, the number of provider records a server holds could decrease because they are garbage collected more frequently.

Measurements

TBD: How can we substantiate the proposal with numbers? some ideas

References


lidel · 2023-07-24T13:26:48Z

Triage notes:

(1) Delay Provider Record Publication
- Getting UX to an acceptable point will take a lot of work
(2) Decreased Provider Record TTL
- This is both a client and a server change (to add provider-specified TTL)
  - Server-wise, Unreachable Providers (server): Provider-Record Prioritization #9982 may be less invasive
Meta comment: the changes proposed here won't improve things at release time; it will take 6+ months to see improvement
- we have not done server-side changes so far, so there are a lot of unknown unknowns
- if we improve things on the client side, it should preferably be at the time of looking for providers, not publishing providers; that will improve things at release time: lower risk, faster feedback loop, easier to run simulations or back out
  - Kubo maintainers would prefer tackling this here (could be tackled in parallel, but realistically if we have to choose, this would be our bet)
  - So proposals that make provider lookup smarter should take precedence over the ones described here (feel free to file a new issue and cc this one for discoverability)

dennis-tra added the kind/feature A new feature label Jun 21, 2023

dennis-tra mentioned this issue Jun 21, 2023

Unreachable Providers (server): Provider-Record Prioritization #9982

Open

3 tasks

dennis-tra changed the title ~~Unreachable Providers (client/provider): Re-Provide Delay/Decreased Record TTL~~ Unreachable Providers (client): Re-Provide Delay/Decreased Record TTL Jun 21, 2023

dennis-tra changed the title ~~Unreachable Providers (client): Re-Provide Delay/Decreased Record TTL~~ Unreachable Providers (client): Decreased Re-Provide Delay/Record TTL Jun 21, 2023

guillaumemichel mentioned this issue Oct 2, 2023

Add TTL to Provider Record probe-lab/zikade#42

Open

lidel added effort/weeks Estimated to take multiple weeks and removed effort/days Estimated to take multiple days, but less than a week labels Jul 24, 2023
